Imagining AccessibilityGPT

In my years of accessibility research and development work, one tool has dominated the landscape. A technique so powerful, it transcends technology, application, or industry.

"Hey Dave, can you come here and help me with this?"

Every application I've worked on, no matter how WCAG-compliant, has some accessibility hole where the only solution for a differently-abled person is to reach out and have someone help them complete a task, summarize a screen, or find the way out.

In a world gone mad over the possibilities and threats of LLMs and AI-adjacent technologies, I've slowly become excited about what a text-based, connected system could mean for my friends who need that extra boost getting around on a kludgy web.

I know OpenAI has partnered with Be My Eyes as a tool for navigating meatspace, but imagine an extension to NVDA that can interpret a browser screen, move the cursor, or perform a series of actions like AutoGPT. LLMs have already demonstrated an ability to navigate, reformat, and modify structured datasets, which makes them well suited to taking natural language requests and acting on the web.
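As a very rough sketch of what I'm picturing (every name, selector, and prompt shape here is hypothetical), the extension could flatten the page's form controls into plain text and hand that to the model alongside the spoken request:

```typescript
// Hypothetical content-script sketch: flatten the page's form controls into
// text an LLM could reason over, then pair that with the user's spoken request.
interface ControlSummary {
  label: string;
  role: string;
  invalid: boolean;
}

function summarizeControls(): ControlSummary[] {
  const controls = document.querySelectorAll<HTMLElement>(
    "input, select, textarea, button"
  );
  return Array.from(controls).map((el) => ({
    label:
      el.getAttribute("aria-label") ??
      el.closest("label")?.textContent?.trim() ??
      el.getAttribute("name") ??
      "(unlabelled)",
    role: el.getAttribute("role") ?? el.tagName.toLowerCase(),
    invalid: el.getAttribute("aria-invalid") === "true",
  }));
}

// buildPrompt is imaginary plumbing; how the request actually reaches a model
// (NVDA add-on, browser extension, whatever) is the open question.
function buildPrompt(userRequest: string): string {
  const page = summarizeControls()
    .map((c, i) => `${i + 1}. [${c.role}] ${c.label}${c.invalid ? " (error)" : ""}`)
    .join("\n");
  return `Page controls:\n${page}\n\nUser request: ${userRequest}`;
}
```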

1

Trying to fill out a government ID form, a person is faced with over a hundred inputs. They finish, tab to the submit button, and nothing happens.

Hey AX-GPT, place my cursor on the first input that's in an error state.

The model uses a combination of visual matching and DOM exploration to find several fields with red text and an error class nearby in the DOM, places the cursor on the first, and notifies the user there are three more.
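The DOM half of that is nearly a one-liner today; this sketch assumes the form marks errors with aria-invalid or an .error wrapper class, which is itself a guess:

```typescript
// Sketch: jump to the first field in an error state and report how many remain.
// Assumes errors are flagged with aria-invalid or an .error wrapper (a guess).
function focusFirstError(): void {
  const invalid = Array.from(
    document.querySelectorAll<HTMLElement>(
      '[aria-invalid="true"], .error input, .error select, .error textarea'
    )
  );
  if (invalid.length === 0) return;
  invalid[0].focus();
  // A real extension would hand this string to the screen reader to speak.
  console.log(`Moved to the first error. ${invalid.length - 1} more to fix.`);
}
```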

2

A person is scrolling through a social media site and gets a message from a friend containing only an image and a laughing emoji.

Hey AX-GPT, describe the image on my cursor.

AX-GPT assumes it's a joke based on the proximity of the laughing emoji and matches the meme against databases like Know Your Meme, which lets it simulate telling the joke in the friend's writing tone.

3

A person is looking at their bank statement, and the screen reader tells them there's an image describing their account balance.

Hey AX-GPT, can you describe the trend line of the account balance? Explain it as a list of current balances by month and add the positive or negative difference. End with the difference between the first balance and the last balance.

AX-GPT responds with an ordered table in an extension modal. The user can listen to the whole table, explore the data, or dismiss the modal.
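The arithmetic in that table is the easy half once the balances exist as data (extracting them from a chart image is the model's job); a sketch with made-up numbers:

```typescript
// Sketch: turn extracted monthly balances into the rows AX-GPT might read back.
interface MonthlyBalance {
  month: string;
  balance: number;
}

function balanceTable(balances: MonthlyBalance[]): string[] {
  if (balances.length === 0) return [];
  const rows = balances.map((b, i) => {
    const diff = i === 0 ? 0 : b.balance - balances[i - 1].balance;
    const sign = diff < 0 ? "-" : "+";
    return `${b.month}: $${b.balance.toFixed(2)} (${sign}$${Math.abs(diff).toFixed(2)})`;
  });
  const first = balances[0];
  const last = balances[balances.length - 1];
  const overall = last.balance - first.balance;
  rows.push(
    `Change from ${first.month} to ${last.month}: ${overall < 0 ? "-" : "+"}$${Math.abs(overall).toFixed(2)}`
  );
  return rows;
}

// Made-up example:
// balanceTable([
//   { month: "January", balance: 1200 },
//   { month: "February", balance: 950 },
//   { month: "March", balance: 1400 },
// ]);
// -> ["January: $1200.00 (+$0.00)", "February: $950.00 (-$250.00)",
//     "March: $1400.00 (+$450.00)", "Change from January to March: +$200.00"]
```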

Etc.

Some more for fun…

Hey AX-GPT, can you add up all the totals on the bank statement?

Hey AX-GPT, did I get a refund from Amazon this week?

Hey AX-GPT, can you find the tracking number and copy it to my clipboard?

Hey AX-GPT, click on the link to Customer Support live chat.

When?

At the pace things are going, I put off writing this for weeks because I assumed it would already exist by the time I posted it. 🤷

I think that a simple assistant like this will be a boon. It won't negate the importance of writing semantic and accessible HTML... it'll make it even more important. Consistent and semantic HTML is easier to process and crawl for an LLM-based assistant like the one we've imagined.
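That's because a semantic landmark or label is something the assistant can query directly, while a pile of styled divs has to be guessed at. A contrived comparison, with selectors that are purely illustrative:

```typescript
// Semantic markup: the assistant can target intent directly.
const supportLink = document.querySelector<HTMLAnchorElement>(
  'nav a[href*="support"], a[aria-label*="support" i]'
);

// Div soup: the assistant has to sniff text content and hope.
const maybeSupport = Array.from(
  document.querySelectorAll<HTMLElement>("div[onclick], span[onclick]")
).find((el) => /support|help/i.test(el.textContent ?? ""));
```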

(And if you are working on this, let's talk!)


🔖 Changelog
  • 2023-05-09 11:41:58 -0500
    Chat-GPT proofread and add example

  • 2023-05-09 11:38:57 -0500
    First draft