Several of the projects that I saw showed glimpses of a possible future for screen readers.
I’ve written about screen readers before, and some of the challenges with using them.
One project interpreted pictures of charts or graphs and created a textual summary of the information shown in them.
I’m still amazed at this. It takes a picture of a graph, not the original raw data, and generates sensible summaries of what it shows.
For example, given this image:
It can generate:
This graphic is about United States. The graphic shows that United States at 35 thousand dollars is the third highest with respect to the dollar value of gross domestic product per capita 2001 among the countries listed. Luxembourg at 44.2 thousand dollars is the highest.
The dollar value of gross domestic product per capita 2001 is 25 thousand dollars for Britain, which has the lowest dollar value of product per capita 2001. United States has 1.4 times more product per capita 2001 than Britain. The difference between the dollar value of gross domestic product per capita 2001 for United States and that for Britain is 10 thousand dollars.
Their focus is on the sort of graphics found in newspapers and magazines – informational, rather than scientific graphs. They want to be able to generate a high level summary, rather than a list of plot points that require the user to build a mental model in order to interpret.
The image shows a line graph. The line graph presents the number of Walmart’s sales of leather jackets. The line graph shows a trend that changes. The changing trend consists of a rising trend from 1997 to 1999 followed by a falling trend through 2006. The first segment is the rising trend. The rising trend is steep. The rising trend has a starting value of 1890. The rising trend has an ending value of 36840. The second segment is the falling trend. The falling trend has a starting value of 36840. The falling trend has an ending value of 12606.
The image shows a line graph. The line graph presents the number of people who started smoking under the age of 18 in the US. The line graph shows a trend that changes. The changing trend consists of a rising trend from 1962 to 1966 followed by a falling trend through 1980. The first segment is the rising trend. The rising trend is steep. The second segment is the falling trend.
It’s able to interpret an image and recognise trends, recognise how noisy or smooth the data is, recognise whether the trend changes, and more. Impressive.
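To give a feel for the last step of that process, turning a recognised trend into sentences, here is a minimal Python sketch. It is not the project’s method: it assumes the data points have already been recovered from the image (which is the hard part), it ignores noise entirely, and the function names `segment_trends` and `summarise` are my own. The three data points are the values quoted in the leather jacket summary above.

```python
def segment_trends(points):
    """Split (x, y) points into runs that keep rising or keep falling.

    Returns a list of (direction, first_point, last_point) tuples.
    Assumes the points are sorted by x and contain no noise.
    """
    segments = []
    start = 0
    direction = None
    for i in range(1, len(points)):
        delta = points[i][1] - points[i - 1][1]
        # A flat step keeps whatever direction we were already following.
        step = "rising" if delta > 0 else "falling" if delta < 0 else direction
        if direction is None:
            direction = step
        elif step != direction:
            # The trend changed: close the current segment at the turning point.
            segments.append((direction, points[start], points[i - 1]))
            start = i - 1
            direction = step
    if direction is not None:
        segments.append((direction, points[start], points[-1]))
    return segments


def summarise(subject, points):
    """Build a short textual summary in the style of the examples above."""
    segments = segment_trends(points)
    sentences = [f"The line graph presents {subject}."]
    if not segments:
        sentences.append("The line graph shows no clear trend.")
    elif len(segments) == 1:
        direction, first, last = segments[0]
        sentences.append(
            f"The line graph shows a {direction} trend from {first[0]} to {last[0]}."
        )
    else:
        parts = [f"a {d} trend from {a[0]} to {b[0]}" for d, a, b in segments]
        sentences.append(
            "The line graph shows a trend that changes: "
            + ", followed by ".join(parts) + "."
        )
    for direction, first, last in segments:
        sentences.append(
            f"The {direction} trend has a starting value of {first[1]} "
            f"and an ending value of {last[1]}."
        )
    return " ".join(sentences)


# Values taken from the generated summary quoted earlier.
jackets = [(1997, 1890), (1999, 36840), (2006, 12606)]
print(summarise("sales of leather jackets", jackets))
```

A real system would need to smooth out noise before deciding where a trend changes; this sketch treats every change of direction as significant.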
Interpreting data in tables
Another project demonstrated restructuring data tables in web pages to make them easier to explore with a screen reader.
They have an interesting approach of analysing an HTML table and reorganising it to make it more accessible, abstracting out complex sections into a series of menus.
For example, given a table such as this:
it can produce navigable menus such as this:
Even quite complex tables with row and column spans, which would otherwise be quite difficult to interpret if read row-by-row by a screen reader, are made much more accessible.
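I don’t know how the project implements this, but the general idea can be sketched in a few lines of Python. The sketch below is my own illustration, not their algorithm: it expands row and column spans so that every grid position has a value, then groups the rows under their first-column label, giving a structure that could back a two-level menu. The class name `TableGrid`, the function `to_menus` and the sample table are all made up for the example, and it assumes the first row holds column headers and the first column holds row labels.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Expand an HTML table into a grid, repeating the text of spanned cells."""

    def __init__(self):
        super().__init__()
        self.grid = {}    # (row, col) -> cell text
        self.row = -1
        self.col = 0
        self.cell = None  # [rowspan, colspan, text parts] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
            self.col = 0
        elif tag in ("td", "th"):
            a = dict(attrs)
            self.cell = [int(a.get("rowspan", 1)), int(a.get("colspan", 1)), []]

    def handle_data(self, data):
        if self.cell is not None:
            self.cell[2].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            rowspan, colspan, parts = self.cell
            text = "".join(parts).strip()
            # Skip positions already filled by a rowspan from an earlier row.
            while (self.row, self.col) in self.grid:
                self.col += 1
            for r in range(rowspan):
                for c in range(colspan):
                    self.grid[(self.row + r, self.col + c)] = text
            self.col += colspan
            self.cell = None


def to_menus(html):
    """Group rows by their first-column label: {label: [{header: value}, ...]}."""
    parser = TableGrid()
    parser.feed(html)
    grid = parser.grid
    rows = max(r for r, _ in grid) + 1
    cols = max(c for _, c in grid) + 1
    menus = {}
    for r in range(1, rows):
        record = {grid.get((0, c), ""): grid.get((r, c), "") for c in range(1, cols)}
        menus.setdefault(grid.get((r, 0), ""), []).append(record)
    return menus


# A made-up table in which "North" spans two rows.
SAMPLE = """
<table>
<tr><th>Region</th><th>Year</th><th>Sales</th></tr>
<tr><td rowspan="2">North</td><td>2001</td><td>10</td></tr>
<tr><td>2002</td><td>12</td></tr>
<tr><td>South</td><td>2001</td><td>7</td></tr>
</table>
"""
print(to_menus(SAMPLE))
```

The point of the grouping is that a screen reader user can pick "North" first and then hear only the two rows that belong to it, rather than working out from a row-by-row reading which region the second row refers to. Nested tables and multi-row headers are not handled here.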
Capti web player
Another technology I saw demonstrated was the Capti web player.
This capability should be ideal for visually impaired users, but the tools themselves are still quite difficult to use and integrate poorly with assistive technologies. Someone described them as obviously “designed by sighted people for sighted people”.
Capti combines this capability with an accessible media player, making it easy to navigate through an article, move through a list of articles, and so on. To a sighted user like me, it looked like they’ve mashed together Instapaper with an audiobook-type media player. I often listen to podcasts while I go running, and am a heavy user of Pocket and Safari’s reading list. So this looks ideal for me.
Multiple simultaneous audio streams
Finally, one fascinating project looked at how to make it quicker to scan large amounts of content with a screen reader to find a specific piece of information. I’ve written before that relying on a screen reader (which creates a sequential audio representation of the information on the page, starting at the beginning and going through the contents) can be tremendously time-consuming, and that it results in visually impaired users taking considerably more time to find information on the web.
This project investigated whether this could be improved by using multiple simultaneous sound sources.
It sounds mad, but they’re starting from observations such as the cocktail party effect – that in a noisy room with several conversations going on, we’re able to pick out a specific conversation that we want to listen for. Or that a student not paying attention in a lecture will hear if a lecturer says something like “this will be on the exam”.
They’re looking at a variety of approaches, such as separating the channels directionally, so one audio stream will sound like it’s coming from the left, while another is in front. Or having different voices, such as different genders, for the different streams. It’s an intriguing idea, and I’d love to see if it could be useful.
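For the curious, the simplest way I know to make one stream sound like it’s coming from the left and another from in front is plain stereo panning. The Python sketch below is only that: constant-power amplitude panning of mono samples, which is my stand-in and not necessarily what the project uses (proper spatial audio is considerably more involved). The function names `pan` and `mix` and the position scale are my own assumptions.

```python
import math


def pan(samples, position):
    """Place a mono signal in the stereo field.

    position: -1.0 is hard left, 0.0 is centre (in front), 1.0 is hard right.
    Returns a list of (left, right) sample pairs using constant-power panning.
    """
    angle = (position + 1.0) * math.pi / 4.0  # maps -1..1 onto 0..pi/2
    left_gain = math.cos(angle)
    right_gain = math.sin(angle)
    return [(s * left_gain, s * right_gain) for s in samples]


def mix(*streams):
    """Sum several stereo streams frame by frame (stops at the shortest)."""
    return [
        (sum(left for left, _ in frame), sum(right for _, right in frame))
        for frame in zip(*streams)
    ]


def tone(frequency, count, rate=8000):
    """Generate a mono sine wave as a stand-in for a speech stream."""
    return [math.sin(2 * math.pi * frequency * n / rate) for n in range(count)]


# One stream placed on the left, another straight ahead.
stream_a = pan(tone(220, 8000), -1.0)
stream_b = pan(tone(440, 8000), 0.0)
combined = mix(stream_a, stream_b)
```

With headphones, the first stream would be heard only in the left ear and the second equally in both, which is the directional separation described above. Using different voices for each stream would be a separate step in whatever speech synthesiser produces the audio.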