Accessibility: How to live stream subtitles, and MORE…

We hosted the No Boundaries conference again this year and, as before, we provided lots of accessibility options. The one that seemed to excite people the most was the live subtitling (properly called STT, Speech To Text) embedded in the live stream.

The first half of this post is a technical discussion of how to stream live subtitling. The second half is a more general discussion of accessibility (including where we messed up) because impressive though they are, subtitles are a tiny fraction of what’s required!

Last year we streamed live subtitles in a complicated way that required Wowza Media Server and hand-crafting XML. This year we did it more easily, in a way that allows you to stream to whatever service you want (Livestream, Ustream, Dacast, YouTube, etc, etc).

Why do live subtitling at all?

Well, people who are deaf or hard of hearing shouldn’t be excluded from your event. Pretty simple, really.

But it’s not just for them. Live captions benefit:

  • viewers whose first language isn’t English (or whatever’s being spoken at your event): it’s much easier to follow a foreign tongue when you can read as well as hear
  • people who are watching in an environment – eg on public transport, or when getting a baby to sleep, or in bed, or in a huge number of other situations – where it’s more convenient to watch without the sound on.

There are more people in that second category than you might think – we got many tweets from people grateful that the captions were there.

As an added bonus, this is an easy way of getting a transcript. After the event you can pay a very small fee (see bottom of this post for costs) to get the live subtitling neatened up and corrected, and a transcript produced. Transcripts are very useful for SEO, for searching within your own archives, and of course for people who would rather read a talk than watch it – much lower bandwidth, for a start!

Step 1. Getting the live subtitling in the first place

First things first: you will need to phone up an STT (speech to text) provider. We’ve used Stagetext for several events, and they’re always brilliant. Through your provider you book some STTRs (Speech To Text Reporters) – you’ll likely need a pair so they can take it in turns throughout the event. The fee you pay includes some support, and also use of the confusingly-similarly-named Streamtext, which is how you get hold of the captions the STTRs type. See the bottom of this blog post for approximate costs.

The STTRs work remotely: you make a group Skype audio call to them and feed the audio from your session to that call. That’s how they hear what’s going on. You need to send over details of what’s going to be said beforehand so they can programme in names, technical terms, etc in advance. You also need to be Instant Messaging them on Skype during the event to let them know what’s happening (eg cue them when a session’s about to start). The better you treat your STTRs the better the end result. They have incredible skills, but can’t mindread.

So, this gets you live subtitling in the room itself: you can set up a large monitor with a computer attached that runs a full screen browser showing the Streamtext site for your event – here’s a demo of what the subtitling looks like, here’s some HTML with our preferred settings for a 720p screen, and here’s a video of our subtitle screen in action. This setup also gives you live subtitles for your stream, because you can embed a small iframe below your streaming video player.

Step 2. Getting the live subtitles into the video stream

So having got live subtitles in the room and online, why do anything else? Three reasons:

  1. You can’t record the streamtext iframe, so if you have a DVR-type stream where users can pause/rewind, it’s useless. And it’s so easy to do DVR streams these days, why wouldn’t you?
  2. Although RTMP streams are pretty much realtime, they’re going out of fashion due to lack of support (eg none at all on mobile). Generally you’d use an HLS stream these days, but they can be up to 60s behind realtime. So your iframe-embedded subtitles will appear well before the associated speech in the video.
  3. Many mobiles only show video fullscreen, so will be unable to show the iframe embed at the same time (remember the category 2 people from earlier)

This means we need to employ some trickery. I like trickery.

Hopefully you can already guess that what we need to do is somehow turn the streamtext web page into some sort of video that can be overlaid on top of your camera feed. The first stage of this has to be to create a web page which has subtitles at the bottom, and green screen above it (for the chroma key).


This isn’t all that hard to do: here is the live HTML. Note that in all this, I’m assuming a 720p stream, so the HTML needs a browser window size of 1280×720 to look correct (it’s also optimised for webkit browsers like Chrome or Safari – the iframe size may need adjusting to hide the scroll bar in other browsers).
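As a rough illustration only (this is not the HTML linked above), a page like that could be generated as follows. The function name `buildOverlayPage`, the 120px caption strip and the `#00ff00` key colour are all assumptions made for the sketch; the caption URL is whatever embed address your provider gives you.

```javascript
// Hypothetical helper: builds a green-screen overlay page as an HTML string.
// The default sizes are illustrative guesses for a 1280x720 (720p) stream.
function buildOverlayPage(captionUrl, options) {
  var opts = options || {};
  var width = opts.width || 1280;
  var height = opts.height || 720;
  var captionHeight = opts.captionHeight || 120; // height of the subtitle strip
  var keyColour = opts.keyColour || "#00ff00";   // pure green for the chroma key

  // Escape the URL so it is safe inside a double-quoted HTML attribute.
  var safeUrl = String(captionUrl)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");

  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    "<style>",
    "  html, body { margin: 0; width: " + width + "px; height: " + height + "px;",
    "               overflow: hidden; background: " + keyColour + "; }",
    "  iframe { position: absolute; left: 0; bottom: 0; border: 0;",
    "           width: " + width + "px; height: " + captionHeight + "px; }",
    "</style>",
    "</head>",
    "<body>",
    '  <iframe src="' + safeUrl + '" scrolling="no"></iframe>',
    "</body>",
    "</html>"
  ].join("\n");
}
```

The iframe is pinned to the bottom of the page and everything else is left as flat green, so the mixing software only has one colour to key out.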

You now have two options: the hardware way, or the software way. The hardware way probably involves setting up a Raspberry Pi to show this HTML fullscreen, taking its HDMI output into your fancy vision mixer, and overlaying it there. However, we don’t have a fancy vision mixer that can do that, so we did option 2: the software way.

Step 2b. The software way

I like the software way better, partly because I’m better at software than hardware, but also because it’s easier to send multiple streams out (one plain, one with subtitles, plus any others you need like BSL – see below) with just one camera feed and one computer.

I should say at this point that I’m using a Mac. I’m sure it’s possible to do this on Windows using the same components, but you’ll have to do some steps slightly differently.

To render our special green-screen web page, I’m going to use PhantomJS. PhantomJS is just a normal web browser, except it runs invisibly and you can’t see its output directly: you can, however, script it and tell it to do various interesting things.

What we do is to tell PhantomJS to render our subtitle webpage, and then take a JPEG-format screenshot of it several times a second. We then use a tiny bit of javascript code in PhantomJS to convert those screenshots into a Motion JPEG stream: this is pretty much the most basic video stream you can get, in that it’s just a series of JPEG images shunted out very fast one after another.
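To show how little there is to a Motion JPEG stream, here is a minimal sketch of the framing commonly used when serving one over HTTP: a `multipart/x-mixed-replace` response in which every part is one JPEG. This is not taken from the script described in this post; the function names and the boundary string `frame` are assumptions for illustration.

```javascript
// Arbitrary boundary string; it only has to match between headers and parts.
var BOUNDARY = "frame";

// Headers sent once, when the client first connects.
function mjpegResponseHeaders(boundary) {
  return {
    "Content-Type": "multipart/x-mixed-replace; boundary=" + boundary,
    "Cache-Control": "no-cache",
    "Connection": "close"
  };
}

// Text sent before each screenshot. The raw JPEG bytes follow it,
// and a trailing "\r\n" ends the part before the next boundary.
function mjpegPartHeader(boundary, byteLength) {
  return "--" + boundary + "\r\n" +
         "Content-Type: image/jpeg\r\n" +
         "Content-Length: " + byteLength + "\r\n\r\n";
}

// Delay between screenshots for a given frame rate.
function frameIntervalMs(framesPerSecond) {
  return Math.round(1000 / framesPerSecond);
}
```

A server built on this would send the response headers once, then loop: take a screenshot, write the part header, write the JPEG bytes, wait `frameIntervalMs(...)` and repeat.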

Why would we do this? Because the excellent streaming software Wirecast Pro (yes, only the $995 Pro version) will take a Motion JPEG stream as one of its inputs. This means we can get Wirecast Pro to overlay our special green-screened subtitle page on our normal camera feed. More to the point, on a decent computer, Wirecast can put out several streams at once, so we can have two streams: one plain, one with the overlay, both from the same camera feed.

The code to do this is over at github. It’s pretty simple: just download PhantomJS for your system, download the .js and .html files from the phantom-scripts folder, and start it by going to the Terminal and typing

/path/to/phantomjs /path/to/subtitle-server.js

Then go into Wirecast and tell it to connect to the Motion JPEG stream on localhost port 8081, using a URL such as http://localhost:8081/IHaveADream as the Web Stream Source.



You’ll see we’ve specified /IHaveADream in the URL, which is the name of the demo that streamtext make available for testing. For your event, you would change that to be your event’s specific streamtext name. We use port 8081 because that’s what I chose to put in the subtitle-server.js script, but you can change that script to listen on any port you like.
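For illustration, turning a request path like `/IHaveADream` into an event name might look something like the sketch below. I am not claiming this is how subtitle-server.js does it: the function name `eventNameFromPath` and the rule that names may only contain letters, digits, hyphens and underscores are assumptions.

```javascript
// Hypothetical helper: extracts an event name from an HTTP request path.
// Returns null when the path does not look like a single simple name.
function eventNameFromPath(requestPath) {
  // Drop any query string, then leading and trailing slashes.
  var path = String(requestPath).split("?")[0];
  var raw = path.replace(/^\/+/, "").replace(/\/+$/, "");

  var name;
  try {
    name = decodeURIComponent(raw);
  } catch (e) {
    return null; // malformed percent-encoding
  }

  // Assumed whitelist: reject anything that is not a plain identifier.
  return /^[A-Za-z0-9_-]+$/.test(name) ? name : null;
}

// eventNameFromPath("/IHaveADream") returns "IHaveADream"
// eventNameFromPath("/") returns null
```

Rejecting anything unexpected keeps odd requests from being passed straight into the page that gets rendered.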

If you scroll down slightly in Wirecast’s Source Settings window, there’s a “Connect” button. Connecting will make the Terminal window where you started phantomjs print “Opening embed”, and after around 20-30 seconds you’ll find Martin Luther King’s famous speech appearing in Wirecast. Do be patient: it takes a little while for Wirecast to settle down and accept the stream from PhantomJS.

You can then create a new shot, with your camera feed as the primary source, and the chroma-keyed Web Stream Source shot overlaid (I tend to use around 80% opacity). Voila!

Caveats: this is very knocked-together software. For instance, if you want to change the streamtext event you’re using, the best bet is to change the Stream Source URL in Wirecast, quit Wirecast, stop phantomjs, relaunch phantomjs, and then start Wirecast again. This is because the phantomjs script really only copes with one connection at a time, yet it never closes old connections until it’s stopped. Pull requests on github welcome.
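One possible direction for such a pull request is a "latest connection wins" rule, sketched below. This is not code from the existing script. `createSingleClientSlot` is a hypothetical helper, and it assumes each connection object exposes a `close()` method.

```javascript
// Hypothetical sketch: remember at most one active client, and close the
// previous one whenever a new one connects.
function createSingleClientSlot() {
  var current = null;
  return {
    // Register a new client, closing whichever client was there before.
    attach: function (client) {
      if (current && current !== client) {
        current.close();
      }
      current = client;
    },
    // Forget a client that has gone away (ignored if it is not the active one).
    detach: function (client) {
      if (current === client) {
        current = null;
      }
    },
    // The client that should currently receive frames, or null.
    active: function () {
      return current;
    }
  };
}
```

With something like this in place, reconnecting from Wirecast with a different event URL would drop the stale connection instead of leaving it open until the script is stopped.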

Also, do bear in mind that for any method, your Streamtext event needs to be live before you load up a webpage embedding it. Their embed system doesn’t always seem to be very good at switching from closed to live without you reloading the page.

Step 3. Post-production captions

So you’ve done your live subtitling. Now you probably have a recording of the event that you want to put online for posterity.

Luckily Stagetext will, for a fee, record the live captions, tidy them up, and create .stl (or .srt) files which can be embedded as a separate track in mp4 files that are supported by most players, including YouTube.

Creating a subtitle track is really complicated to do: lining up all the subtitles with the speech in the video is a time consuming and laborious process, so this doesn’t come cheap. We think it’s well worth it though.

And remember, you can get a transcript created from the live subtitles very inexpensively, which should be your very minimum aim.

What does all this cost?

Capital costs of streaming equipment (you may have these already):

  • Decent Mac to stream from (I use a quad-core i7 iMac): £1200
  • Wirecast Pro: approx £700

Costs for the live subtitles. Bear in mind that every event is different, and these prices can vary depending on a variety of factors. For a proper quote, contact Stagetext.

  • Live subtitles: typical example is £225 for a 60 minute event, or around £1000 for a whole-day event.
  • Captions after the event embedded as a .stl or .srt subtitle track in your recording: £4/programme minute
  • Transcript from live subtitles: £50 for a 60 minute event, or only around £100 for a whole-day event.

All prices ex VAT. I’ll say it again, every event is different: you must phone up for a proper quote.

Now to me, that sounds like an absolute bargain if you consider the increase in reach of the live event and the increase in the utility and future-proofing of your recordings and archives.
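To make the arithmetic concrete, here is a small sketch that adds up the example prices listed above for the two cases quoted (a 60 minute event and a whole-day event). The figures are only the typical examples from this post, not a quote, and the names `PRICES` and `estimateCost` are made up for the illustration.

```javascript
// Example prices from this post, in pounds, ex VAT. Real quotes will differ.
var PRICES = {
  live: { hour: 225, day: 1000 },      // live subtitles
  transcript: { hour: 50, day: 100 },  // transcript from the live subtitles
  captionsPerMinute: 4                 // post-event subtitle track
};

// eventType is "hour" (60 minute event) or "day" (whole-day event).
// programmeMinutes is the length of recording you want captioned afterwards.
function estimateCost(eventType, programmeMinutes) {
  if (!Object.prototype.hasOwnProperty.call(PRICES.live, eventType)) {
    throw new Error("eventType must be 'hour' or 'day'");
  }
  var live = PRICES.live[eventType];
  var transcript = PRICES.transcript[eventType];
  var captions = PRICES.captionsPerMinute * programmeMinutes;
  return {
    live: live,
    captions: captions,
    transcript: transcript,
    total: live + captions + transcript
  };
}

// estimateCost("hour", 60).total is 515  (225 + 240 + 50)
// estimateCost("day", 360).total is 2540 (1000 + 1440 + 100)
```

The whole-day example assumes six hours of recorded talks; the post-event subtitle track is the part that scales with running time.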

Step 4. Keep improving access

Just doing subtitles isn’t enough. Be aware you also need the following.

An accessible venue

Goes without saying, really. And remember, an accessible venue doesn’t just mean wheelchair ramps. It means layout, signage, facilities, the ability to meet a range of dietary requirements, information on accessible travel, accommodation and parking, and well trained and welcoming staff. The most important of these is probably having friendly staff who have the time to understand, care and help.

Audio Description

This is almost impossible to do live, since it requires very precise timing so as not to clash with the speaker. Luckily most presenters didn’t use any PowerPoint at all, but for those who did use visuals we tried to get their presentations in advance of the conference, and made audio descriptions of the ones we received. These sound files then went on our website. We’re aiming to make sure all visual presentations are audio described by the time our official recordings of the conference go online.

Sign language

Subtitling is all very well, but it doesn’t meet the needs of all deaf people. Sign Language is not just subtitles done through the medium of handwaving: it’s a whole language, with all its own nuances and idioms. The person who signs is rightly called an Interpreter, because they are interpreting from one language to another. People who require sign language really do require it: they will understand the content far more than by trying to follow the text captions.

Luckily, providing it is pretty easy to do. Hire an interpreter or two to be in the room for your physical audience. Train a camera on them. Light them well. Do a third stream (on top of the no-access and subtitled streams) with the interpreter picture-in-picture. BUT…

Here’s a story: we screwed up the BSL (British Sign Language) stream for No Boundaries this year. This wasn’t due to negligence, but since a massive number of other vital things – often out of our control – took far too long to fix and test, we never got to test the BSL lighting and camera setup. This turned out to be a problem, because the camera we used this year turned out to be much worse in low light than last year’s camera, and so the interpreter was far too dark in a lot of our stream. Then the tripod of the BSL camera collapsed so a load of it ended up at a funny angle. Then we found the recording had a couple of gaps in it. With judicious post-production we could probably salvage 90% of the BSL.

What have we done in the light of all this? We’re going to hire an interpreter again, and re-shoot the BSL for the entire two day conference. It really is that important.

Of course, we wouldn’t have to reshoot if we’d spent more time communicating with the BSL interpreters, getting things set up and tested well in advance of the day, and had a backup plan for this aspect (we had many backup plans for other parts of the conference, but not this). Please learn from our mistakes.

Don’t assume you’ve got it right

Accessibility is vital, but it is hard work and not always obvious. Always consult with someone who knows what they’re talking about. We kept in constant contact with Jo Verrent from Unlimited. She helped us understand where to concentrate our efforts, and what needed fixing. She has, for instance, proof-read this blog post and corrected a sentence that implied BSL is merely preferable for some, so that it now says what is actually true: it’s absolutely vital for many deaf people. (I also got corrected by Deepa from Stagetext on the same point!)

Overall we think that most of the access elements worked better this year – accessible accommodation information, dietary needs, physical access issues, staff training, variety/diversity in relation to contributors, audio description of presentations with images and the captioning provided by Stagetext. The BSL – for all the reasons above – did not, but we feel our direction of travel in relation to access is a good one.

There’s always more to do, but you have to start building a good foundation for access before you can improve. Please, start building that foundation at your next event.