HOWTO: Live captions/subtitles on a live stream

Part of the aim of the No Boundaries conference (produced by Watershed as part of a consortium of organisations, and held simultaneously in Bristol and York) was to bake in accessibility from the start. One of the biggest challenges was to have live subtitles both in the venues and as part of the live online stream. Luckily, it turned out to be remarkably simple. Here’s the setup:

We fed the live audio from the performance into a computer, which in turn relayed it (via Skype, alas[1]) to the stenographers, who did the incredibly skilled work of turning speech into text. We used Stagetext for this service, and I have to say they were amazing to work with. They are also NOT EXPENSIVE. I repeat, it is not expensive to use this service. There is no excuse for you not to do this for your own streams and thereby include a wider audience (not just hearing-impaired people, but also international viewers for whom English isn’t a first language). Seriously. Do it. Anyway…

The stenographers’ software integrated with a web service called streamtext.net. Streamtext simply gives you an embeddable iframe which you can configure to have various font sizes, colours, line spacing, etc. For the rest of this post, I’ll put their demo “I Have A Dream” subtitles into the sample code.
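
Since all of that configuration lives in the player's query string, it can be easier to build the URL from a settings object than to edit it by hand. Here is a minimal sketch in JavaScript. The query parameter names (`event`, `ff`, `fs`, `fgc`, `bgc`, `spacing`, `header`, `controls`, `footer`, `chat`) are copied from the player URLs used in this post; the function name `buildPlayerUrl`, the option names and the defaults are made up for illustration. The standard `URL` API percent-encodes the comma in the font list, which this sketch assumes the player decodes like any other query string.

```javascript
// Hypothetical helper: builds a Streamtext player URL from a settings object.
// Query parameter names are taken from the URLs in this post; the function
// name, option names and defaults are assumptions for illustration only.
function buildPlayerUrl(eventName, options) {
  var opts = options || {};
  var url = new URL('http://www.streamtext.net/player');
  var params = {
    event: eventName,
    ff: opts.fontFamily || 'Verdana,sans-serif',
    fs: opts.fontSize || 80,
    fgc: opts.textColour || 'ffffff',
    bgc: opts.backgroundColour || '000000',
    spacing: opts.lineSpacing || 1.25,
    // Hide everything except the caption text itself.
    header: false,
    controls: false,
    footer: false,
    chat: false
  };
  Object.keys(params).forEach(function (key) {
    url.searchParams.set(key, String(params[key]));
  });
  return url.toString();
}

// Large text for screens in the venue, smaller text for the stream overlay.
var hallUrl = buildPlayerUrl('IHaveADream', { fontSize: 80, lineSpacing: 1.25 });
var overlayUrl = buildPlayerUrl('IHaveADream', { fontSize: 50, lineSpacing: 1.2 });
console.log(hallUrl);
console.log(overlayUrl);
```

Keeping the two variants side by side like this makes it harder for the in-venue and stream versions to drift apart in anything other than font size and spacing.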

In the auditoria themselves, we set up plasma screens, set to around 1280×720 or 1360×768 ish, which loaded the following HTML into a fullscreen browser:

<html>
<head>
<title>Captions for NB2014</title>
<style type="text/css">
body {
padding:0;
margin:0;
background-color:#000;
}
#wrapper {
overflow:hidden;
background-color:#000;
width:1280px;
height:720px;
margin:0 auto;
}
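#captions {
/* tall enough for exactly four lines of text */
height:430px;
margin-top:100px;
/* 16px wider than the wrapper, so the scroll bar is clipped by overflow:hidden */
width:1296px;
overflow:hidden;
}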

</style>
</head>
<body>
<div id="wrapper">
<iframe id="captions"
src="http://www.streamtext.net/player?event=IHaveADream&ff=Verdana,sans-serif&fs=80&fgc=ffffff&bgc=000000&spacing=1.25&header=false&controls=false&footer=false&chat=false" frameborder="0" scrolling="no"></iframe>
</div>
</body>
</html>

You might spot a couple of oddities in there. Streamtext puts scroll bars into its main div, so the iframe is made slightly wider than its wrapper (1296px against 1280px) and the wrapper's overflow:hidden shunts the scroll bar off-screen. The iframe height also has to be exactly right to fit four lines of text without a line getting chopped off halfway up.

How did we embed this on the live stream? The answer lies in our streaming setup…

From our encoding computer (which was encoding at 720p), we sent the stream to an in-house Wowza media server with the transcoder add-on (it costs around $60/month to rent Wowza). Initially, the idea of the transcoder was that we could encode for multiple bitrates, to cater for the variety of broadband speeds that viewers were likely to have. However, Wowza transcoder can also overlay images, plus it can continuously check to see if an image file has changed.

Therefore I set phantomjs going, with the following script, to save off a png of the Streamtext page every half a second:

var page = require('webpage').create();
var output = "/path/to/subtitles.png";
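// Viewport slightly larger than the clip, so the scroll bars fall outside the captured area
page.viewportSize = { width: 1300, height: 215 };
page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.75 Safari/535.7';
// Capture only the top-left 1280x200 region: three lines of text, no scroll bars
page.clipRect = {
top: 0,
left: 0,
width: 1280,
height: 200
};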
page.open('http://www.streamtext.net/player?event=IHaveADream&ff=Verdana,sans-serif&fs=50&fgc=ffffff&bgc=000000&spacing=1.2&header=false&controls=false&footer=false&chat=false', function () {
window.setInterval(function () {
page.render(output);
}, 500);
});

Again, you’ll see the clipRect doing the job of removing the scroll bars, and this time the page height is set to perfectly capture three lines of text (just right for overlaying onto the stream).
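
A rough way to sanity-check those heights is sketched below. It assumes that `fs` is a font size in pixels and that `spacing` is a line-height multiplier; neither is confirmed anywhere in this post, so Streamtext's own documentation would need to be checked, and the function name `captionTextHeight` is made up.

```javascript
// Estimate the height of a block of caption text.
// Assumption: fs is a pixel font size and spacing multiplies it to give the
// line height. Neither is confirmed by the post; treat this as a sketch.
function captionTextHeight(lines, fontSize, spacing) {
  return Math.round(lines * fontSize * spacing);
}

// Stream overlay: three lines at fs=50, spacing=1.2, inside a 200px clipRect.
var overlayText = captionTextHeight(3, 50, 1.2);
console.log('overlay text: ' + overlayText + 'px of 200px');

// In-venue screens: four lines at fs=80, spacing=1.25, inside a 430px iframe.
var hallText = captionTextHeight(4, 80, 1.25);
console.log('hall text: ' + hallText + 'px of 430px');
```

Under those assumptions the text itself needs 180px and 400px respectively, so the 200px and 430px figures leave a little slack, presumably for the player's own padding. The final values still need adjusting by eye.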

We then put the following code as one of the <overlay> sections in a Wowza transcoder block (we actually did it at multiple bitrates, I’m just showing you the 480p one as an example, hence the resize to 854×133). Note the <CheckForUpdates> section which makes it check every 0.75s to see if the image has changed.

<Overlay>
<Enable>true</Enable>
<Index>0</Index>
<ImagePath>/path/to/subtitles.png</ImagePath>
<CheckForUpdates>true</CheckForUpdates>
<Opacity>60</Opacity>
<Location>
<X>0</X>
<Y>0</Y>
<Width>854</Width>
<Height>133</Height>
<!-- horiz: left, right, hcenter - vert: top, bottom, vcenter -->
<Align>hcenter,bottom</Align>
</Location>
</Overlay>
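
The 854×133 size comes from scaling the 1280×200 capture down to the width of the 480p rendition while keeping its aspect ratio. The same arithmetic gives the overlay size for any other rendition. A small sketch follows; the function name `scaleOverlay` is made up, and the 1280 and 640 widths are illustrative assumptions rather than a record of the renditions used at the event.

```javascript
// Scale the captured subtitle image to a rendition's width, keeping the
// aspect ratio. scaleOverlay is a hypothetical helper, not part of Wowza.
function scaleOverlay(sourceWidth, sourceHeight, targetWidth) {
  var scale = targetWidth / sourceWidth;
  return {
    width: targetWidth,
    height: Math.round(sourceHeight * scale)
  };
}

// The phantomjs clipRect in this post captures a 1280x200 image.
var capture = { width: 1280, height: 200 };

// 854 is the 480p width from the post; 1280 and 640 are assumed examples.
[1280, 854, 640].forEach(function (targetWidth) {
  var size = scaleOverlay(capture.width, capture.height, targetWidth);
  console.log(targetWidth + ' wide -> ' + size.width + 'x' + size.height);
});
```

The resulting width and height go into the `<Width>` and `<Height>` elements of the `<Overlay>` block for the matching rendition.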

We could therefore encode the subtitles straight into the live video. This also meant the subtitles were captured in our recordings of the streams, which served as a temporary archive until a proper edit of the event was done.

The result looked like this:

[Image: screen shot of the subtitled stream, 2014-02-27 at 19.53.30]

We’ll post more on the architecture of the streaming setup (using an in-house origin server with devpay-licensed EC2-based edge servers) within the next week or two.

No Boundaries is a State of the Arts event supported by Arts Council England and British Council.

[1] We were very much trying to keep the conference a Skype-free zone. Not just because it’s proprietary, and not just because it’s dreadful quality – at least for video – but mostly because the word “Skype” is beginning to be a synonym for “videoconference”. It’s not. There are many better solutions, as I’ll document in future posts. Please, everyone, at least consider the options before plumping for Skype. In the end, nothing the public saw was done via Skype: we merely used it twice in the back end, firstly as an IM backchannel to cue Alice Greenwald, who spoke to us from NYC, and secondly to send audio to Stagetext. This was because Skype was what these people ordinarily use, so we decided it was kinder to integrate with them.