Building an Online Video Platform for an Arts Organisation: [Part 2] Wrangling Metadata

This post is the second in a series about developing an online video platform for an arts organisation. Over the last eighteen months Watershed have been redeveloping the way they ingest, store and share video content, and this series shares some of the technical challenges we have faced. This second post is all about metadata:

Producing a ‘Video Stack’ API for Wrangling Metadata

If there is one simple statement that encapsulates our work on video embedding over the last eighteen months it is this:

‘Build systems around your content, not content around your systems.’

We’ve learnt this lesson the hard way, because over the last decade our content was, unfortunately, built around our systems. Our videos were scattered across different domains, our metadata (information about the videos) was missing in places and spread across different databases, and those databases were horrifically over-engineered and underused. But, most annoyingly, the metadata we did have wasn’t consistent. This, of course, is a common symptom of working in challenging environments with limited resources and constantly moving goalposts.

To solve this problem, and wrangle the metadata into something useful, we had to think about how we work rather than how the technology works. The wrong approach would have been to design a new, monolithic super-database to replace all our legacy databases. Given our limited resources and ambitious nature, this would have been doomed to failure, and would have added yet another layer of over-engineered complexity to the mess.

Instead, we first had to accept that we will always have different databases with inconsistent fields, and that there is no way to replace them or to stop more being added to the tangle. We needed to embrace them.

Enter the ‘Watershed Video Stack’:

To solve our messy tangle of doom we designed a software stack containing special ‘Abstract Layers’, which attempt to unpick the mess and let us access all our video assets in a consistent manner. It does this through two layers: the ‘Translation Layer’ and the ‘Unified Presentation Layer’.

Video Stack: Translation Layer

The ‘Translation Layer’ is a layer of object-oriented software which hooks onto all our databases and breaks the metadata spaghetti down into small, consistent objects (imagine them as Lego blocks of information that can be pulled apart or stacked together as needed). It does this using a programming technique called polymorphism, which lets code for many different data sources be used through one common interface.
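A minimal Python sketch of the idea follows. The post doesn’t publish Watershed’s code, so every name here (`VideoMetadata`, `VideoSource`, `ArchiveDatabase`, `EventsDatabase`, `translate_all`) and both legacy schemas are hypothetical. It shows polymorphism in the sense described above: each adapter knows its own database’s field names, but all of them implement the same `translate()` method and return the same kind of object.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoMetadata:
    """The common 'Lego block' that every source is translated into."""
    title: str
    description: str
    running_time_seconds: int
    creator: str


class VideoSource(ABC):
    """Shared interface: every legacy database adapter must implement translate()."""

    @abstractmethod
    def translate(self, record: dict) -> VideoMetadata:
        ...


class ArchiveDatabase(VideoSource):
    # Hypothetical legacy schema: 'name', 'summary', 'duration_mins', 'director'
    def translate(self, record: dict) -> VideoMetadata:
        return VideoMetadata(
            title=record["name"],
            description=record.get("summary", ""),
            running_time_seconds=int(record.get("duration_mins", 0)) * 60,
            creator=record.get("director", "Unknown"),
        )


class EventsDatabase(VideoSource):
    # Hypothetical legacy schema: 'event_title', 'blurb', 'length' ("HH:MM:SS"), 'speaker'
    def translate(self, record: dict) -> VideoMetadata:
        hours, minutes, seconds = (int(part) for part in record.get("length", "0:0:0").split(":"))
        return VideoMetadata(
            title=record["event_title"],
            description=record.get("blurb", ""),
            running_time_seconds=hours * 3600 + minutes * 60 + seconds,
            creator=record.get("speaker", "Unknown"),
        )


def translate_all(sources_and_records):
    """Polymorphic call: the caller never needs to know which schema a record came from."""
    return [source.translate(record) for source, record in sources_and_records]
```

Missing fields are filled with defaults in this sketch, which is one (assumed) way a translation layer could make incomplete records look complete to the layers above it.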

Consistency is important because a field in one database could be called something completely different in another, which means the systems aren’t interoperable. The Translation Layer takes the video metadata from all the different databases and converts it into small objects that all look the same and have the same fields: title, description, running time, creator and so on. And, because our metadata now conforms to a standardised interface, we can actually interrogate it properly: it becomes searchable!
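Once every record shares the same fields, searching becomes a matter of filtering on those fields. The sketch below assumes a hypothetical `VideoMetadata` class with the four fields named above and a hypothetical `search()` function using a simple case-insensitive keyword match; a production system would more likely hand this job to a database or search index.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class VideoMetadata:
    title: str
    description: str
    running_time_seconds: int
    creator: str


def search(
    videos: Iterable[VideoMetadata],
    keyword: Optional[str] = None,
    creator: Optional[str] = None,
    max_running_time_seconds: Optional[int] = None,
) -> List[VideoMetadata]:
    """Filter on the shared fields; this only works because every object has the same fields."""
    results = []
    for video in videos:
        # Keyword match against title and description, ignoring case.
        if keyword is not None:
            text = f"{video.title} {video.description}".lower()
            if keyword.lower() not in text:
                continue
        # Exact creator match, ignoring case.
        if creator is not None and video.creator.lower() != creator.lower():
            continue
        # Upper bound on running time.
        if max_running_time_seconds is not None and video.running_time_seconds > max_running_time_seconds:
            continue
        results.append(video)
    return results
```

Filters combine freely, so a query such as "talks mentioning sculpture that are under fifty minutes long" needs no knowledge of which database each video originally came from.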

Video Stack: Unified Presentation Layer

On top of the Translation Layer sits the ‘Unified Presentation Layer’. This does two things. Firstly, it takes all of the highly cohesive ‘Lego block’ metadata produced by the layer below and aggregates (stacks) it together in one place. Secondly, it provides an API (the Watershed Video Store API) that lets applications access all our metadata through simple commands. These commands allow us to interrogate our video data however we like: we can prod it, search it, or even select random bits of it. And, because the Video Stack is modular, we can add and remove the underlying databases (however messy they are) without affecting any other part of the system.
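The two jobs of this layer, aggregation and a small command API, might look something like the following sketch. `VideoStore`, its methods, and the idea of registering each source as a loader function are assumptions for illustration, not the actual Watershed Video Store API.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class VideoMetadata:
    title: str
    description: str
    running_time_seconds: int
    creator: str


# In this sketch a source is any zero-argument callable that returns
# already-translated VideoMetadata objects.
SourceLoader = Callable[[], List[VideoMetadata]]


class VideoStore:
    """Hypothetical unified presentation layer over pluggable sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, SourceLoader] = {}

    def add_source(self, name: str, loader: SourceLoader) -> None:
        self._sources[name] = loader

    def remove_source(self, name: str) -> None:
        # Removing an unknown source is a no-op rather than an error.
        self._sources.pop(name, None)

    def all_videos(self) -> List[VideoMetadata]:
        """Aggregate ('stack') the blocks from every registered source."""
        videos: List[VideoMetadata] = []
        for loader in self._sources.values():
            videos.extend(loader())
        return videos

    def search(self, keyword: str) -> List[VideoMetadata]:
        needle = keyword.lower()
        return [
            v for v in self.all_videos()
            if needle in v.title.lower() or needle in v.description.lower()
        ]

    def random_selection(self, count: int, seed: Optional[int] = None) -> List[VideoMetadata]:
        # Pick up to `count` distinct videos; a seed makes the choice repeatable.
        videos = self.all_videos()
        rng = random.Random(seed)
        return rng.sample(videos, min(count, len(videos)))
```

Because callers only ever talk to `VideoStore`, adding or removing a source changes what the aggregate contains without changing any calling code, which is the kind of modularity the paragraph above describes.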

However, the real black magic is that, from the Video Store API’s point of view, our metadata is consistent and complete even though it isn’t! Because of this we can build video content into multimedia-rich web applications simply and efficiently, without worrying about the legacy burden. In post three we will look at how we’ve built useful web applications using these new technologies.

Written by Richard Grafton

Richard is Senior Developer at Watershed, where he supports and develops Watershed’s digital presence and programmes. Before working at Watershed, Richard was a research technologist at BBC R&D and at the Media Technology Lab at the University of Sussex. He holds a BSc (Hons) in Multimedia and Digital Systems.