Next Generation Audio Summit 2016: Florian Camerer discusses OBA and immersive audio opportunities in scene-setting keynote
In a striking keynote address delivered at the start of the inaugural SVG Europe/Dolby Next Generation Audio Summit on 10 November, ORF Senior Sound Engineer Florian Camerer elaborated upon the multiple new technologies and techniques set to revolutionise the nature of broadcast audio production over the next few years.
A leading figure in the broadcast audio community for a quarter of a century, Camerer joined the Austrian Broadcasting Corporation, ORF, in 1990. Five years later he became a staff sound engineer, working primarily in production sound and post-production, before going on to cultivate an interest in multi-channel audio. He mixed the first ORF programme in Dolby Surround and continues to be involved in all aspects of multi-channel audio. He is also chair of the EBU group PLOUD and a popular lecturer on surround sound and related issues around the world.
For his Next Generation Audio (NGA) summit keynote Camerer began by reflecting on a period of phenomenal change – one in which “all these acronyms are flying about” and many people are “very excited” to find out more about the potential of NGA technologies. Object Based Audio (OBA), Immersive Audio, Virtual Reality Audio, Augmented Reality Audio and Mixed Reality Audio were the specific technologies cited as being part of the NGA grouping.
But as Camerer also wondered, is there anything wrong with what we might term PGA – in other words, Past or Present Generation Audio? “It is 5.1 and a lot of stereo, but [in particular] 5.1 at present. Now 5.1 can definitely be sweeping and compelling and engaging if it is done correctly, but it does have a few issues [that need to be tackled with] NGA – specifically it is not 3D. It lacks the height [dimension].”
Consequently, Camerer says that he has “rarely heard productions where I felt ‘inside’ right away. They often weren’t too convincing [in terms of] envelopment.” All of which means there should rightly be some caution when contemplating the implementation of NGA: “We have not fully managed 5.1 yet and now we want to embark upon NGA…”
OBA origins and hybrid approaches
When moving into this brave new world it is important to note, says Camerer, that “since the beginning of mixing [there has always been some generation of metadata] with rendering. With OBA what we have are all the channels being kept separate [rather than mixed together on the production side], then rendered according to the metadata sent.”
One of the “hot issues” surrounding OBA is how “the renderer will behave [with regard to complex mixes] and how these can be standardised on the playback side.” But whilst this might be a challenge, OBA is certainly commended for its ability “to adapt to different playback environments – 5.1 or 9.1 or 4.0 – or some kind of binaural rendering for headphones.”
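As a rough illustration of the channels-plus-metadata idea, the short Python sketch below pans a single audio object to whichever speaker layout the listener actually has, using a toy constant-power panning law. The layouts, function names and panning maths here are illustrative assumptions for this article, not the behaviour of any standardised NGA renderer such as those in MPEG-H or AC-4.

```python
import numpy as np

# Hypothetical speaker layouts given as azimuth angles in degrees
# (0 = front centre, positive = anticlockwise). Simplified to the
# horizontal plane; real NGA renderers also handle height.
LAYOUTS = {
    "stereo": [30.0, -30.0],
    "5.1":    [30.0, -30.0, 0.0, 110.0, -110.0],  # LFE omitted for brevity
}

def render_object(samples, azimuth_deg, layout):
    """Pan one audio object to a speaker layout using its positional
    metadata -- a toy panning law, not a standardised renderer."""
    speakers = np.radians(LAYOUTS[layout])
    az = np.radians(azimuth_deg)
    # Weight each speaker by its angular proximity to the object...
    weights = np.maximum(np.cos(speakers - az), 0.0)
    # ...then normalise the weights for constant power.
    gains = weights / np.sqrt(np.sum(weights ** 2))
    return np.outer(gains, samples)  # shape: (num_speakers, num_samples)

# The same object (audio kept separate, plus position metadata) renders
# to whatever layout is available -- the adaptation Camerer describes.
tone = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000.0)
for layout in LAYOUTS:
    out = render_object(tone, azimuth_deg=45.0, layout=layout)
    print(layout, out.shape)
```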
Whilst ‘pure’ OBA might be one way to navigate this latest era of audio, Camerer also drew attention to the rise of hybrid approaches, where “we only have a very few objects where it makes sense to keep the information separate, and then [have everything else] in the form of beds where we don’t have to change much.”
In the light of high-profile audience complaints over the intelligibility (or otherwise) of dialogue in some major TV drama series, “the ability to increase or mute dialogue” is one obvious potential benefit of OBA or hybrid deployment. But what is clear above all is that there won’t necessarily be one defining approach to OBA: “It could be immersive, but not necessarily. It could be stereo with one object and the object is the dialogue, but it could also be immersive [and making use of the codecs] that have now been standardised to offer OBA.”
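To make the hybrid case concrete, here is a minimal, hypothetical Python sketch of the “stereo with one object” scenario: a pre-mixed bed is delivered untouched, while the dialogue travels as a separate object whose gain the listener can raise or mute at render time. The function names and panning law are assumptions for illustration, not any broadcast codec’s actual behaviour.

```python
import numpy as np

def mix_hybrid(bed, dialogue, dialogue_pan, dialogue_gain_db=0.0):
    """Combine a pre-mixed stereo bed with a single dialogue object.

    bed: (2, n) array -- the 'everything else' channels, used as-is.
    dialogue: (n,) mono object kept separate so the listener can
    raise, lower or mute it; dialogue_pan is its left/right position
    metadata in [-1, 1].
    """
    gain = 10.0 ** (dialogue_gain_db / 20.0)
    # Constant-power pan derived from the object's position metadata.
    theta = (dialogue_pan + 1.0) * np.pi / 4.0   # map [-1, 1] -> [0, pi/2]
    left, right = np.cos(theta), np.sin(theta)
    rendered = gain * np.vstack([left * dialogue, right * dialogue])
    return bed + rendered

n = 48000
bed = 0.1 * np.random.randn(2, n)               # stand-in for music/effects
speech = np.sin(2 * np.pi * 200 * np.arange(n) / n)
louder = mix_hybrid(bed, speech, dialogue_pan=0.0, dialogue_gain_db=+6.0)
muted  = mix_hybrid(bed, speech, dialogue_pan=0.0, dialogue_gain_db=-120.0)
```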
Camerer proceeded to update delegates on the provision for NGA in recent standards developments, including the May 2016 publication of EBU R 147. This recommendation – which can be read in full here (https://tech.ebu.ch/publications/r147) – formalises the EBU’s position that the renderer and codec, including bit-stream and associated technical and descriptive metadata, be specified and standardised as part of Next Generation Audio systems based on audio objects.
Different deployments and VR views
Whilst the deployment of NGA may potentially be highly complex, it is also evident that more straightforward approaches can be taken. For ORF’s recent immersive production of the New Year’s Day Concert – an annual musical celebration featuring the Vienna Philharmonic Orchestra – “we just used four microphones in addition to what we do for 5.1. The result is that an Auro-3D track was included on the Blu-ray discs of the last two New Year’s Day concerts, so if you have an Auro-3D decoder at home you are able to listen in 9.0.”
In such instances “the actual effect on production can be quite small; it can be relatively easily done. But for others it is not so straightforward.”
By way of example, Camerer pointed to the current experiments being undertaken by NHK around 22.2, as well as the “very deep mathematical” understanding that is likely to be required for success with audio for VR. In some quarters it is currently being suggested that an adaptation of Ambisonics – the full-sphere surround technique originally developed during the 1970s under the auspices of the British National Research Development Corporation – could “solve all the problems” for VR, but “those people are wrong [as] it cannot live up to the envelopment that you get with [distance mics].”
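For readers unfamiliar with the technique, first-order Ambisonics represents the full sphere with just four signals (W, X, Y, Z). The sketch below encodes a mono source into B-format using the classic FuMa convention; it is a minimal illustration of the full-sphere representation, and first order falls well short of what convincing VR envelopment would demand – which is part of Camerer’s point.

```python
import numpy as np

def encode_first_order(signal, azimuth_deg, elevation_deg):
    """Encode a mono source into first-order Ambisonics B-format
    (W, X, Y, Z), FuMa convention -- an illustrative sketch only."""
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    w = signal * (1.0 / np.sqrt(2.0))        # omnidirectional component
    x = signal * np.cos(az) * np.cos(el)     # front/back
    y = signal * np.sin(az) * np.cos(el)     # left/right
    z = signal * np.sin(el)                  # height -- the dimension 5.1 lacks
    return np.stack([w, x, y, z])

# A source 45 degrees to the left and 30 degrees up:
src = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000.0)
bformat = encode_first_order(src, azimuth_deg=45.0, elevation_deg=30.0)
print(bformat.shape)  # (4, 48000)
```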
Having recently attended a VR conference in Los Angeles, Camerer said it is clear that “there is a great deal going on” in this domain, but that some of the potential deployments to capture VR audio are “becoming very expensive”. The need to deliver personalisation for the listener – “everyone has a separate set of ears and [will want to benefit] from the most convincing representation” – will also require substantial research and preparation. Potentially, though, services that involve the uploading of images of users’ ears and the extrapolation of the relevant data to achieve personalisation for VR could be ready to go as early as next year.
It was only fitting that Camerer concluded his presentation by noting that audio is “moving really fast these days”. Keeping up is going to require phenomenal energy (and substantial resources) on the part of broadcasters, service providers and consumer manufacturers, but the benefits for viewers are likely to be profound.