AES 2015 Reflections: Fraunhofer USA’s Robert Bleidt discusses the ‘four phases’ of immersive audio adoption

Fraunhofer USA’s Robert Bleidt pictured with Jünger Audio’s MPEG-H Authoring and Monitoring Unit. (Photo: David Davies)

With Genelec among the manufacturers hosting 3D or ‘immersive’ audio demos, and SVG’s DTV Audio Group meeting devoting several hours to the topic, Next Generation Audio (NGA) was once again at the forefront of discussion at AES. While approaches to NGA vary significantly, the overriding trend towards giving viewers a richer, more all-encompassing sound experience is evident for all to see (and, more to the point, hear).

For German research organisation Fraunhofer, it is the MPEG-H standard that is leading the way to a more immersive audio future. As part of the MPEG-H Audio Alliance alongside Qualcomm Technologies and Technicolor, Fraunhofer is helping to spread the word about a system that will deliver comprehensive object-based audio. The result is that viewers will be able to adjust the sound mix to match their own preferences, for example boosting hard-to-understand dialogue or creating a ‘home team’ mix for sports broadcasts.

During an edition of AES that saw Fraunhofer participate in an MPEG-H-based spatial audio demo, SVG Europe sat down with Robert Bleidt – who is division general manager of Fraunhofer USA Digital Media Technologies – to discuss the probable time-scale for widespread roll-out of immersive audio and the extent to which it is likely to achieve mass adoption by consumers.

Where are we ‘at’ now with regard to making immersive audio a reality?

The first thing to say is that if you take an incremental approach to moving into these new audio systems, you don’t have to learn anything new immediately; rather, it can be an incremental change. We see it as being a four-phase process. Phase one is you just produce what you are doing today, but change out the codec and benefit from bit-rate savings. The second is where you can begin to separate certain audio elements and send them as independent objects to the viewer at home. It’s the point where the production planning and thinking has to change a bit, but it doesn’t require buying any new equipment other than a video encoder that has MPEG-H in it.

The next step is where you start to think about, for example, sending the home team announcer to channel 8 or the Chinese announcer on channel 12. This is where it becomes a bit more complex and you require more coordination and automated tools so you don’t have to communicate this information manually to other parties. The final phase is to add overhead speakers [at home] to achieve immersive audio.

There has lately been a move to supplant terms like ‘immersive’ or ‘3D’ audio with ‘next generation audio’ (NGA) – but is it actually a useful description?

I think it is. Everyone active in this area has been experimenting with the correct nomenclature for these systems. We started out calling our system ‘immersive and interactive’ and that does describe what we are doing. We opted to stay away from ‘3D’ because of the bad association with 3D in the video world.

More than anything, though, what we are about is ‘personalisation’ – letting the user at home choose what audio they want to hear, either as a pre-defined mix or via their own mix if they are an enthusiast.

NGA has been established to be an umbrella term for these various technologies, and it is useful [in terms of raising awareness] although of course it doesn’t really speak to the individual features.

How long do you think it will be before technologies such as MPEG-H are a daily default for broadcast production?

Well, there is obviously a whole chain of people involved in getting this to the consumer – not just the MPEG-H Alliance, but also the broadcasting networks, US affiliates, cable and satellite operators, TV set manufacturers, etc. [In terms of where we might see early adoption] MPEG-H has received a lot of support in the Korean market, and we know that the 2018 Olympics will be taking place there. The ATSC is also working to a similar timeline [of 2018] in the US.

Do you think that people will want to activate immersive audio on a daily basis – or will it be more for the major tent-pole events such as World Cup or Grand Slam tournament finals?

That is difficult to predict. I think it will be an evolutionary process, just as when we went from stereo to 5.1. [And the same applies to the actual production process] with OB trucks able to add a product like Jünger Audio’s MPEG-H Authoring and Monitoring Unit, along with four small speakers to be able to do immersive audio. You don’t have to go out and buy a whole new console.

With any new technology there is a long process of [education] to ensure that the day-to-day operational people understand the changes. The standardisation processes taking place now will obviously help with that, and so I think we have a very exciting few years in front of us.
