Live from SMPTE: assessing the potential of object-based audio

Object-based audio continues to gain interest as a next-generation format both in the movie theater and in the home viewing environment. But it is often difficult for broadcasters to understand just what the implications are, not only for their operations but also for their viewers (and listeners).

Robert Bleidt, general manager of Fraunhofer USA Digital Media Technologies, laid out some of the potential of object-based audio as part of the next-generation MPEG-H standard during a presentation at the SMPTE 2014 Technical Conference and Exhibition.

The video portion of MPEG-H will make use of HEVC, or High-Efficiency Video Coding, to allow services like 4K to be delivered in much less bandwidth than they require today. So when Bleidt speaks with consumers who have purchased a 4K set, he moves the discussion beyond just pretty pictures.

“When I ask them if they would also want a new audio system that would allow them to turn the announcer up or down they say ‘Wow! Can you let me do that? That is a great feature and I would want that now,’” he said.

Fraunhofer has already done experiments on the ability to control the relative volume of the announcer track and the effects track for sports productions. Three years ago at Wimbledon it worked with the BBC on a system that allowed viewers to adjust a slider and change the audio levels.

“There are a couple of benefits, as someone may have a little bit of a hearing impairment, and the BBC did a splendid job of splitting the difference so that half of the viewers wanted the announcer volume higher and the other half lower,” explained Bleidt.
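To make that interaction concrete, here is a minimal sketch of how such a dialogue slider could work once the announcer and the crowd effects arrive as separate objects rather than a baked-in mix. The function name, the dB slider range and the NumPy rendering are illustrative assumptions for this article, not Fraunhofer’s or the BBC’s actual implementation.

```python
# Minimal sketch of object-based "dialogue enhancement": the decoder keeps the
# announcer and the stadium effects as separate objects, and the viewer's slider
# simply changes the announcer gain before the two are summed for playback.
# Names and the dB slider range are illustrative assumptions, not MPEG-H APIs.
import numpy as np

def render_mix(announcer: np.ndarray, effects: np.ndarray, slider_db: float) -> np.ndarray:
    """Sum two mono objects, scaling the announcer by the viewer-chosen offset in dB."""
    gain = 10.0 ** (slider_db / 20.0)   # dB -> linear gain
    mix = gain * announcer + effects    # per-sample sum of the two objects
    return np.clip(mix, -1.0, 1.0)      # keep the result in range for playback

# Example: one second of placeholder audio at 48 kHz, announcer lowered by 6 dB.
sr = 48_000
announcer = 0.1 * np.random.randn(sr)
effects = 0.1 * np.random.randn(sr)
quiet_commentary = render_mix(announcer, effects, slider_db=-6.0)
```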

This year a trial in the US for a NASCAR race added a new wrinkle, allowing the user to listen to pit crew radio for favorite drivers. Other possible applications include different languages, home or away broadcasters, the stadium announcer, or even audio description or simultaneous translation.

Bleidt also addressed the difference between object-based audio for the cinema and for TV. Cinema is about spatial accuracy and dynamic motion, but for TV the most powerful application is interactivity.

“Film directors don’t encourage that, but for TV they can allow viewers to make their own mix,” he said. “Viewers are no longer couch potatoes as they want the audio tailored to the platform they are using, like a tablet.”

But the more immersive viewing experience of 4K, which calls for the viewer to sit closer to the set than they did for HD (or to have a larger set at the same distance), also requires a wider spread of audio, since the picture now fills a wider viewing angle.

“So how many more channels will be needed at home to give a sense of immersion?” he said. “Our tests showed that four height speakers provide a substantial improvement, so we propose a 5.1 or 7.1 surround sound layout plus four height speakers [on the ceiling].”

A new paradigm for mixing?

The wider field of view in 4K may call for a completely new mixing experience during live events: audio panning to follow the video. The challenge, however, is that 99 percent of consumers still listen in stereo or, at best, on a 2.1 soundbar. Yes, the one percent who invest in a full surround sound system may be the most desirable income group and the most passionate TV viewers, but that does not mean serving only them makes sense.

“We have come up with an alternative way to get consumers to experience audio and it is a prototype 3D soundbar that can hang on the wall and, at the push of a button, provide immersive sound,” explained Bleidt. “We can also correct for misplaced speakers.”

How exactly does the broadcast world get to this brave new world of object-based audio? Bleidt said it would take four stages.

Stage one is to replace AC-3 encoders with new MPEG-H encoders (a move that Bleidt said could happen at the same time HEVC encoders are adopted). “That means the same audio quality for half the bit rate and also some improved operational practices like automatic loudness for tablets,” he remarked.

Stage two requires adding audio objects like second or third dialogue tracks, audio description, home and away announcers, and even things like the song playlist for an athlete. Adding mono objects requires only an additional 20-40 kbps per object.
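As a rough back-of-the-envelope check of what stage two costs in bandwidth, the snippet below applies that 20-40 kbps-per-mono-object figure to a hypothetical sports mix; the object list itself is an illustrative assumption, not part of any specification.

```python
# Back-of-the-envelope check of stage two, using the 20-40 kbps-per-mono-object
# figure quoted above. The object list is a hypothetical sports mix, not a spec.
OBJECT_KBPS_LOW, OBJECT_KBPS_HIGH = 20, 40

objects = [
    "second dialogue language",
    "audio description",
    "home announcer",
    "away announcer",
]

low = len(objects) * OBJECT_KBPS_LOW
high = len(objects) * OBJECT_KBPS_HIGH
print(f"{len(objects)} mono objects add roughly {low}-{high} kbps to the audio bit rate")
# -> 4 mono objects add roughly 80-160 kbps to the audio bit rate
```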

Stage three is adding the immersive sound component via four channels of audio that add a sense of height. That will also require four additional audio channels in the plant.

The final stage is to add dynamic objects that change position over time, like sound effects in a sports broadcast.
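For readers wondering what “dynamic” means in practice, the sketch below shows the essential idea: the object’s audio travels with time-stamped position metadata that the renderer interpolates during playback. The class and field names are hypothetical, chosen to illustrate the concept rather than to mirror MPEG-H metadata syntax.

```python
# Rough sketch of what distinguishes a dynamic object from a static one: the
# audio payload travels with time-stamped position metadata that the renderer
# interpolates at playback time. The dataclass and field names are illustrative only.
from dataclasses import dataclass
from bisect import bisect_right

@dataclass
class PositionKeyframe:
    time_s: float         # when this position applies
    azimuth_deg: float    # horizontal angle relative to the listener
    elevation_deg: float  # height angle, rendered via the height speakers

def position_at(keyframes: list[PositionKeyframe], t: float) -> tuple[float, float]:
    """Linearly interpolate azimuth/elevation between the surrounding keyframes."""
    i = bisect_right([k.time_s for k in keyframes], t)
    if i == 0:
        k = keyframes[0]
        return k.azimuth_deg, k.elevation_deg
    if i == len(keyframes):
        k = keyframes[-1]
        return k.azimuth_deg, k.elevation_deg
    a, b = keyframes[i - 1], keyframes[i]
    f = (t - a.time_s) / (b.time_s - a.time_s)
    return (a.azimuth_deg + f * (b.azimuth_deg - a.azimuth_deg),
            a.elevation_deg + f * (b.elevation_deg - a.elevation_deg))

# Example: a race-car effect panning left to right across the front stage over 2 s.
car = [PositionKeyframe(0.0, -45.0, 0.0), PositionKeyframe(2.0, 45.0, 0.0)]
print(position_at(car, 1.0))   # -> (0.0, 0.0), i.e. front center at the midpoint
```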

The goal, in the end, is to not only open up new experiences but also possible new revenue streams. “There are a number of creative and perhaps revenue-generating ideas that we have not even thought of that are possible,” he added.