White Paper: Fraunhofer IIS on delivering a complete suite of solutions for the next generation of ‘virtual reality audio’
While Fraunhofer IIS technologies continue to enable the services of today, its multichannel and immersive audio capabilities are also being implemented in the services of tomorrow. In fact, it is already facilitating audio for virtual reality (VR) and has been doing so for some time.
For example, the Samsung Gear VR and LG 360 VR headsets and the Hulu VR app already ship with Fraunhofer’s HE-AAC audio codec and Fraunhofer Cingo, its acclaimed binaural headphone rendering product.
The stakes for VR audio are arguably as high as they are for any other current platform. Audio is an integral element of VR delivery: if it is produced or rendered poorly, the illusion for the participant is ‘broken’ and the overall experience gravely compromised. But if it is delivered correctly, with core requirements such as accurate binaural rendering and astute deployment of head-tracking in place, it can make for a truly immersive and enriching experience.
On many fronts, Fraunhofer IIS technologies are showing the way ahead for VR audio. This involvement starts at the point of capture and runs all the way through delivery and distribution to the consumer.
Producing stunning immersive sound
While the availability and cost-efficiency of 360 cameras are improving all the time, their onboard mics are often unable to capture true 3D audio. Meanwhile, existing external 3D audio microphones tend to be expensive, are difficult to hide during a 360 video shoot, and usually require specialist training for optimum results to be achieved.
Therefore, it stands to reason that a great deal of research is currently taking place with the central objective of making 3D audio capture for VR more straightforward. At Fraunhofer IIS, this activity revolves around recording with only a small set of microphones combined with an intelligent algorithm which creates stunning 3D immersive sound.
But 3D audio recordings are just the starting point for the production of a compelling immersive sound experience for VR. Content producers also need tools that offer an easy and intuitive way to mix and audition what consumers will hear in VR. Therefore, Fraunhofer IIS provides VR audio plugins for Digital Audio Workstations such as Pro Tools or Nuendo (AAX and VST), which allow creatives to mix 3D sound for VR in a familiar environment without the need to adapt to completely new ways of content creation. The plugins support audio channels and objects as well as scene-based audio elements such as ambisonics. These post-production plugins are tailored for professional mixers in the cinematic, post-production and other VR communities. Once the audio mix has been produced, a Fraunhofer plugin for the game engine Unity can be used to integrate immersive audio into applications for VR devices.
Delivery today with HE-AAC audio codec
The efficient delivery of audio content to mobile devices has always been one of the biggest strengths of the technologies developed at Fraunhofer IIS. In fact, the predominant audio codec for the delivery of stereo and surround audio content to mobile devices and VR today is the AAC family of codecs. Primarily developed by Fraunhofer IIS, this codec family can now be found in more than eight billion devices worldwide. VR platforms such as Hulu VR, Samsung Milk VR or YouTube 360 use it to deliver high quality surround sound even at low bit-rates. However, for the delivery of truly immersive sound for VR, a next generation audio codec is needed: MPEG-H Audio.
MPEG-H Audio: a major tool for the new era of immersive audio
MPEG-H Audio became an international standard in 2015, and now provides the basis for an audio system that is a candidate for inclusion in the new ATSC 3.0 and DVB TV broadcast standards.
The versatility of MPEG-H Audio makes it ideal for immersive and interactive productions. With this coding technology, next generation audio services can be delivered easily and cost-effectively to mainstream consumers and enthusiasts alike.
The MPEG-H Audio System is the perfect fit for VR audio since it can carry audio channels, audio objects and ambisonics audio plus metadata within a single audio bitstream. Audio channels provide a great way to deliver immersive audio, while dynamic audio objects enable directors to use audio cues to guide the audience through 360 stories. In this context, objects also allow for a very precise placement of sound events around the listener’s head and at a specific distance from the ear. Ambisonics audio, on the other hand, provides a convenient way to capture immersive audio scenes, for instance on set, and makes it easier to realise point-of-view manipulation and head rotation at playback.
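To illustrate why ambisonics makes head rotation at playback straightforward, here is a minimal sketch in Python. It is a generic textbook treatment of first-order ambisonics, not the MPEG-H or Fraunhofer implementation. The function names are hypothetical, and the conventions (W carrying the unscaled signal, X pointing front, Y pointing left, azimuth measured counterclockwise) are assumptions chosen for the example.

```python
import math

def encode_foa(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample as a first-order ambisonics frame (W, X, Y, Z).

    Assumed convention: X = front, Y = left, Z = up, azimuth counterclockwise,
    W carries the signal unscaled.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample
    x = sample * math.cos(az) * math.cos(el)
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    return (w, x, y, z)

def rotate_yaw(frame, yaw_deg):
    """Rotate the whole sound field counterclockwise around the vertical axis."""
    w, x, y, z = frame
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    # Only X and Y mix; W (omnidirectional) and Z (height) are unaffected by yaw.
    return (w, x * c - y * s, x * s + y * c, z)

def compensate_head_yaw(frame, head_yaw_deg):
    """Counter-rotate the scene so sources stay fixed in the world as the head turns."""
    return rotate_yaw(frame, -head_yaw_deg)

# A source directly to the listener's left (azimuth 90 degrees).
frame = encode_foa(1.0, 90.0)
# The listener turns the head 90 degrees to the left, so the source should now be in front.
turned = compensate_head_yaw(frame, 90.0)
```

The point of the sketch is that the rotation is a small matrix operation on the ambisonics channels themselves, independent of how many sources the scene contains, which is why a renderer can apply head-tracking data cheaply for every audio frame.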
Playback over headphones with Fraunhofer Cingo
In tandem with the delivery of immersive sound to VR devices using MPEG-H Audio, Fraunhofer Cingo can be used for the authentic and realistic reproduction of the 3D sound scene over headphones. Cingo supports rendering of fully immersive 3D audio content with formats that add a height dimension (including 5.1+2 height speakers, 7.1+4 height and similar channel configurations) and allows the simultaneous placement of sound objects anywhere in the virtual space around the listener. This allows the listener to feel fully immersed and makes Cingo particularly suitable for the emerging VR segment.
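As a rough illustration of what rendering a channel-based mix over headphones involves, the sketch below uses the generic "virtual loudspeaker" approach: each loudspeaker feed is filtered with a pair of head-related impulse responses (one per ear) and the results are summed. This is a textbook technique shown for orientation only, and it is not a description of how Cingo works internally. The function names and the two-tap impulse responses are hypothetical placeholders, not measured data.

```python
def convolve(signal, impulse):
    """Plain time-domain convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out

def render_binaural(channels, hrirs):
    """Mix loudspeaker feeds down to a (left, right) headphone pair.

    channels: dict mapping a channel name to its list of samples.
    hrirs:    dict mapping the same names to (left_ear_ir, right_ear_ir).
    """
    length = max(
        len(sig) + max(len(hrirs[name][0]), len(hrirs[name][1])) - 1
        for name, sig in channels.items()
    )
    left = [0.0] * length
    right = [0.0] * length
    for name, sig in channels.items():
        ir_left, ir_right = hrirs[name]
        for k, v in enumerate(convolve(sig, ir_left)):
            left[k] += v
        for k, v in enumerate(convolve(sig, ir_right)):
            right[k] += v
    return left, right

# Placeholder impulse responses (assumption, NOT measured HRTF data): the far ear
# receives the sound one sample later and at half the level.
TOY_HRIRS = {
    "L": ([1.0, 0.0], [0.0, 0.5]),
    "R": ([0.0, 0.5], [1.0, 0.0]),
}

# A single click in the left loudspeaker feed.
left_ear, right_ear = render_binaural({"L": [1.0, 0.0]}, TOY_HRIRS)
```

A layout with height channels, such as 7.1+4, would simply add more entries to both dictionaries (twelve feeds in that case); the summation itself does not change.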
Cingo has already made a global impact with the integration into the Google Nexus family of devices and is now making headway in the new world of VR thanks to its integration into Samsung’s groundbreaking Gear VR mobile virtual reality device, in LG’s brand new LG 360 VR HMD and in the Hulu VR app.
First announced in September 2014, Gear VR was developed by Samsung in collaboration with Oculus VR. A compatible device, such as the Galaxy S7, acts as the headset’s display and processor, while the Gear VR unit itself contains high field of view lenses as well as a custom IMU for tracking, and connects to the smartphone via micro-USB.
LG introduced the LG 360 VR at Mobile World Congress 2016. Thanks to the combination of Cingo and HE-AAC, movies in LG 360 VR applications will immerse users and transport them to different virtual environments, with audio properties that match each environment closely enough to sustain the illusion.
Cingo helps to deliver an audio experience for Gear VR and LG 360 VR that is every bit as dramatic as the visual component. The specific features of Cingo that make it suitable for mobile VR include: the capability to render 3D immersive audio from all directions; the possibility to accurately place and track audio objects anywhere in the virtual space; and optimisation for mobile platforms from the ground up, making it suitable for any VR platform, whether mobile or tethered to a PC or gaming console.
Indeed, it should be noted that the amount of content available for head-mounted displays such as the LG 360 VR or Gear VR is now increasing rapidly. For instance, Samsung’s Milk VR service plays host to a huge variety of 360 degree videos, while prospective content creators can access comprehensive guidelines at the milkvr.com website.
There are also indications that the combination of Gear VR, Milk VR and Cingo will provide a suitable pathway for live streaming experiences in VR. Sports is one of the areas in which the most rapid adoption is to be expected.
Fraunhofer faces the future
Just as the transition from mono to stereo to 5.1 was protracted and not without its fair share of challenges, it is reasonable to expect that the mass adoption of immersive and interactive audio services will take significant time and patience. But in HE-AAC, Cingo and MPEG-H Audio, among other developments, Fraunhofer IIS is showing itself to be at the very forefront of these next generation services. These technologies have already been implemented in a number of class-leading devices, and more significant announcements should be expected as 2016 continues to unfold. A brave new world of audio awaits.