On the edge: How metadata and 5G are transforming the way viewers engage with sports

Rob Oldfield: “I am very interested in what 5G can enable us to achieve which otherwise couldn’t be done”

The metaverse is a funny thing. Few people really know what it is, how it works, or what it will turn into, but many hope it will change how we interact with technology and how we connect with the things we love.

Last year’s 5G Edge-XR project won plenty of plaudits; even the technical paper written about it won big, taking home the IBC2022 Best Technical Paper prize.

The headlines were all about how the project delivered extended reality (XR)-enabled experiences across a range of different sports and services. But for Salsa Sound, the project’s audio partner, that was not really the point: what mattered was that it also demonstrated how ordinary viewers could connect to immersive virtual spaces in a simple way, on standard devices. It literally promised the metaverse in the palm of your hand.

According to Salsa Sound co-founder and CEO Rob Oldfield, it was about demonstrating what 5G brings to the party and how it empowers everyone to get involved.

“As an audio guy I’m less interested in XR, but what I am very interested in is what 5G can enable us to achieve which otherwise couldn’t be done,” he says. “For us, that’s always been the point. The stumbling block for many immersive and XR experiences has always been whether the user device is powerful enough and whether it can manage all the requisite data, and the answer is always the same. The answer is always no.”

The 5G Edge-XR project was born from the promise that 5G can deliver more than just faster broadband. It looked to present enhanced broadcast use cases across football, boxing, MotoGP and rugby, as well as live interactive educational content.

Bringing together a broad selection of specialists, it was funded by the UK Department for Digital, Culture, Media and Sport (DCMS) and overseen by BT Media & Broadcast along with BT Sport. Salsa provided 3D audio objects and worked alongside Condense Reality (which handled the volumetric video), The Grid Factory (edge compute), Dance East (education) and Bristol University, which also worked with Condense on cleanup of the volumetric scenes.

The aim was to create complete 3D representations of scenes, combining video and audio, that a viewer can navigate around. Both the video and the audio objects have 3D coordinates attached to them, and both update in real time once they are locked together as a scene.
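As a rough illustration of that idea, the sketch below shows one way positional metadata might be attached to audio and video objects and kept in sync within a shared scene. The field names and structure are assumptions made for the example, not the project’s actual data model.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    """One audio or video element in the shared scene (illustrative schema only)."""
    object_id: str                         # e.g. "commentary" or "left_glove"
    kind: str                              # "audio" or "video"
    position: tuple[float, float, float]   # (x, y, z) scene coordinates, metres
    timestamp: float                       # capture time, seconds

def update_scene(scene: dict[str, SceneObject], updates: list[SceneObject]) -> None:
    """Refresh the shared coordinate store so audio and video stay locked together."""
    for obj in updates:
        scene[obj.object_id] = obj

scene: dict[str, SceneObject] = {}
update_scene(scene, [
    SceneObject("commentary", "audio", (0.0, 2.0, -5.0), 12.04),
    SceneObject("ball", "video", (3.2, 0.0, 1.1), 12.04),
])
print(scene["ball"].position)
```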

“The most accessible way to process it is to use a gaming engine, and we used a Unity gaming engine as an edge compute running on a cloud XR server to run the volumetric video,” explains Oldfield. “Our role was to create audio objects which were stitched together with the video as a scene.

“A phone can’t deal with a Unity scene. You can’t stream data fast enough, the phone can’t hold all the information it requires, every user would need to download the Unity app and the battery would run down too fast. It’s not possible to do any of this without a server and a very large cable. In this case, the server was an edge computer in the cloud and the cable was a 5G network, and in this way we showed that 5G can enable XR experiences on off-the-shelf end-user devices.”
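The split Oldfield describes can be sketched in miniature: the per-pose heavy lifting happens away from the handset, which only reports where the viewer is and plays back what it receives. The toy code below is a self-contained illustration of that division of labour, not the project’s implementation; the function names and the simple distance-based audio gain are assumptions.

```python
import math

def edge_render(pose: tuple[float, float, float]) -> dict:
    """Stand-in for the GPU-heavy volumetric render and spatial audio mix on the edge."""
    x, y, z = pose
    distance = math.sqrt(x * x + y * y + z * z)
    return {
        "frame": f"view rendered for pose ({x:.1f}, {y:.1f}, {z:.1f})",
        "audio_gain": 1.0 / (1.0 + distance),  # quieter the further the viewer stands back
    }

def phone_session(poses: list[tuple[float, float, float]]) -> None:
    """The device loop: send a pose upstream (here a plain function call stands in
    for the 5G link) and display whatever lightweight stream comes back."""
    for pose in poses:
        packet = edge_render(pose)
        print(packet["frame"], f"gain={packet['audio_gain']:.2f}")

phone_session([(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (2.0, 1.0, 1.5)])
```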

XR access for all

Officially, the project aim was to explore “how high-quality augmented and virtual reality immersive experiences could be broadcast to audiences with consumer AR/VR headsets, smartphones and tablets, using cloud-GPU to render XR presentations delivered over 5G networks. The goal was to democratise access to XR experiences by reducing the need for heavy processing on end-user devices”.

The term democratise is overused, but that is exactly how it played out. Every user was assigned a virtual machine in the cloud to connect to, which might not be the most efficient way to do it, but it proved the model in the most intensive way possible.

“XR is extremely data heavy and highly susceptible to latency, which made it a useful use case as it drives everything as hard as it can be driven,” he adds. “Connecting to a GPU-intensive computer which has artificial intelligence (AI) processing in the cloud enabled us to do more than we could ever do with on-prem hardware, and 5G enabled this computer resource to be used.”

Object focused

The project also gave Salsa the opportunity to push the limits of its object-based approach to audio. Since the company launched in 2017 with its AI-driven MIXaiR product, Salsa has adapted the way it uses metadata to create audio objects for a variety of environments, from immersive audio to personalisation. Every implementation is driven by metadata, and once it is attached to the audio it can be adapted to create any number of experiences. The range of sports in the 5G Edge-XR project made this clear, as every use case had a different motivation.

Oldfield says: “MIXaiR only scratches the surface of what we can do with objects and with AI. All of these experiences, whether it is XR or personalisation, are driven by objects created by metadata. It’s an approach which is gaining traction under the banner of personalisation because accessibility is an obligation for many broadcasters, but whatever the end result – accessibility, immersive or XR – metadata is the driver.”
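To make the “metadata is the driver” point concrete, here is a minimal sketch: the same set of metadata-tagged audio objects can be rendered into quite different experiences, in this case an accessibility-style mix that lifts speech and an immersive mix that preserves relative levels. The object names, fields and gain values are invented for the example and are not Salsa Sound’s actual format.

```python
# One set of metadata-tagged objects, several possible renders (illustrative only).
objects = [
    {"id": "commentary", "type": "speech",   "level": 1.0},
    {"id": "crowd",      "type": "ambience", "level": 0.8},
    {"id": "ball_kick",  "type": "effect",   "level": 0.6},
]

def accessibility_mix(objs):
    """Personalised render: lift speech for clarity, pull everything else back."""
    return {o["id"]: o["level"] * (1.5 if o["type"] == "speech" else 0.5) for o in objs}

def immersive_mix(objs):
    """Immersive/XR render: keep relative levels; spatialisation happens downstream."""
    return {o["id"]: o["level"] for o in objs}

print(accessibility_mix(objects))
print(immersive_mix(objects))
```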

In the 5G Edge-XR project, for example, MotoGP was geared towards second-screen use, with a high-resolution 3D map showing the location of riders on the race circuit and an area with life-size virtual MotoGP bikes which fans could walk around and inspect; the audio there was less centred on individual objects. Boxing, meanwhile, presented close-up volumetric video where each audio object was localised to its position in the scene, so the sound of each punch could be positioned relative to the viewer’s angle of view.

“The Unity engine and the metadata creation is the same for the football as it is for the boxing, as well as the dance lessons, MotoGP and the rugby,” adds Oldfield. “How the audio interacts with the video is different but the workflow is exactly the same; the objects empower broadcasters to create unique user experiences which amplify the live experience appropriate to the event.

“So for football we created objects of the on-pitch sounds, the crowd and the commentary. For boxing it was the sound of the punches and the footfalls on the canvas and we triangulated them to position them with the video angle. These things make all the difference for viewer buy-in because it places them inside that scene, whether that’s in a crowd in a stadium, or walking around a boxing ring.”
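Positioning a sound “with the video angle” amounts to working out where each object sits relative to where the viewer is standing and facing. The snippet below shows the generic geometry involved, under assumed coordinates and names; it is not the project’s triangulation code.

```python
import math

def azimuth_degrees(listener_pos, listener_yaw_deg, source_pos):
    """Angle of a source relative to the direction the viewer faces:
    0 = straight ahead, positive = to the viewer's right, negative = left."""
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    bearing = math.degrees(math.atan2(dx, dz))
    return (bearing - listener_yaw_deg + 180.0) % 360.0 - 180.0

# A punch landing just left of centre for a viewer standing ringside, facing the ring.
print(azimuth_degrees(listener_pos=(0.0, -3.0), listener_yaw_deg=0.0, source_pos=(-0.5, 0.0)))
```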

“Personalisation and immersive is the future of broadcast, and we can take advantage of the additional compute resources in the cloud to enable better end user experiences”

It would be unthinkable to expect a live operator to isolate those specific sounds and localise them in real time for broadcast, and without AI-created audio objects it would be impossible to give end users the tools to move around the scene.

Oldfield already knew how important objects were for creating new viewing experiences, but what this project proved was how they might be delivered with consumer technology that already exists.

“The project was worthwhile because we proved that not only can 5G enable us to do the things we’ve always done, but it can do so much that we don’t currently do. I would love to hear more people in broadcast talking about how these technologies help us to create new experiences rather than doing what we already do in a different way.

“Personalisation and immersive is the future of broadcast, and we can take advantage of the additional compute resources in the cloud to enable better end user experiences, not just maintain the status quo.”

 
