Mics as data gatherers: How Salsa Sound is expanding the world of immersive listening for fans outside of the stadium

Rob Oldfield is CEO at Salsa Sound, one of the winners of Arsenal Innovation Lab’s backing to develop digital and virtual tools that help Arsenal fans from all over the world get involved with their team

With a wealth of freely available and engaging content across multiple channels, access to teams has never been easier, and clubs are taking advantage. Fan engagement is a big deal, and clubs are putting more time and resources into developing ways fans can get closer to the action.

The Arsenal Innovation Lab is one such initiative. The programme has been investing in start-ups since 2017, and this year’s event was a virtual seven-week sprint for young technology companies to develop digital and virtual tools that help Arsenal fans from all over the world get involved with their team.

The reach was global and the programme received more than 500 applications. However, just eight were shortlisted to pitch. In late February, Manchester-based Salsa Sound was announced as one of only four winners with MixAir, its artificial intelligence (AI)-driven automatic audio mixing system for live sports.


Salsa Sound describes MixAir as enabling the automatic creation of immersive listening experiences using standard microphone set-ups. It not only creates engaging pitch mixes, but also automatically manages crowd, commentary and AUX-in feeds, and can create as many mix variants as required, in whatever formats are needed.

For Arsenal, Salsa Sound’s technology is being used to create an immersive audio mix for match highlights packages that take you inside the ground so you can soak up all the key moments from the game.

“Arsenal is a great club with such a fantastic history and pedigree and we’re honoured to be working with them to bring these innovative experiences to their fans worldwide,” says Rob Oldfield, CEO at Salsa Sound. “Nothing beats watching a game live and experiencing the roar of the crowd, but it is a privilege to be able to give fans that experience even if they can’t physically make it to the stadium.”

Collaborative effort

The technology started as a collaboration, says Oldfield. He explains: “We had a lot of contact with Arsenal and are still honing the proposition. It has required a lot of calls with their technical team to find out exactly what works for them. This is important as we want to provide a product which meets a genuine need rather than push a technology; we want to scratch where the industry is already itching.

“Arsenal’s challenge is that they have a global fanbase of millions and they are passionate about connecting fans to the action on the field; the game is their core proposition and flagship product, and over 99% of Arsenal fans are not able to see the game live.”

A big part of any live experience is what it sounds like, especially at a large sporting event. The collective experience and the roar of the crowd generate passion and create engagement. As the output for the Arsenal experience mix is primarily for fans watching on mobile devices, MixAir uses head-related transfer functions (HRTF) to automatically create binaural mixes especially for headphones. This places the viewer at a particular point in the stands and replicates what our ears would hear in that exact spot.
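To make the binaural idea concrete, here is a minimal sketch of HRTF-style rendering: a mono source is convolved with a left-ear and a right-ear impulse response (the time-domain form of an HRTF). This is an illustration only, not Salsa Sound's implementation. The function names and the toy impulse responses are assumptions; a real system would use measured responses for the chosen seat position.

```python
def convolve(signal, kernel):
    """Direct FIR convolution; returns len(signal) + len(kernel) - 1 samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def binaural_render(mono, hrir_left, hrir_right):
    """Filter one mono source with a left/right impulse-response pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Hypothetical toy responses for a source on the listener's left:
# the right ear hears it two samples later and quieter.
HRIR_LEFT = [1.0, 0.0, 0.0, 0.0]
HRIR_RIGHT = [0.0, 0.0, 0.6, 0.0]

left, right = binaural_render([1.0, 0.5, 0.25], HRIR_LEFT, HRIR_RIGHT)
```

The delay and level difference between the two output channels are the cues the brain uses to place the sound to one side when heard over headphones.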

At Arsenal’s Emirates Stadium, Salsa added surround mics in the stands to complement the broadcast mics already in the venue. These mics provide additional data to combine with the MADI feed from the broadcast mics to create a combination of the binaural crowd mix and the more visceral sounds of the shouts, kicks and crunching tackles from pitchside. This puts the viewer closer to the pitch at the same time as being part of the crowd.

However, Salsa Sound’s MixAir offers more, notes Oldfield. “At a live match, the crowd – and the sound they make – is actually the most effective and honest metric for measuring excitement levels of a match; it is the collective opinion of 60,000 fans reacting to the action in real time. MixAir constantly monitors the loudness levels of the crowd and creates an XML metadata output which allows us to objectively measure the excitement levels in the stadium in real time.”
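As a rough illustration of the kind of feed Oldfield describes, the sketch below measures the level of a crowd signal window by window and writes the readings out as XML. It is a sketch under stated assumptions: plain RMS level in dBFS stands in for a proper broadcast loudness measure, and the element and attribute names (`crowdLevels`, `level`, `time`, `dbfs`) are hypothetical, not MixAir's actual schema.

```python
import math
import xml.etree.ElementTree as ET


def rms_dbfs(samples):
    """RMS level of a block in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def crowd_levels_to_xml(samples, sample_rate, window_s=1.0):
    """Emit one <level> element per window (hypothetical schema)."""
    window = max(1, int(sample_rate * window_s))
    root = ET.Element("crowdLevels", sampleRate=str(sample_rate))
    for start in range(0, len(samples), window):
        level = rms_dbfs(samples[start:start + window])
        ET.SubElement(
            root,
            "level",
            time=f"{start / sample_rate:.2f}",
            dbfs=f"{level:.1f}",
        )
    return ET.tostring(root, encoding="unicode")


# Tiny synthetic signal: a louder second followed by a quieter one.
xml_feed = crowd_levels_to_xml([0.5] * 4 + [0.25] * 4, sample_rate=4)
```

Any downstream system that can parse XML could then read the per-window levels without touching the audio itself.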

Once processed, that data can pinpoint where the game is at its most exciting. One way clubs can take advantage of this is by analysing sound levels from different parts of the ground and encouraging competition between fans in different areas. This not only creates more engagement for fans in the stadium, but also boosts the overall sound level in the stadium.

Salsa Sound’s Rob Oldfield having a great time at Arsenal FC

Mics as data gatherers

The way the microphones gather live data means the audio can be used for much more than just recording, a point not lost on Oldfield. He says: “For years microphones have mainly been used for recording and helping to tell a story, but microphones are more than that; they are very effective data gatherers. They can be used to create holistic metadata from the audio to help provide information from speech-to-text, or to ascertain where the action on the pitch is, to trigger graphics or to help provide a focus for post-match analytics.

“The real beauty of it is that it generates all this data and feeds it back with no additional overheads for those data feeds; there’s little to no processing time and it can be derived from the same AI solution in MixAir by using the mics which are already in place.”

As data captured from the microphones can illustrate peaks and troughs in the game, future development could also include using it to help automate highlights packages, while Salsa Sound’s AI also has the potential to give additional options for future content creation.
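One simple way to picture how peaks in crowd level could drive a highlights package is sketched below: every reading above a threshold is padded with some pre-roll and post-roll, and overlapping ranges are merged into clips. This is an assumption-laden illustration, not a description of Salsa Sound's method; the function name, the threshold and the padding values are all hypothetical.

```python
def find_highlights(levels_db, threshold_db, pre_roll=2, post_roll=2):
    """Return merged (start, end) index ranges around loud readings.

    levels_db holds one crowd-level reading per time step; indices in
    the result are inclusive positions in that list.
    """
    segments = []
    for i, level in enumerate(levels_db):
        if level < threshold_db:
            continue
        start = max(0, i - pre_roll)
        end = min(len(levels_db) - 1, i + post_roll)
        if segments and start <= segments[-1][1] + 1:
            # Overlaps or touches the previous clip: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Per-second crowd levels with two loud moments.
levels = [-30, -28, -29, -10, -9, -27, -30, -31, -30, -29, -8, -30]
clips = find_highlights(levels, threshold_db=-12, pre_roll=1, post_roll=1)
```

With these sample values, `clips` contains two ranges, one around each burst of crowd noise.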

“One reason for the growth in AI for sound is that broadcasters and clubs are demanding more and more content, and there is more pressure on mixers to produce multiple mixes from the same event. I was in a remote truck recently with a mixer creating 16 separate output mixes! You can’t just go in and ask for an additional immersive mix for fan engagement on top of all that. AI provides the ability to easily create content for a wider range of outputs; it does a lot of the heavy lifting to leave the mixer free to craft the mix in a more creative way.”

Craft versus chase

Traditional Premier League football coverage uses a number of shotgun mics surrounding the pitch to pinpoint the action wherever it is happening, and audio mixers chase the ball around the pitch, crossfading between the mics. The MixAir AI automates this and allows mixers to concentrate on their craft. Oldfield says this enables mixers to “craft a mix rather than chase a mix”.
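The "chasing" that a mixer does by hand can be pictured as a crossfade between the two mics nearest the action. The sketch below uses a standard equal-power (sine/cosine) crossfade so the combined level stays steady as the action moves. It assumes the position of the action along the touchline is already known and that mic positions are sorted and distinct; how MixAir actually locates the action is not described in the article, and `mic_gains` is a hypothetical helper.

```python
import math


def mic_gains(ball_x, mic_positions):
    """Equal-power gains for mics sorted by position along a touchline."""
    gains = [0.0] * len(mic_positions)
    if ball_x <= mic_positions[0]:
        gains[0] = 1.0
        return gains
    if ball_x >= mic_positions[-1]:
        gains[-1] = 1.0
        return gains
    for i in range(len(mic_positions) - 1):
        left, right = mic_positions[i], mic_positions[i + 1]
        if left <= ball_x <= right:
            # t runs from 0 at the left mic to 1 at the right mic.
            t = (ball_x - left) / (right - left)
            gains[i] = math.cos(t * math.pi / 2)
            gains[i + 1] = math.sin(t * math.pi / 2)
            return gains
    return gains


# Three mics at 0 m, 50 m and 100 m; action halfway between the first two.
gains = mic_gains(25.0, [0.0, 50.0, 100.0])
```

Because the squared gains always sum to one, the perceived loudness does not dip as the mix moves from one mic to the next.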

It also creates opportunities to generate content for other events, or for second-tier sports. AI takes advantage of existing infrastructure and automates aspects of the production because all it needs is the raw mic feeds; this means that niche events can be covered with enhanced or even immersive mixes at a fraction of the cost. This enables clubs to provide wider coverage, such as reserve and youth team matches, and also provides the same opportunities for lower-level clubs and for more niche sports.

Salsa Sound started at Salford University in 2017 when Oldfield and Dr Ben Shirley worked on a project to develop tools to personalise broadcast audio by capturing individual audio. They quickly realised that this is exactly what an audio mixer does: analyse individual sound objects and combine them to create a specific mixed output.

“Sound object analysis is more akin to what mixers need; MixAir has always been built with an object-based audio paradigm, using AI to analyse all of the input audio sources, creating and mixing between the different types (pitch, crowd, commentary etc), all the while producing an XML metadata feed which can be used to create immersive mixes for formats such as stereo, binaural, 5.1, ATMOS etc,” states Oldfield.

He goes on: “Up until fairly recently we’ve been focusing on football, but our ecosystem is now already in place and we’ve been experimenting with new submix types, such as for ice hockey, boxing and American football.”

As clubs look for more ways to engage with their fans, and broadcasters look to provide more choice and personalisation across more channels, AI is already playing a more central role in how audio is produced.
