Mixing clever: Salsa Sound embraces the power of AI in MIXaiR 2.0
Regular readers of SVG Europe will be well aware of how innovative technology companies working in sports broadcasting can be, but MIXaiR from Salsa Sound looks likely to set a new bar for live audio and the craft of sound mixing.
When the pandemic forced football games to be played in silent stadiums, Salsa Sound came to the rescue of broadcasters with its vCrowd real-time virtual crowd atmosphere system, which rightly won plaudits. MIXaiR, however, is based on technology that the company has been working on for several years, through academia, R&D, beta versions and a soft launch of the system a year ago.
“Our company was founded in 2017, but prior to that we were part of the University of Salford, where we were working on a lot of innovative audio techniques,” says co-founder and CEO, Rob Oldfield. “In particular we were looking at how to leverage artificial intelligence (AI) to recognise sound events and then create the best possible mix, not based on tracking but just based on what is actually [captured by] those microphones.”
Fast forward to the future, next month in fact, and the new version of MIXaiR, an AI-driven system that automatically creates and enhances audio mixes for live broadcast, is set to be released.
“MIXaiR 2.0 is jam-packed with new features and a much more intuitive, easy-to-use interface,” says Oldfield. “The idea for v2.0 is you put all your microphone feeds in and then MIXaiR creates different submixes. So you might have a crowd, a commentary, or a pitch mix or aux in, and MIXaiR will automatically balance the levels between them, apply some processing, then create the best possible mix out of it without any human interaction, other than setting it up in the first place.”
According to Oldfield, the hardest mix, “by a country mile,” is the pitch mix. “Historically, it’s such a dynamic process by engineers and difficult to replicate,” he explains. “To make sure you’ve always got the nearest microphone active in the mix at any one point, requires constant attention and raising and lowering of faders. It’s a really clever balancing act.”
Rather like the players whose kicks it tracks, MIXaiR’s AI has gone through an extensive training regime. Using machine learning technology, Salsa Sound has been training MIXaiR with many hours of content, microphone recordings and mixes across leagues.
“We know what makes a great mix,” says Oldfield. “The AI is constantly analysing all of the sounds it is hearing. We tell it what to make of these sound events and it learns what’s a kick, what’s a whistle, what’s a ball bounce, so that when it sees or hears sound in the wild, it can make an intelligent decision based on it.”
Oldfield says MIXaiR delivers an even pitch mix, without awkward transitions or unbalanced crowd noise.
“Our approach is to have the AI listen to those live microphone feeds, and when it hears a sound that’s interesting – like a kick, whistle, ball bounce or hitting the crossbar, you name it – it will automatically add that microphone feed into the mix. It tracks the game around, always choosing the microphone that’s closest to the action, and [performs] a seamless transition between them.”
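As a rough illustration of the behaviour Oldfield describes, the sketch below opens whichever microphone currently has the strongest detected sound event and eases the gains toward their targets so that no transition is abrupt. It is not Salsa Sound's implementation: the per-microphone event scores are assumed to come from a classifier that is not shown, and the function name, threshold and smoothing factor are all hypothetical.

```python
from typing import List


def update_gains(gains: List[float], scores: List[float],
                 threshold: float = 0.5, smoothing: float = 0.2) -> List[float]:
    """Move each microphone gain one step toward its target gain.

    `scores` holds one hypothetical event-detection score per microphone
    (0..1). The microphone with the highest score, if it reaches
    `threshold`, gets a target gain of 1.0; every other microphone gets 0.0.
    Each gain moves a fraction `smoothing` of the way to its target per
    call, which produces a gradual crossfade rather than a hard switch.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    active = best if scores[best] >= threshold else None
    new_gains = []
    for i, gain in enumerate(gains):
        target = 1.0 if i == active else 0.0
        new_gains.append(gain + smoothing * (target - gain))
    return new_gains


# Three pitch microphones; a kick is detected near microphone 1.
gains = [0.0, 0.0, 0.0]
for _ in range(30):
    gains = update_gains(gains, scores=[0.1, 0.9, 0.2])
print([round(g, 3) for g in gains])
```

Called once per audio block, the function fades the chosen microphone up over a number of blocks and fades it back down once nothing on that feed scores above the threshold.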
“People don’t realise how hard it is to mix really well,” he adds. “That’s why we’ve gone through pains to get an AI solution that takes the strain from some of the really difficult grunt work of mixing. It can enable these guys to explore creative avenues and to create more innovative content. When you’re not completely locked into your screen trying to create the best mix, you can actually lean in to MIXaiR a little bit more, let that create the stems and then you can have a bit of cognitive space to craft a mix, rather than chasing a mix.”
The commentary and crowd noise are dynamically processed to provide mix components. “These sub mixes are basically ingredients that you can throw into the output mixes,” explains Oldfield. “Then it’s just a drag and drop process. You can create whatever output mixes you want, as many as you want, in different formats. You can have international sound, a French mix, a German mix, a Spanish mix, or whatever, and within that you can have different flavours: the stereo mix, the 5.1 mix, the mono mix. The only limitation is the number of channels that you’ve got available on your output board. You can also apply VST plug-ins to add your own compression or EQ or effects to the submixes, so it becomes a creative tool, allowing you to craft your mixes.”
“Importantly, [all mixes] are loudness normalised,” he adds. “If you’re streaming on YouTube, or your own video on demand platform, or when it is for broadcast, they all have different loudness requirements. So within MIXaiR, you just decide which requirement you want, click on it and it will ensure that the mix adheres to those standards, so you don’t end up with a fine from the regulator for making too much noise, or making not enough noise.”
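The arithmetic behind picking a loudness requirement can be sketched in a few lines. The example assumes the integrated loudness of a mix has already been measured (the measurement itself, defined in ITU-R BS.1770, is not shown) and simply computes the gain needed to reach a chosen target. The -23 LUFS figure is the EBU R 128 broadcast target; the streaming value and the preset names are illustrative assumptions, not MIXaiR settings.

```python
# Target integrated loudness per delivery type, in LUFS.
# "ebu_r128_broadcast" is the EBU R 128 target; "streaming_example"
# is a hypothetical value used only for illustration.
PRESETS_LUFS = {
    "ebu_r128_broadcast": -23.0,
    "streaming_example": -14.0,
}


def normalisation_gain(measured_lufs: float, preset: str) -> float:
    """Return the linear gain that moves a mix from its measured
    integrated loudness to the target loudness of the chosen preset."""
    gain_db = PRESETS_LUFS[preset] - measured_lufs
    return 10 ** (gain_db / 20)


# A mix measured at -17 LUFS must come down 6 dB for broadcast.
print(round(normalisation_gain(-17.0, "ebu_r128_broadcast"), 3))
```

A live system has to keep adjusting as the programme unfolds rather than apply a single fixed gain, but the underlying relationship between target and measured loudness is the same.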
MIXaiR users also have an option to enhance the pitch mix with pre-recorded sounds. “When somebody whacks the ball in the middle of the park, the [sound supervisor] can’t even hear it because it’s a long way away from the microphones,” says Oldfield. “However, the AI can pick up a kick in the middle of the pitch. The algorithm can detect a kick, go to a bank of pre-recorded kicks and pick out the most appropriate one, and then it scales the level based on the audio [analysis]. It just adds a little bit more punch to those sounds that you wouldn’t normally hear, so that you get more of an even and realistic sense of the sounds on the pitch.”
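One way to picture the select-and-scale step is the sketch below. It assumes the detector reports an intensity estimate and a level for each kick, chooses the banked sample whose intensity is closest, and scales it to the detected level. The `KickSample` fields, the intensity descriptor and the matching rule are all hypothetical; the article does not say how MIXaiR judges which sample is most appropriate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class KickSample:
    name: str
    intensity: float  # hypothetical descriptor, 0 (soft tap) to 1 (hard strike)
    peak: float       # peak amplitude of the stored recording, 0..1


def choose_kick(bank: List[KickSample], detected_intensity: float,
                detected_peak: float) -> Tuple[KickSample, float]:
    """Pick the banked kick closest in intensity to the detected one and
    return it with the gain that scales its peak to the detected level."""
    sample = min(bank, key=lambda s: abs(s.intensity - detected_intensity))
    gain = detected_peak / sample.peak
    return sample, gain


bank = [KickSample("soft_pass", 0.2, 0.5), KickSample("long_ball", 0.9, 0.8)]
sample, gain = choose_kick(bank, detected_intensity=0.8, detected_peak=0.4)
print(sample.name, gain)
```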
Home or away
A big appeal for broadcasters of the system could be its scalability. Able to be deployed as hardware or on a server, it’s ready for cloud-based remote production. “As long as you can get the microphone feeds up to the cloud, then the mix engineer can be anywhere,” says Oldfield. “We wanted to create a tool that fits perfectly in with the kind of remote and distributed production and cloud-based workflows we’re seeing, and I’m confident we’ve achieved that with MIXaiR.”
“The ability to have multiple mixes going on all at once means that it is doing the job of multiple people,” he continues. “You can have your feeds coming in from the different matches, and you can access it through a browser. You just spin up a couple of different instances of MIXaiR, and you can have one person who’s just flicking between different matches, putting up different mixes and you can actually QC multiple games all at once. Equally, you could deploy it at the venue, create the auto mix there and then put the stems up onto the cloud.”
An AI-powered mix has an obvious appeal for new and emerging broadcasters, who perhaps don’t have the audio heritage and talent of the major sports platforms. Salsa is also targeting leagues with fewer resources to deploy.
“We want the lower leagues to [sound] as good as the top tier content,” says Oldfield. “One of our motivations for MIXaiR is to facilitate that, as well as niche sports or emerging genres. We’ve designed the software so it’s really easy to extend it. When we migrate to a different sport, it’s just about retraining the neural network.”
Salsa has also added a record feature that can begin and end at any point within the mix.
“You can go back and [edit something] out from the crowd, or if you want to create archive content or actually have a bit of a different slant on the sound and the visuals, if you’ve kept the audio assets, then you can create a cinematic version of it,” says Oldfield. “What I think every broadcaster wants to do, and correctly, is put fans close to the action.”
“One of the things that we’ve been working on with Manchester City is looking at creating a fan experience mix,” he adds. “We’re putting additional microphones and processing right in the heart of where the fans are, so there may be a little bit of fruity language at times, but for replays, you get this sense of what was it like to be in the stands, rather than what it’s like to be a TV viewer. You’re getting the best seat possible. Being able to hear all of those on-pitch sounds, you’re really up close and personal with the action, but you’re in with the fans and really feel like you’re part of the game.”
With the multiple-mix facility, broadcasters and clubs could also have different flavours of crowd mixes. “Imagine being able to create a home mix, an away mix, or a broadcast mix and a pub mix,” Oldfield says. “Once you’ve got the microphones there, then you can create the mix for it.”
It seems that the sound pros that Salsa has spoken to recognise this opportunity for what it is.
“We’ve had really good feedback from the sound supervisors in the main,” he says. “There’s always going to be people who basically don’t like AI and feel threatened by it, but we’re not in the business of stopping people from doing their jobs.
“The key is [the sound supervisors] are not chasing events; they’re crafting events. Having a little bit of time to breathe rather than constantly being transfixed on just creating that pitch mix is no bad thing. In an OB truck sometimes it hits the fan, but if you’re rushing around trying to fix a comms problem, or some equipment that’s gone down, you can’t then create your pitch mix because you’re running around making sure the broadcast stays on air. I think it’s great to have a tool that you can [depend on to] create the best possible pitch mix for you.”