Accelerating innovation: Salsa Sound and the advance of intelligent audio

Salsa Sound’s co-founders, Ben Shirley [foreground] and Rob Oldfield hard at work

Speaking to SVG Europe, the co-founders of Manchester-based audio start-up Salsa Sound discuss the work they have done during the COVID-19 pandemic, and how the virus (and subsequent lack of fans in stadiums) has quickened the pace of technological change within sports broadcasting. 

Salsa Sound is a company that was spun out from the University of Salford in Greater Manchester, UK. Run by co-founders Rob Oldfield (who is also chief executive) and Ben Shirley, the business is focused on accelerating innovation in audio, and this year it has had an opportunity to come to the fore.

The COVID-19 pandemic and subsequent lockdowns have meant sports fans have not been present in stadiums, which in turn has brought audio to the forefront of broadcasters’ minds when confronted with painfully empty – and quiet – stadia.

The forefront of audio innovation

However, Salsa Sound has used this silence as an opportunity to showcase what its innovative technology can do. “COVID has made people throw out the rule book because things have changed, and they have to change further still,” says Oldfield, speaking to SVG Europe.

“COVID has necessitated some acceleration in innovation in our industry, which is helping us because we pride ourselves on being at the forefront of audio innovation. It can be a poisoned chalice at times, but actually, at a time where people are having to innovate, it is kind of helpful to us.”

He continues: “Audio is being pushed front and centre for the first time since I’ve been involved in this industry. All of a sudden the main thing people are talking about is, “what is all this going to sound like?” Obviously, there was a visual element to it with no fans [in stadiums], but the main thing people were worried about was sound, and that was the big difference.

Salsa Sound’s CEO and co-founder, Rob Oldfield

“For us, that was great because we pivoted early [in the lockdown] and did virtual crowd audio, but also because our automated mixing is designed to pull out and enhance the on-pitch sounds, we were in a great place,” adds Oldfield.

“You can imagine, when you’re in an OB truck where you can’t have so many people now, all of a sudden having something that can automate it and require fewer people is actually a really positive thing.”

Appetite for risk and change

One of the challenges that the co-founders of Salsa Sound have faced since the company was officially formed in 2017 has been getting broadcasters, in particular, to sign up to its new technologies.

Oldfield comments: “I think there’s a varied appetite for risk in the broadcast world. Some companies just want to stick with the status quo, and I think that’s one of the things where COVID is actually helping companies like us because the status quo doesn’t exist at the moment; you have to do things differently because the world’s looking different. So I think that’s actually helping us over one of our challenges, but we’ve certainly found it to be like wading through treacle at times, trying to get some of the major broadcasters, particularly in the UK, to adopt new technology and to be prepared to say, “these are processes that could be automated,” or “this is a tool that does provide our viewers with a better experience”.”

On whether audio will be changed forever thanks to the events of 2020 or if it will instead return to how it was before, once fans are allowed en masse back into stadiums, Oldfield says: “I think one of the things that has made audio front and centre this year is that it [has been seen as a] problem. So I kind of hope that we solve the problems, in which case less thought has to be given to it, but that audio’s importance remains the same. I hope that as viewers have got used to having all of these on-pitch sounds front and centre, that we can keep them there; viewers are used to hearing every grunt, every strike, every hit, every kick on the field, and that’s not something that we want to go back away from when crowds come back into the stands.

“We still want to hear all of that stuff because it’s part of the narrative of the game,” he goes on. “In terms of great storytelling, you need all of those on-pitch sounds. We want to keep it front and centre and our technology enables you to keep it there, even when you’ve got 80,000 fans in the stadium.”

In the beginning, there was sound

Salsa Sound was born out of an EU research project that began in 2010. Both Oldfield and Shirley had been working at the university in the acoustics department for many years, Oldfield as a research fellow and Shirley as a lecturer. They joined a project called Fascinate, which was looking into interactive broadcasting.

Reminisces Oldfield: “The idea at the time was that you’d have a 180-degree panoramic display, and we would allow users to zoom in and pan around. This is back in 2010, by the way, so it was a bit ahead of its time. We worked on that project with a number of other partners across the EU, including Fraunhofer, the BBC, Technicolor, and some other partners like Alcatel-Lucent.”

While the project was interesting, Oldfield and Shirley realised that if the visuals could be moved at will by viewers, the sound had to correlate or the effect would be lost. “If you give users the ability to interact with their video feed, you need to enable the audio to match up with that so it’s congruent,” explains Oldfield. “So if you zoom in, the audio zooms in; if you pan around then the audio pans. However, we quickly discovered that the way audio was done then didn’t really facilitate that; [the audio needed] to be done in an object-based manner. So we worked on a whole load of object-based audio technologies that would facilitate it.

“It was all about capturing and rendering it so that no matter how the viewer navigated their visual, the audio could match up with it perfectly, and the audio sources would come from the right locations and be at the right level and all that kind of stuff,” he notes.
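To make the idea concrete, here is a minimal sketch of how an object-based renderer might re-position a single audio object as the viewer pans and zooms. It is illustrative only and is not Salsa Sound's or the Fascinate project's implementation: the `AudioObject` fields, the `render_stereo` function, the 180-degree field of view and the 0.5 attenuation for off-screen objects are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    azimuth_deg: float  # direction of the source as seen from the camera position
    level: float        # linear gain captured for the object


def render_stereo(obj, view_azimuth_deg, zoom):
    """Return (left, right) gains for one object given the viewer's pan and zoom."""
    # Angle of the object relative to where the viewer is currently looking.
    rel = (obj.azimuth_deg - view_azimuth_deg + 180.0) % 360.0 - 180.0
    # Assumed 180-degree view at zoom 1; zooming in narrows the field of view,
    # so off-centre objects are pushed further towards the edges.
    half_fov = 90.0 / zoom
    pos = max(-1.0, min(1.0, rel / half_fov))  # -1 = hard left, +1 = hard right
    # Constant-power pan law.
    theta = (pos + 1.0) * math.pi / 4.0
    # Assumption: objects outside the visible field are attenuated, not muted.
    focus = 1.0 if abs(rel) <= half_fov else 0.5
    gain = obj.level * focus
    return gain * math.cos(theta), gain * math.sin(theta)


ball = AudioObject("ball_kick", azimuth_deg=45.0, level=1.0)
print(render_stereo(ball, view_azimuth_deg=0.0, zoom=1.0))  # right of centre
print(render_stereo(ball, view_azimuth_deg=0.0, zoom=2.0))  # further right when zoomed
```

Because each sound is carried as an object with its own position, the same stream can be rendered differently for every viewer; a conventional channel-based mix has those positions baked in.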

Facing early challenges

On one of the test shoots for the project, carried out at Chelsea Football Club at Stamford Bridge, the pair were challenged. They realised that because of all the sounds on the football field, “nothing could be close-miked, you had no idea where they were because then we didn’t have tracking information available, and so we needed to write a load of algorithms that would use the microphones that are around the field and actually extract audio objects from those microphone signals,” Oldfield says.

The pair had to find out what the discrete audio sources were and separate them out, figure out where they were by using triangulation, and then create an object-based stream that would enable a renderer to position them appropriately.
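The article does not describe the localisation algorithm itself. As a rough illustration of the general principle, the sketch below estimates where a sound came from by comparing its arrival times at several pitch-side microphones (a time-difference-of-arrival search, which is one common way of "triangulating" a source). The microphone layout, pitch dimensions, grid step and function names are all hypothetical, and a real system would first have to detect and match the same sound in noisy signals.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value in air at about 20 degrees C


def arrival_times(source, mics):
    """Time for sound to travel from source (x, y) to each microphone."""
    return [math.dist(source, m) / SPEED_OF_SOUND for m in mics]


def locate(mics, times, width, height, step=0.5):
    """Grid-search the pitch for the point whose predicted time differences
    best match the measured ones (relative to the first microphone)."""
    measured = [t - times[0] for t in times]
    best, best_err = None, float("inf")
    nx, ny = int(width / step) + 1, int(height / step) + 1
    for i in range(nx):
        for j in range(ny):
            p = (i * step, j * step)
            pred = arrival_times(p, mics)
            err = sum((pt - pred[0] - m) ** 2 for pt, m in zip(pred, measured))
            if err < best_err:
                best, best_err = p, err
    return best


# Hypothetical set-up: one microphone at each corner of a 105 m x 68 m pitch.
mics = [(0.0, 0.0), (105.0, 0.0), (0.0, 68.0), (105.0, 68.0)]
kick = (30.0, 20.0)
print(locate(mics, arrival_times(kick, mics), width=105.0, height=68.0))
```

Only the differences between arrival times are used, since the moment the sound was actually made is unknown to the microphones.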

A screenshot of Salsa Sound’s vCrowd, a virtual crowd audio solution for games held behind closed doors, developed in 2020

And so the idea for the business was born, adds Oldfield: “What that meant was when we completed this and we were outputting the stream, we realised we’d actually automated a really great on-pitch mix because we’d isolated all of the sounds that we actually wanted and we knew which microphone should be active at any one point in time.”

As part of this next step in the project, they also looked at how the audio crew in OB trucks were doing mixing at that time, which, Oldfield says, seemed “like a really archaic way of mixing audio”.

He goes on: “We could use our AI tools to automate that and not only automate it, but produce a better result that’s more consistent, and we have the ability to enhance those on-pitch sounds as well.”

The Fascinate project ran from 2010 to 2012, then the pair worked on the technology they had developed for another two years to try and productise it and get it industry-ready. In 2014 the pair tried to license the technology, as they had a deal with a mixing console manufacturer, but, says Oldfield, “they pulled the plug on it actually at the last minute”.

However, he adds, “it turns out that was the best thing that happened to us, because it meant that rather than [the IP] remaining in the university and just being licensed out to one company, it gave us the incentive to go after it as a company. So fast forward a couple of years, we founded the company in 2017.”

Developing the product portfolio

It was only this time last year that the business began really accelerating when Oldfield left the university and became the full-time CEO of the business.

“Prior to that, it was a kind of evenings and weekends sort of job,” he notes. “Things have been a lot easier now that it’s my full-time occupation. And we’ve since developed our product portfolio to doing fan experience sounds, so creating bespoke fan-centric mixes which we call Front Row, so it’s like you’re in the front row of the stand of your favourite team. It’s more important than ever that you unite your fans across the world; there are many people who may never get to the stadium, but we can make it feel like they’re sitting there [in the stadium] through the sound that they’re getting, rather than the generic broadcast audio, which is pretty vanilla really in nature.”

While the university currently owns the technology IP so it can cover its costs of paying for the technology patents, Oldfield explained that the IP is soon to be licensed back to Salsa Sound as the university’s period of ownership is coming to an end. The university does own some equity in Salsa Sound, which it will keep.

Adds Oldfield: “The university is looking to monetise its research and so they pay for all the patents that we have on this, but they can’t do anything with those without a vehicle. So they either license the technology or [the patents are] exploited through a spin-out [such as Salsa Sound]. The idea is that [the university] grants an exclusive licence and then it gets assigned over to us permanently once we meet certain financial targets.”

While Oldfield is now full time and Shirley is still working at the university, the pair are in the process of recruiting a development team to take things further. The company is one member of the UK government’s Department for Digital, Culture, Media & Sport (DCMS) 5G Edge-XR consortium led by BT’s Media and Research teams, which is focused on developing virtual and augmented reality experiences for live sport.

As such, Salsa Sound needs to increase staff numbers to keep up with demand. “We need to start expanding a bit,” says Oldfield. “The market opportunity [for us now] is bigger, so we need to make sure that we grow proportionately so we can meet that need. So we’re forming a development team.”

Salsa Sound’s co-founders, Ben Shirley [background] and Rob Oldfield

Personalisation is the future

For the 5G Edge-XR project, Salsa Sound is looking to extend its AI use cases across multiple sports, create bespoke, immersive mixes that respond to personalisation and interaction requirements, and migrate its technology into the cloud so that it can be deployed more readily. Additionally, the cloud provides substantial compute capacity, which aids one of the things that Salsa Sound does: real-time metadata extraction.

Oldfield explains: “This is about figuring out where all the events are on the pitch and classifying them; things like the ball kick, bounce, referee’s whistle, and crossbar hit, and outputting it as a real-time data feed. This means you can say where all the kicks were on the field, so rather than waiting 10 minutes for Opta to upload their stats, you could actually have a real-time evolution of how the action moves around the field by where all the kicks were, and that sort of thing. One of the things that the cloud enables us to do is just to do that better and faster.”
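As a rough picture of what such a feed could look like, the sketch below represents each detected event as a small record, serialises it as one JSON line, and counts kicks by area of the pitch. The field names, event labels and the `kicks_by_third` helper are hypothetical and do not reflect Salsa Sound's actual data format.

```python
import json
from collections import Counter
from dataclasses import dataclass, asdict


@dataclass
class PitchEvent:
    t: float    # seconds since kick-off
    kind: str   # e.g. "ball_kick", "bounce", "whistle", "crossbar_hit"
    x: float    # metres along the pitch length
    y: float    # metres across the pitch width


def to_feed_line(event):
    """Serialise one event as a single JSON line for a streaming feed."""
    return json.dumps(asdict(event), sort_keys=True)


def kicks_by_third(events, pitch_length=105.0):
    """Count ball kicks in the defensive, middle and attacking thirds."""
    names = ("defensive", "middle", "attacking")
    counts = Counter()
    for e in events:
        if e.kind == "ball_kick":
            idx = min(2, int(3 * e.x / pitch_length))
            counts[names[idx]] += 1
    return dict(counts)


events = [
    PitchEvent(12.4, "ball_kick", 10.0, 30.0),
    PitchEvent(15.0, "whistle", 50.0, 34.0),
    PitchEvent(21.7, "ball_kick", 100.0, 12.0),
]
for e in events:
    print(to_feed_line(e))
print(kicks_by_third(events))
```

Because every event carries a timestamp and a position, a running summary like this can be updated the moment each event arrives, with no need to wait for a batch of statistics.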

He adds that audio personalisation is the future: “I think personalisation is what viewers have wanted, or listeners have wanted, for years; the ability to actually personalise that experience, whether it’s to make different components louder or quieter, or make it more immersive, or change the format of delivery, or remove certain aspects and clean up different bits.

“That’s here to stay as part of the future of audio. We talk about it as next generation, but I want it to be this generation audio, not next-generation audio. Obviously, a big part of [being able to create personalised audio] is being able to separate out the different audio elements, because that’s what enables you to personalise.”
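The link between separation and personalisation can be shown with a small sketch: once the elements exist as separate stems, a listener's preferences reduce to a gain per stem. The stem names, the `personalised_mix` function and the simple hard limiter are assumptions made for illustration, not a description of any shipping product.

```python
def personalised_mix(stems, gains, default_gain=1.0):
    """Mix equal-length sample lists, scaling each stem by the listener's gain."""
    if not stems:
        return []
    length = len(next(iter(stems.values())))
    if any(len(s) != length for s in stems.values()):
        raise ValueError("all stems must have the same number of samples")
    mix = [0.0] * length
    for name, samples in stems.items():
        g = gains.get(name, default_gain)  # stems without a preference keep default_gain
        for i, s in enumerate(samples):
            mix[i] += g * s
    # Hard-limit to the [-1, 1] range so boosted stems cannot overflow.
    return [max(-1.0, min(1.0, v)) for v in mix]


# Hypothetical stems: commentary muted, on-pitch sounds doubled, crowd unchanged.
stems = {
    "commentary": [0.5, 0.5],
    "crowd": [0.2, -0.2],
    "pitch": [0.1, 0.1],
}
print(personalised_mix(stems, {"commentary": 0.0, "pitch": 2.0}))
```

With a single pre-mixed feed none of this is possible, which is why the separation step comes first.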

Going forward, Oldfield says he simply wants to see Salsa Sound’s technology being used: “I want to see our technologies in multiple arenas. We’ve been recently doing a lot of stuff out in the States with American football and basketball and hockey, mainly with our virtual crowd stuff. That seems to have gone down really well, particularly in the States. But my real goal is I just want to see stuff that we’ve been working on being used by real people. I want people to be sat at home just going, “this sounds awesome!”. And then, you know, I want to watch Match of the Day and go, “that’s us!”.

“That’s my personal goal, but I think the audio industry is moving a lot at the moment, with things like the migration to audio over IP and cloud-based solutions and a thirst for immersive technologies. I want us to be the key player in all of that; the go-to audio partner for exciting, intelligent audio,” Oldfield concludes.

To find out more about Salsa Sound’s work, watch episode one of Next Generation Audio Summit 2020.
