Sports TV Awards Winner: Outstanding audio with Salsa Sound’s vCROWD bringing atmosphere to fans

Salsa Sound’s Rob Oldfield receiving his Sports TV Award

When the SVG Europe Sports TV Award for Outstanding Audio Innovation was won by the vCROWD Virtual Crowd Sound Solution by Salsa Sound and Manchester City FC, it was in recognition of a vast effort to bring emotion back to the beautiful game during its pandemic-restricted audience-free matches.

Salsa, renowned for its artificial intelligence (AI)-driven audio mixing, produced an innovative app-based solution for the control of virtual crowds for live broadcast games.

“We normally do automated audio mixing, and this was obviously a pivot for us,” says Rob Oldfield, co-founder and CEO, Salsa Sound. “We do the narrative side of things, we get the pitch sounds, but then this was about augmenting or creating a crowd, which is more about the emotions. It was a different field.”

Behind closed doors

Salsa was exploring sound mixing options with Manchester City ahead of the Champions League semi-final in 2019, but the pandemic had other plans. “That was the first game to be canned because of COVID,” says Oldfield. “It put us in a bit of a difficult situation as a company because it obviously stalled some of our development plans, but more importantly it stunted the whole broadcasting industry.”

A subsequent conversation with the football club about the chance of games being played behind closed doors led to a breakthrough. “They said that when games do start, they wanted to be able to add a bit of flavour and a bit of excitement back to their content. They didn’t want fans to feel like it’s boring, with no sense of vibe around the game. This was before anybody had really started talking about virtual crowds, in public at least,” says Oldfield. “So we thought we could do something to bring that crowd tone back.”

Salsa had access to lots of recordings (stems) of the actual crowd at the club’s Etihad Stadium, so it set out first to create a sample bank. More important, however, was the player, which it called vCROWD.

vCROWD is designed to be easy to use, with a low barrier to entry

“We recognised quickly that if you want to be able to control a virtual crowd with the ebb and flow of a fast-moving game like football, it’s got to be really easy to change the flavour of the crowd based on how you’re responding to the events. You don’t have to think about what you’re doing, you just want to quickly output a plausible realistic sound, and in real time; nobody wants to hear the goal reaction seconds afterwards because it took ages to find the right button or whatever.”

Salsa wanted to put a tool into people’s hands that was easy to use, didn’t require them to already have masses of content [“We could get the content for them,” says Oldfield], and had a really low barrier to entry.

“You should not need to have loads of audio experience because fundamentally, virtual crowd is actually more of an editorial role rather than an audio production role,” says Oldfield. “It’s about knowing how a crowd responds to those events and what kind of emotion should be evoked by the events on the pitch.”

We’re Not Really Here

Oldfield sees sports sound as storytelling and uses a cinematic analogy to explain: “You’ve got the sound events on the field of play, that’s like dialogue. Then you’ve got the crowd which is kind of like the backing track. When you pull the music from a film, you lose the emotion. When you take the crowd out, a lot of the emotion goes. Putting that emotion back in was fundamental to what we wanted to do.

“Also, knowing that it’s a creative, artistic role we wanted it to be easy for people who operate in an editorial capacity to be able to use vCROWD. We turned to the music production world to look at what interfaces are being used and came up with the idea of the parameters as a touchscreen interface.”

The vCROWD player runs on a touchscreen tablet PC. A large square occupies the right-hand side of the interface, and anywhere the operator presses within that square produces a different sound.

“The idea is that with one gesture, you can control the flavour of the crowd,” Oldfield explains. “If you raise your finger on the trackpad up, the crowd get more excited. They don’t just get louder – we do clever processing so that the crowd has more energy. So there are not only more people screaming or shouting, but actually the energy of their shouting increases. The pitch slightly changes, because as people shout louder they get more high-pitched in tone. We called on some of our audio expertise to be able to create a plausible, realistic crowd getting more excited.”

The system contains multiple samples that can be triggered and blended to create an exciting crowd sound. It can also control the level of applause, from a smattering of claps through to rapturous ‘everybody on their feet’ ovations.
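As a rough illustration of the idea Oldfield describes, the sketch below maps a single vertical trackpad position to a gain, a slight pitch rise and a blend between crowd layers. It is not Salsa Sound’s implementation: the function name, the three layer names, the gain range and the one-semitone pitch ceiling are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class CrowdParams:
    gain_db: float          # overall level of the virtual crowd
    pitch_semitones: float  # slight upward shift as excitement rises
    layer_weights: dict     # blend between crowd sample layers, sums to 1


def crowd_params(y: float) -> CrowdParams:
    """Map trackpad height y (0 = bottom, 1 = top) to crowd parameters.

    All ranges and layer names here are hypothetical.
    """
    y = min(1.0, max(0.0, y))  # clamp touches outside the pad

    # Assumed ranges: -12 dB at rest up to 0 dB, and up to +1 semitone.
    gain_db = -12.0 + 12.0 * y
    pitch_semitones = 1.0 * y

    # Triangular crossfade between three assumed layers centred at
    # y = 0, 0.5 and 1, so energy changes as well as loudness.
    centres = {"murmur": 0.0, "lively": 0.5, "roar": 1.0}
    raw = {name: max(0.0, 1.0 - abs(y - c) / 0.5) for name, c in centres.items()}
    total = sum(raw.values())
    weights = {name: w / total for name, w in raw.items()}

    return CrowdParams(gain_db, pitch_semitones, weights)
```

With these assumed values, a finger at the bottom of the pad plays only the quiet layer, and sliding upward crossfades towards the loudest layer while nudging the level and pitch up together.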

“It’s very easy to change the flavour without even looking. You move your hand around, and watch the game,” says Oldfield. “That all happens on the right-hand side of the screen. Then on the left-hand side are one-shot buttons like goal, near miss, crowd disdain and ironic cheer.”

Through its work with Manchester City, Salsa had access to all the classic chants from the crowd.

“We were lucky enough to have full immersive recordings from within the crowd,” Oldfield explains. “We had all the songs: We are City, Blue Moon Rising, Hey Jude, We’re Not Really Here, all of those chants and other signature sounds of the Etihad, and we were able to trigger those. Again, someone in an editorial role knows when those chants would happen.”

The system ended up being hugely customisable. “It has a setup file in the backend and you can decide what buttons you want to put in, you can even change the colour of them so it’s a bit easier to quickly see when you need a cheer,” says Oldfield. “We also put a button in called Momentum, for when a player is on the ball, driving down the pitch, the crowds are on their feet, you know something’s about to happen and everyone gets pretty excited. We added an AUX input, so if you had an external source such as the Zoom feed from a chat room with fans, you can add that in at will and change the overall level with a slider. It’s quite sophisticated in how it manages the different audio streams. We ended up with a really flexible tool.”
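To make the idea of a backend setup file concrete, here is a minimal sketch of how configurable one-shot buttons and an AUX level might be described and loaded. The JSON layout, field names, colours and sample file names are invented for the example; the article does not describe the actual format vCROWD uses.

```python
import json
from dataclasses import dataclass

# Hypothetical setup file contents; every key and value is an assumption.
SETUP_JSON = """
{
  "buttons": [
    {"label": "Goal", "colour": "#2e7d32", "sample": "goal_roar.wav"},
    {"label": "Near miss", "colour": "#f9a825", "sample": "near_miss.wav"},
    {"label": "Momentum", "colour": "#1565c0", "sample": "momentum_build.wav"}
  ],
  "aux_level": 0.4
}
"""


@dataclass(frozen=True)
class Button:
    label: str   # text shown on the one-shot button
    colour: str  # colour used to make the button easy to spot
    sample: str  # audio file triggered by the button


def load_setup(text: str):
    """Parse a setup document into buttons and an AUX slider level."""
    data = json.loads(text)
    buttons = [Button(**entry) for entry in data["buttons"]]

    # Reject duplicate labels so each button is unambiguous on screen.
    labels = [b.label for b in buttons]
    if len(set(labels)) != len(labels):
        raise ValueError("duplicate button labels in setup file")

    # The AUX level is treated as a 0..1 slider position (assumed range).
    aux_level = float(data.get("aux_level", 0.0))
    if not 0.0 <= aux_level <= 1.0:
        raise ValueError("aux_level must be between 0 and 1")

    return buttons, aux_level
```

Keeping the button list in data rather than in code is one plausible way to get the flexibility Oldfield describes, since a new button such as Momentum then only needs a new entry in the file.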

International players

Since the success with Manchester City, Salsa has taken vCROWD abroad. “A lot of broadcasters out in the US had shown an interest,” says Oldfield. “So we did the Women’s Super League Challenge Cup with CBS Sports, which was actually the first virtual crowd sound broadcast in America. They got over a million viewers.”

“Then it got immediately picked up by several Major League Soccer (MLS) clubs working with Vista Worldlink (part of NEP Group), and we put together bespoke crowd banks specifically for each team. As we did with Manchester City, we had to bring in all the different chants, lots of drumming and get the flavours of the crowd,” says Oldfield. “One team, Phoenix Rising FC, have two different bands that play live throughout, and so that’s really part of the signature sound. We added in different controllers just to be able to manage that flavour, allowing [the operator] to control which band to put in. That was a fun one.”

Salsa then worked with CBS again, for all of its college football and college basketball. “In their case that was a sample bank providing [options for] a large stadium and a small stadium, and recognising the difference between them.

“American football crowds work very differently compared with UK football [soccer], so we introduced a few differences in how the one-shot buttons worked, but the trackpad worked brilliantly for American football, such as adding a surge of excitement when you’ve got to do a touchdown,” he adds.

American dozen

Following this success, the Big Ten League, one of the major college leagues for American football, got in touch.

“We put a different sample bank together for each of the teams in their league [Salsa worked with 12 teams in all]. We worked really hard on that.

“We discovered that each team’s fans respond differently to different events on the pitch. The crowds actually display a sort of collective personality. Every team’s got their own marching band and they all play different songs. They’ve all got their own signature song – the fight song – when there’s a touchdown, and this other tune they play at the end of the game. Then they play little interludes after every single play. So we had to go through a load of legacy content that they provided for us. It was a huge challenge for our team to be able to chop out all of the segments, edit and clean them up because some of them were taken from an international feed rather than from mic-friendly feeds. We created this sample bank that again was super-easy for them to control and gave them the buttons that they wanted. There was a lot of customisation in terms of what songs that band would play if they had been there and also how the crowd responds to certain events.

“It’s interesting that when you use the trackpad to change the excitement, it sounds totally different in, for example, Penn State University, which has got a big stadium as opposed to Rutgers University, which has got a comparatively small turnout.”

Salsa sold the Big Ten a set number of licences and then uploaded sample banks to the server. “They would basically just pick up the Surface PC and then decide which sample bank to use. If you’re in Wisconsin, and then the next week you’re at Maryland, it’s the same device, you just click Maryland and then off you go. It’s really easy to use; one device can create a different sound depending on which stadium you’re at.”

And after soccer and American football, the team at Salsa turned their hands (and ears) to men’s and women’s basketball, supporting the Big Ten college leagues for CBS.

“It wasn’t as granular for CBS, but it was the same approach, every now and again we’d upgrade the sample bank, and if we found a few more cool loops, we’d send them a new version. It just keeps things interesting,” says Oldfield. “Fundamentally, it’s our name on it, so we want to make sure that it sounds as good as possible, and that our customers are happy with it.”

“Some of us worked a lot of late hours, especially in the US managing support calls, because we shipped the hardware to these guys and a support contract was included in our price. So the support side was significant,” he continues. “Since then we have extended our team, just recognising that there’s the need to continue to push the software and build new tools.”

Sounding out 2022

Salsa has also been concentrating a lot of development on MIXaiR, which it recently released. “The way that we’ve authored MIXaiR is that it runs with different modules that you can easily slot in, one of which could be vCROWD,” says Oldfield. “We’ve already started working with vCROWD, in a future-facing context, such as looking at how can you improve the sound of replays on the TV.

“One of the things we’re working on at the moment is using AI to gauge how excited the crowd is at different moments and then that can be used to drive vCROWD, to be able to create an authentic virtual crowd without the need for an operator. You wouldn’t necessarily want that in the live feed because you’ve got the real microphones but when it comes to doing a replay, it could be a nice controllable way of riding the excitement as you show the replay, as the goal goes in.”

Oldfield says a big part of the purpose of assistive technologies like AI is to make life easier, and that is the ethos behind both MIXaiR and vCROWD.

“I’m really chuffed with the SVGE award,” he adds. “We’ve made it easy and set a low barrier to entry so that anybody could use vCROWD. We have small clubs in the US using it just with their local fans. I think that’s brilliant, you don’t have to be an international broadcaster to produce great content anymore. We can provide tools that help this kind of tier two or tier three sports still sound excellent. That’s something that we’ll keep moving towards across all the technology that we produce.”
