A descendant of the University of Salford’s involvement with FascinatE, a major EU-funded project designed to give live event viewers a more interactive experience, SALSA (Spatially Automated Live Sports Audio) is a forthcoming hardware and software solution developed to deliver “a dramatic improvement in the quality of sound for live sports broadcasts”. Expected to be commercially available early in 2017, SALSA will be on show throughout IBC next month.
Using any microphone configuration, SALSA automatically tracks and identifies on-pitch sounds, controls mixing console fader movements, and creates what is described as an “augmented sound experience”. The result, it is said, is that SALSA can automatically devise an immersive mix for both conventional linear and non-linear broadcast, and for more sophisticated object-based distribution.
SVG Europe recently sat down with University of Salford audio research consultant Dr. Rob Oldfield to find out more about the origins of SALSA and the features that are set to differentiate it from the increasing variety of immersive and object-based audio solutions…
When did work on what became SALSA start in earnest?
It was about four years ago, off the back of an EU project called FascinatE. That had involved the deployment of a directional camera to zoom in and pan around a UHD video panorama, and for the audio the aim was to match what viewers were hearing to what they were seeing. So viewers actually had audio customised to their viewpoint, which meant we were able to tailor the audio based on various production choices. As part of this process we worked on object-based extraction, which meant we had to know what the sources were and where they were located.
[One of our challenges] was to localise and extract these sound sources, and during this we created an algorithm to listen to audio content and detect when [certain occurrences take place], for example headers, kicks, crossbars and so on. [As a further development of this] it became clear that the algorithm could be used to automate fader movements on the console to [match what was] taking place on the field of play.
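The detect-and-automate idea described above can be illustrated with a minimal sketch. This is not SALSA’s actual algorithm: the short-time energy measure, the frame size, the threshold, and the fader boost/decay constants are all illustrative assumptions, standing in for whatever detector and console mapping the real system uses.

```python
# Minimal sketch (not SALSA's actual algorithm): flag transient "kick"
# events via a short-time energy jump, then drive a console fader level
# in response. All constants are illustrative assumptions.

def frame_energy(samples, frame_len=256):
    """Mean-square energy per non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_events(energies, threshold=0.1):
    """Indices of frames where energy first jumps above the threshold."""
    events = []
    for i, e in enumerate(energies):
        prev = energies[i - 1] if i else 0.0
        if e > threshold and prev <= threshold:
            events.append(i)
    return events

def fader_automation(n_frames, events, boost=0.9, floor=0.2, decay=0.85):
    """Per-frame fader gain: jump to `boost` on an event, then decay to `floor`."""
    gains, g = [], floor
    ev = set(events)
    for i in range(n_frames):
        g = boost if i in ev else max(floor, g * decay)
        gains.append(round(g, 3))
    return gains

# Synthetic pitch-mic feed: silence with one loud transient near sample 1024.
signal = [0.0] * 4096
for k in range(1024, 1054):
    signal[k] = 1.0

energies = frame_energy(signal)
events = detect_events(energies)          # fires on the transient's frame
gains = fader_automation(len(energies), events)
```

In the real system the detected event would be translated into fader moves over a console automation protocol rather than a list of gains, but the shape of the mapping (detect, boost, decay back to ambience) is the same.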
And it is that idea which is at the heart of the new SALSA solution…
Yes. Having worked on audio and object extraction techniques [during FascinatE], we began to work on the application of those techniques for SALSA. With the IP that we had generated we wanted to look at a way of exploiting it, and it soon became clear that there were a lot of areas where it could have a practical application, for example in live production, or in the future once 4K has been rolled out and we have object-based broadcasting.
I would say that SALSA as we know it now really came together over the last 12 months or so. At IBC last year we had the first real-time demo of automated mixing using SALSA, then at NAB this year we undertook an augmented mixing exercise. For IBC next month we will be showing our improved augmentation of the mix. We have discovered that the algorithm is slightly better than our ears at detecting ball-kicks; sometimes someone kicks a ball that we cannot hear in the mic feed, but the algorithm can still detect it. We get a timestamp for where that ball-kick occurs and can augment the kick with pre-recorded samples, giving viewers an enhanced experience so that they never miss a kick.
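The augmentation step described above, mixing a pre-recorded sample into the broadcast at a detected timestamp, can be sketched as follows. The function name, the sample data, and the mix gain are assumptions for illustration, not SALSA’s implementation.

```python
# Illustrative sketch of timestamp-based augmentation: given a detector
# timestamp (in samples), add a scaled pre-recorded kick sample into the
# broadcast mix at that point. Gain and names are assumptions.

def augment_kick(mix, sample, timestamp, gain=0.5):
    """Return a copy of `mix` with `sample` mixed in (scaled) at `timestamp`."""
    out = list(mix)
    for i, s in enumerate(sample):
        j = timestamp + i
        if j >= len(out):
            break  # sample runs past the end of the buffer
        out[j] += gain * s
    return out

broadcast = [0.0] * 10          # stand-in for the live mix buffer
kick_sample = [1.0, 0.8, 0.4]   # stand-in for a pre-recorded kick
augmented = augment_kick(broadcast, kick_sample, timestamp=4)
```

A real implementation would work on streaming audio with sample-accurate alignment and some level matching against the ambient bed, but the principle, inserting the sample at the detector’s timestamp, is what the quote describes.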
Interface modules can be developed for automation protocols such as CSCP, Ember, Ember+ and HUI, paving the way for a wide range of deployment options. But when do you think the commercialisation of SALSA will get underway?
We are going to launch as a product at the beginning of next year, and it will be available as software or hardware depending on preferences. We have a company down south that is manufacturing the boxes for us. We are clearly at a very early stage, but we are doing our best to commercialise what we think is a great idea.
More generally, how do you perceive the overall outlook for object-based audio for broadcast?
I think that within five years we will be looking at an object-based world. There will be the opportunity to customise and individualise broadcasts, and it will be possible for [viewers] to access bits of audio content, raise the levels of that content, or position it in their own sound systems in the way they wish.
This trend fits in fantastically with what we can do. I like to think of object-based audio as broadcasting the recipe with all the ingredients, rather than broadcasting the complete meal as happens at present. It’s shaping up to be an exciting few years!
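The “recipe and ingredients” metaphor can be made concrete with a toy sketch: instead of one pre-mixed track, the broadcast carries separate audio objects plus metadata, and each viewer’s receiver renders its own mix. The field names, the simple linear pan law, and the stereo output here are all assumptions, not any real object-audio format.

```python
# Toy illustration of object-based delivery: broadcast the "ingredients"
# (audio objects with gain/pan metadata) and let each viewer's renderer
# cook its own stereo mix. Field names and pan law are assumptions.

def render(objects, user_gains=None):
    """Mix audio objects to stereo; `user_gains` lets the viewer re-level them."""
    user_gains = user_gains or {}
    n = max(len(o["samples"]) for o in objects)
    left, right = [0.0] * n, [0.0] * n
    for o in objects:
        g = o["gain"] * user_gains.get(o["name"], 1.0)
        pan = o["pan"]  # -1.0 = hard left, +1.0 = hard right
        gl, gr = g * (1 - pan) / 2, g * (1 + pan) / 2
        for i, s in enumerate(o["samples"]):
            left[i] += gl * s
            right[i] += gr * s
    return left, right

objects = [
    {"name": "commentary", "samples": [1.0, 1.0], "gain": 1.0, "pan": 0.0},
    {"name": "crowd", "samples": [0.5, 0.5], "gain": 0.8, "pan": 1.0},
]
# A viewer who wants less crowd noise simply turns that one object down:
L, R = render(objects, user_gains={"crowd": 0.25})
```

The point of the metaphor is exactly this split: the broadcaster sends objects and metadata once, and every viewer’s “kitchen” can season the mix differently.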