Capturing the moment: The role of multimodal AI in archiving and retrieving the greatest sports memories
By Yvan Lataste, head of sport, Moments Lab.
Where were you in 1999 when Manchester United won the treble of the Premier League, FA Cup and UEFA Champions League? Do you recall Laure Manaudou's gold in the 400m freestyle at the 2004 Athens Olympics, which marked the renaissance of French swimming after a drought of more than 50 years? Some of the greatest sports moments that live in our collective memory don't exist today by accident.
The viral image of Brazilian Olympic surfer Gabriel Medina suspended in mid-air over the water at Teahupo'o, Tahiti, during the recent Paris Games was taken by Agence France-Presse photographer Jérôme Brouillet, who, aboard a rocking boat, recognised the perfect surf conditions, knew Medina's penchant for kicking out at the end of a ride, and timed the shot (one of just four taken!) perfectly.
The 10-part documentary The Last Dance, about Michael Jordan's final season with the Chicago Bulls, exists thanks to producer Andy Thompson gaining unprecedented access to shoot 500 hours of footage – about 3,200 reels of 16-millimetre film – and the NBA preserving it so carefully for the 20 years leading up to the creation of the miniseries.
Magnetic tapes and their playback devices are swiftly approaching end of life, and organisations are increasingly hitting go on archive digitisation projects to preserve their heritage and ensure iconic sports moments are not lost. Multimodal AI is an important technology to incorporate into this process because it enables sports clubs, leagues, federations and rights holders to uncover hidden moments in their vast archives and gain a truly 360-degree view of their content, be it archived footage, freshly shot material or a livestream (or a live photo upload from a boat off the coast of Tahiti).
Why multimodal AI is changing the game
Now seen as the gold standard in AI and an active area of research across Big Tech, multimodal AI is a type of machine learning designed to mimic human perception. Unlike unimodal AI, which relies on a single data source, multimodal AI ingests and processes multiple data sources, including video, still images, speech, sound and text, to achieve a more detailed and nuanced understanding of media content. This drastically enhances discoverability and, when paired with sports data feeds, enables sports content producers to search for precise match plays, goals and soundbites, accelerating content production workflows.
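To make that idea concrete, the sketch below shows one way a single indexed clip might carry signals from several modalities alongside events from a sports data feed. The IndexedClip record, its field names and the align_data_feed helper are illustrative assumptions for this example, not a description of any particular platform.

```python
from dataclasses import dataclass, field

# Illustrative field names only: real platforms structure their indexes differently.
@dataclass
class IndexedClip:
    clip_id: str
    start_s: float
    end_s: float
    transcript: str = ""                                 # speech-to-text
    people: list = field(default_factory=list)           # face recognition
    logos: list = field(default_factory=list)            # pattern / logo detection
    on_screen_text: list = field(default_factory=list)   # OCR
    events: list = field(default_factory=list)           # aligned sports data feed

def align_data_feed(clip, feed):
    """Attach data-feed events (goals, corners, ...) whose timestamps fall inside the clip."""
    clip.events = [e["type"] for e in feed if clip.start_s <= e["t"] <= clip.end_s]
    return clip

clip = IndexedClip("clip_042", start_s=3100.0, end_s=3130.0,
                   transcript="what a strike from outside the box")
feed = [{"type": "goal", "t": 3112.5}, {"type": "corner", "t": 890.0}]
print(align_data_feed(clip, feed).events)   # -> ['goal']
```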
Performing an archive search with legacy media asset management (MAM) systems is a notoriously cumbersome and ineffective process. Typically, specific keywords linked to precise metadata tags must be entered to bring up relevant results. These results often come back as complete files rather than short clips, requiring users to scroll through sometimes hours of footage to find the moment they're looking for.
Multimodal AI merges detection results from faces, text, objects, patterns (logos), transcriptions, landmarks and landscapes, and can produce shot descriptions and summaries of media files. This level of specificity enables content producers to unearth iconic plays, spotlight never-before-seen footage, and greatly accelerate the post-production process to keep sports fans engaged.
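As a rough illustration of that last step, the snippet below composes a one-line shot description from a set of merged detections. The describe_shot function and its input keys are hypothetical, and a production system would typically hand this step to a generative model rather than a simple template.

```python
def describe_shot(detections):
    """Compose a one-line shot description from merged detections.

    A plain template keeps the sketch simple; a real system would
    generate richer descriptions from the same fused inputs.
    """
    parts = []
    if detections.get("people"):
        parts.append(" and ".join(detections["people"]))
    if detections.get("objects"):
        parts.append("with " + ", ".join(detections["objects"]) + " in frame")
    if detections.get("landmark"):
        parts.append("at " + detections["landmark"])
    if detections.get("on_screen_text"):
        parts.append("(on-screen text: " + ", ".join(detections["on_screen_text"]) + ")")
    return " ".join(parts) if parts else "unlabelled shot"

print(describe_shot({
    "people": ["Player 9"],
    "objects": ["ball", "goalpost"],
    "landmark": "the south stand",
    "on_screen_text": ["2-1", "88:12"],
}))
# -> Player 9 with ball, goalpost in frame at the south stand (on-screen text: 2-1, 88:12)
```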
During their historic 2023/24 Bundesliga season, Bayer Leverkusen's social media team managed to shave an entire day off their post-match production process using multimodal AI. Trawling through three and a half hours of matchday video to clip key moments for their growing follower base used to take a full working day. Multimodal AI coupled with semantic search means they can now pinpoint the shots they need in their media library as quickly and easily as they search the web. This means fans get to relive the latest match highlights while the stadium roar is still echoing in their ears.
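The mechanics behind that kind of web-style search can be sketched simply: each shot description is turned into a vector and ranked against the query by similarity. In the example below, the embed function is a deliberately crude stand-in for the embedding model a real platform would use, and the clip descriptions are invented for the illustration.

```python
import math
from collections import Counter

def embed(text, dim=256):
    """Toy embedding: hash tokens into a fixed-size, normalised vector.

    A real deployment would call a learned text (or image) embedding model here.
    """
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        vec[hash(token) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def semantic_search(query, shots, top_k=3):
    """Rank indexed shot descriptions by similarity to a free-text query."""
    q = embed(query)
    scored = [(cosine(q, embed(desc)), clip_id) for clip_id, desc in shots]
    return sorted(scored, reverse=True)[:top_k]

shots = [
    ("clip_001", "late header wins the match, bench celebrates"),
    ("clip_002", "goalkeeper saves a penalty in stoppage time"),
    ("clip_003", "fans singing in the stands before kick-off"),
]
print(semantic_search("last-minute winning goal celebration", shots))
```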
Cyclists whose faces are covered by helmets and sunglasses can be auto-detected and recognised by multimodal AI through their bib numbers and sponsor logos. Football goals and set plays like corners and free kicks can even be grouped into predetermined smart folders for auto-publishing to various channels like YouTube. If an on-field moment reminds an announcer of a play that happened in the past, multimodal AI indexing paired with semantic search makes it possible to quickly search the media library and pull up the relevant clip to show it side-by-side during the broadcast.
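A simplified version of that bib-number logic might look like the following, where OCR'd numbers are matched against a race start list and detected sponsor logos act as a cross-check. The start list, the team-to-sponsor mapping and the identify_rider helper are all made up for illustration.

```python
# Hypothetical start-list lookup: OCR'd bib numbers and detected sponsor logos
# are cross-referenced against race data to identify riders whose faces are hidden.
START_LIST = {
    "1":  {"rider": "Rider A", "team": "Team Alpha"},
    "51": {"rider": "Rider B", "team": "Team Beta"},
}
TEAM_SPONSORS = {"Team Alpha": {"SponsorX"}, "Team Beta": {"SponsorY"}}

def identify_rider(ocr_numbers, detected_logos):
    """Match OCR'd bib numbers to the start list, using sponsor logos as confirmation."""
    for number in ocr_numbers:
        entry = START_LIST.get(number)
        if entry is None:
            continue
        confirmed = bool(TEAM_SPONSORS.get(entry["team"], set()) & set(detected_logos))
        return {"rider": entry["rider"], "team": entry["team"], "logo_confirmed": confirmed}
    return None

print(identify_rider(["51"], {"SponsorY"}))
# -> {'rider': 'Rider B', 'team': 'Team Beta', 'logo_confirmed': True}
```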
Enhancing the sports fan experience
Broadcasters are on a mission to boost audience engagement and reduce churn rates, and multimodal AI helps achieve this through the production of personalised VOD highlight packages that appeal to a wider range of viewer interests. Key people and moments tagged in a sports game can be quickly found in a media library and used to build highlights offerings. These offerings can be as diverse as overall match highlights, a collection of clips featuring a certain player, or even curated off-field moments to cater to viewers who prefer sideline drama and celebrity spotting.
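One way to picture how such packages come together is a simple filter over tagged moments, as in the sketch below. The Moment record, the tag names and the build_highlight_package helper are assumptions for the example rather than a real product feature.

```python
from dataclasses import dataclass

@dataclass
class Moment:
    clip_id: str
    start_s: float
    people: list
    tags: list            # e.g. "goal", "save", "sideline", "celebrity"

def build_highlight_package(moments, people=None, tags=None, max_clips=10):
    """Select tagged moments matching a viewer's interests, kept in match order."""
    selected = [
        m for m in moments
        if (not people or set(people) & set(m.people))
        and (not tags or set(tags) & set(m.tags))
    ]
    return sorted(selected, key=lambda m: m.start_s)[:max_clips]

moments = [
    Moment("clip_01", 310.0, ["Player 7"], ["goal"]),
    Moment("clip_02", 1450.0, ["Goalkeeper"], ["save"]),
    Moment("clip_03", 2980.0, ["Celebrity guest"], ["sideline", "celebrity"]),
]
print(build_highlight_package(moments, people=["Player 7"]))             # player-focused reel
print(build_highlight_package(moments, tags=["sideline", "celebrity"]))  # off-field moments
```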
Multi-language transcription with AI is also proving to be a game-changer that brings sports fans together regardless of language. In a press conference, for example, an athlete may be asked questions by reporters in several languages and respond in yet another. Multimodal AI enables broadcasters to transcribe and translate immediately, then subtitle clips for audiences in different geographical locations. This means fans can understand what is being said almost in the moment, so they stay engaged with the broadcast.
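The underlying flow can be sketched as transcribe, translate, then format subtitles. In the example below, transcribe_segments and translate are placeholder stubs standing in for the speech-to-text and machine-translation models a broadcaster would actually call; only the SubRip (SRT) formatting is real.

```python
def transcribe_segments(audio_path):
    # Placeholder output: (start_s, end_s, source-language text).
    # A real system would run a speech-to-text model on `audio_path`.
    return [(0.0, 3.2, "C'était un match très difficile."),
            (3.2, 6.0, "Je suis fier de l'équipe.")]

def translate(text, target_lang):
    # Placeholder translation; a real system would call a machine-translation model.
    lookup = {
        "C'était un match très difficile.": "It was a very difficult match.",
        "Je suis fier de l'équipe.": "I am proud of the team.",
    }
    return lookup.get(text, text) if target_lang == "en" else text

def to_srt_timestamp(seconds):
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def make_subtitles(audio_path, target_lang="en"):
    """Produce an SRT-formatted subtitle track in the viewer's language."""
    lines = []
    for i, (start, end, text) in enumerate(transcribe_segments(audio_path), 1):
        lines += [str(i), f"{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}",
                  translate(text, target_lang), ""]
    return "\n".join(lines)

print(make_subtitles("press_conference.wav"))
```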
The future of multimodal AI in sports
The next breakthroughs in how sports moments are indexed, retrieved and experienced with multimodal AI will be in the realm of analytics. For example, the technology is being used to track the ball and even players during football and basketball games to glean insights into movements, strategies, and game dynamics.
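At its simplest, that kind of tracking links per-frame detections into continuous tracks. The nearest-neighbour sketch below illustrates only the association step, with invented coordinates; real systems pair detection models with far more robust trackers.

```python
import math

def track_nearest_neighbour(frames, max_jump=50.0):
    """Link per-frame (x, y) detections into tracks by nearest-neighbour association.

    `frames` holds one list of detections per video frame; the association idea
    is the same one more sophisticated trackers build on.
    """
    tracks = {}        # track_id -> list of (frame_index, x, y)
    next_id = 0
    for f_idx, detections in enumerate(frames):
        unmatched = list(detections)
        for history in tracks.values():
            last_f, lx, ly = history[-1]
            if last_f != f_idx - 1 or not unmatched:
                continue
            # pick the closest new detection to this track's last position
            best = min(unmatched, key=lambda d: math.dist((lx, ly), d))
            if math.dist((lx, ly), best) <= max_jump:
                history.append((f_idx, *best))
                unmatched.remove(best)
        for x, y in unmatched:                     # leftover detections start new tracks
            tracks[next_id] = [(f_idx, x, y)]
            next_id += 1
    return tracks

frames = [[(10, 10), (200, 40)], [(14, 12), (205, 38)], [(19, 15), (210, 36)]]
print(track_nearest_neighbour(frames))
```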
Multimodal AI can also now detect and highlight crowd reactions such as applause, boos and singing, to provide even more context and nuance to sports media indexing and lay the groundwork for capabilities such as custom moments and new creative experiences that will enable content producers to build rough cuts even faster.
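A minimal sketch of how such reactions might feed the index: per-second scores from an audio event classifier (the labels and threshold here are purely illustrative) are collapsed into tagged crowd-reaction segments.

```python
def tag_crowd_reactions(scores, threshold=0.7):
    """Collapse consecutive seconds with the same dominant label into one tagged segment.

    `scores` maps each second to {label: confidence}, e.g. {"applause": 0.9, "boos": 0.05};
    in practice these scores would come from an audio event model.
    """
    segments, current = [], None
    for second in sorted(scores):
        label, conf = max(scores[second].items(), key=lambda kv: kv[1])
        label = label if conf >= threshold else None
        if current and current["label"] == label and second == current["end"] + 1:
            current["end"] = second
        else:
            if current and current["label"]:
                segments.append(current)
            current = {"label": label, "start": second, "end": second}
    if current and current["label"]:
        segments.append(current)
    return segments

scores = {0: {"applause": 0.9}, 1: {"applause": 0.85}, 2: {"singing": 0.8}, 3: {"crowd": 0.3}}
print(tag_crowd_reactions(scores))
# -> [{'label': 'applause', 'start': 0, 'end': 1}, {'label': 'singing', 'start': 2, 'end': 2}]
```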
Amid the ever-growing demand for high-quality, personalised and engaging sports content, multimodal AI enables sports organisations and rights holders to quickly and efficiently unearth iconic and more nuanced moments in their media collections to take the sports fan experience to the next level.