Google's DeepMind Brings V2A To Generate Soundtracks and Dialogues For Videos: Here's How

LolitaChiquita

Editor

posted on 1 year ago — updated on 1 second ago

Google's DeepMind division is working on a new set of AI tools that can work in tandem with Veo and other platforms from the company.

Google DeepMind is working on a new AI model that can generate soundtracks and dialogue for videos. In a recent blog post, the tech giant’s AI research lab unveiled V2A (Video-to-Audio), a new work-in-progress AI model that “combines video pixels with natural language text prompts to generate rich soundscapes for the on-screen action.”

Generating videos from text prompts is making a splash in the creative world. Most of the available tools, however, share a major drawback: they can only produce silent videos.

Google DeepMind’s V2A is designed to work seamlessly with Veo, Google’s text-to-video model that was introduced at I/O 2024 last month.

This combination allows users to improve their videos not just visually but also audibly. V2A can also breathe life into “traditional footage” like silent films and archival material, reported the Indian Express.

The technology aims to change the way users create and experience AI-generated videos. It can be used to add realistic sound effects, dramatic music, and dialogue that matches the tone of the video.

The V2A model can generate an unlimited number of soundtracks for any video. Users can also steer the audio output with a ‘positive prompt’, which guides the model toward desired sounds, and a ‘negative prompt’, which guides it away from unwanted ones. In addition, all generated audio is watermarked with SynthID technology so that it can be identified as AI-generated.

DeepMind’s V2A represents a significant step forward in AI-powered video creation. The system takes the video, along with a text description of the desired sound, as input and uses a diffusion model trained on a mix of dialogue transcripts, sounds, and videos to generate matching audio. This fills a crucial gap, making videos more immersive and engaging. While the model is powerful, V2A has not been trained on a large number of videos yet, and the output can come out distorted at times. Citing these limitations and the potential for misuse, Google said that it won’t release V2A to the public anytime soon.