    Google DeepMind Introduces Video-to-Audio V2A Technology: Synchronizing Audiovisual Generation

    June 23, 2024

    Sound is indispensable for enriching human experiences, enhancing communication, and adding emotional depth to media. While AI has made significant progress in many domains, video-generation models still produce silent output, and adding sound with the sophistication and nuance of human-created content remains challenging. Producing soundtracks for these silent videos is a significant next step toward fully generated films.

    Google DeepMind introduces video-to-audio (V2A) technology that enables synchronized audiovisual creation. Using a combination of video pixels and natural-language text prompts, V2A creates immersive audio for the on-screen action. The team tried both autoregressive and diffusion approaches in search of the most scalable AI architecture; the diffusion approach produced the most convincing and realistic results for synchronizing audio with visuals.

    The first step of the V2A pipeline compresses the input video into an encoded representation. The diffusion model then iteratively refines the audio, starting from random noise. Visual input and natural-language prompts steer this process, which generates realistic, synchronized audio that closely follows the instructions. Decoding the output, generating the waveform, and merging the audio with the visual data constitute the final step.
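
    DeepMind has not released V2A's code, so the sketch below is only a structural illustration of the three stages just described, using toy PyTorch modules. Every component name, shape, and the simplified update rule here are assumptions for illustration, not details of the published system.

    ```python
    import torch
    import torch.nn as nn

    # Toy stand-ins for V2A's real, unreleased components; all shapes are illustrative.
    video_encoder = nn.Linear(1024, 256)   # compresses video features (step 1)
    text_encoder = nn.Linear(512, 256)     # embeds the natural-language prompt
    denoiser = nn.Linear(256 * 3, 256)     # predicts noise from latent + conditioning
    audio_decoder = nn.Linear(256, 16000)  # decodes the latent into a 1-second waveform

    def generate_audio(video_feats, prompt_emb, steps=50):
        """Refine audio iteratively from random noise, steered by video and text."""
        cond_v = video_encoder(video_feats)
        cond_t = text_encoder(prompt_emb)
        latent = torch.randn(1, 256)                   # start from pure noise
        for _ in range(steps):                         # iterative denoising (step 2)
            inp = torch.cat([latent, cond_v, cond_t], dim=-1)
            predicted_noise = denoiser(inp)
            latent = latent - predicted_noise / steps  # schematic update, not real DDPM math
        return audio_decoder(latent)                   # decode to waveform (step 3)

    waveform = generate_audio(torch.randn(1, 1024), torch.randn(1, 512))
    ```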

    V2A first encodes the video and audio-prompt inputs, then runs them iteratively through the diffusion model. The compressed audio that results is decoded into a waveform. To improve the model's ability to produce high-quality audio, and to train it to generate specific sounds, the researchers supplemented the training process with additional information such as transcripts of spoken dialogue and AI-generated annotations containing detailed descriptions of sound.
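
    As a hedged illustration of how those extra training signals might be assembled per clip, here is a minimal sketch; the field names (`sound_annotation`, `transcript`) are hypothetical and not taken from DeepMind's write-up.

    ```python
    def build_conditioning_text(clip):
        """Merge AI-generated sound annotations with dialogue transcripts so the
        model can learn which audio events belong to which visuals."""
        parts = []
        if clip.get("sound_annotation"):      # e.g. "footsteps on gravel, distant traffic"
            parts.append(f"Sounds: {clip['sound_annotation']}")
        if clip.get("transcript"):            # spoken dialogue heard in the clip
            parts.append(f"Dialogue: {clip['transcript']}")
        return " | ".join(parts)

    example = {"sound_annotation": "waves crashing, seagulls calling",
               "transcript": "Look at the horizon."}
    print(build_conditioning_text(example))
    # Sounds: waves crashing, seagulls calling | Dialogue: Look at the horizon.
    ```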

    By training on video, audio, and these added annotations, the technology learns to associate distinct audio events with different visual scenes and to respond to the information in the transcripts or annotations. V2A can be paired with video-generation models such as Veo to produce shots with a dramatic score, realistic sound effects, or dialogue that complements the characters and tone of a video.

    With its ability to create scores for a wide range of classic videos, such as silent films and archival footage, V2A technology opens up a world of creative possibilities. The most exciting aspect is that it can generate as many soundtracks as users desire for any video input. Users can define a “positive prompt” to guide the output towards desired sounds or a “negative prompt” to steer it away from unwanted noises. This flexibility gives users unprecedented control over V2A’s audio output, fostering a spirit of experimentation and enabling them to quickly find the perfect match for their creative vision.
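
    DeepMind doesn't describe the exact prompting mechanism, but positive/negative prompts in diffusion models are commonly implemented with classifier-free guidance, where the negative-prompt prediction replaces the unconditional branch. A minimal sketch, assuming a hypothetical `denoise` function and an illustrative guidance scale:

    ```python
    import torch

    def guided_noise_prediction(denoise, latent, pos_emb, neg_emb, scale=7.5):
        """One guidance step: push the prediction toward the positive prompt
        and away from the negative one (classifier-free-guidance style)."""
        eps_pos = denoise(latent, pos_emb)   # prediction conditioned on desired sounds
        eps_neg = denoise(latent, neg_emb)   # prediction conditioned on unwanted sounds
        return eps_neg + scale * (eps_pos - eps_neg)

    # Toy usage with a stand-in denoiser:
    denoise = lambda latent, emb: 0.1 * latent + 0.01 * emb
    eps = guided_noise_prediction(denoise,
                                  torch.randn(1, 256),   # current noisy latent
                                  torch.randn(1, 256),   # "positive prompt" embedding
                                  torch.randn(1, 256))   # "negative prompt" embedding
    ```

    Raising `scale` pushes the output harder toward the positive prompt, while the negative-prompt prediction serves as the baseline being steered away from.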

    The team is dedicated to ongoing research and development to address a range of issues. They are aware that the quality of the audio output depends on the video input: distortions or artifacts outside the model's training distribution can lead to noticeable audio degradation. They are also working to improve lip-syncing for videos with voiceovers. By analyzing the input transcripts, V2A aims to create speech that is synchronized with the characters' mouth movements; when the generated video doesn't correspond to the transcript, the result can be eerie, mismatched lip-syncing. The team is actively working to resolve these issues, reflecting its commitment to maintaining high standards and continuously improving the technology.

    The team is actively seeking input from prominent creators and filmmakers, recognizing their invaluable insights into the development of V2A technology. This collaborative approach is meant to ensure that V2A meets the creative community's needs and enhances their work. To protect AI-generated content from abuse, they have also integrated the SynthID toolkit into the V2A research, watermarking all of its output to support the ethical use of the technology.

    The post Google DeepMind Introduces Video-to-Audio V2A Technology: Synchronizing Audiovisual Generation appeared first on MarkTechPost.
