
    Google AI Introduces ZeroBAS: A Neural Method to Synthesize Binaural Audio from Monaural Audio Recordings and Positional Information without Training on Any Binaural Data

    January 18, 2025

    Humans possess an extraordinary ability to localize sound sources and interpret their environment using auditory cues, a phenomenon termed spatial hearing. This capability enables tasks such as identifying speakers in noisy settings or navigating complex environments. Emulating such auditory spatial perception is crucial for enhancing the immersive experience in technologies like augmented reality (AR) and virtual reality (VR). However, the transition from monaural (single-channel) to binaural (two-channel) audio synthesis—which captures spatial auditory effects—faces significant challenges, particularly due to the limited availability of multi-channel and positional audio data.

    Traditional mono-to-binaural synthesis approaches often rely on digital signal processing (DSP) frameworks. These methods model auditory effects using components such as the head-related transfer function (HRTF), room impulse response (RIR), and ambient noise, typically treated as linear time-invariant (LTI) systems. Although DSP-based techniques are well-established and can generate realistic audio experiences, they fail to account for the nonlinear acoustic wave effects inherent in real-world sound propagation.
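    As a point of reference, a DSP-style binaural renderer of this kind can be sketched as an LTI convolution of the mono signal with a per-ear impulse response. The toy impulse responses below are illustrative placeholders (a real system would use measured HRTF/RIR data), not anything from the paper:

    ```python
    import numpy as np
    from scipy.signal import fftconvolve

    def dsp_binaural(mono, hrir_left, hrir_right):
        """Render binaural audio by treating each ear as an LTI system:
        convolve the mono signal with a per-ear (head-related) impulse response."""
        left = fftconvolve(mono, hrir_left, mode="full")[: len(mono)]
        right = fftconvolve(mono, hrir_right, mode="full")[: len(mono)]
        return np.stack([left, right])

    # Toy impulse responses: the right ear hears a delayed, attenuated copy.
    mono = np.random.default_rng(0).standard_normal(16000)
    hrir_l = np.zeros(64); hrir_l[0] = 1.0    # direct path, unit gain
    hrir_r = np.zeros(64); hrir_r[20] = 0.6   # 20-sample delay, 0.6 gain
    binaural = dsp_binaural(mono, hrir_l, hrir_r)
    ```

    The LTI assumption is exactly what the convolution encodes: each ear's output is a fixed linear function of the input, which is why such pipelines cannot capture nonlinear propagation effects.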

    Supervised learning models have emerged as an alternative to DSP, leveraging neural networks to synthesize binaural audio. However, such models face two major limitations: the scarcity of position-annotated binaural datasets, and susceptibility to overfitting to specific acoustic environments, speaker characteristics, and training datasets. The need for specialized equipment for data collection further constrains these approaches, making supervised methods costly and less practical.

    To address these challenges, researchers from Google have proposed ZeroBAS, a zero-shot neural method for mono-to-binaural speech synthesis that does not rely on binaural training data. This approach employs parameter-free geometric time warping (GTW) and amplitude scaling (AS) based on source position to produce initial binaural signals, which are then refined by a pretrained denoising vocoder, yielding perceptually realistic binaural audio. Remarkably, ZeroBAS generalizes effectively across diverse room conditions, as demonstrated using the newly introduced TUT Mono-to-Binaural dataset, and achieves performance comparable to, or even better than, state-of-the-art supervised methods on out-of-distribution data.

    The ZeroBAS framework comprises a three-stage architecture as follows:

    1. In stage 1, geometric time warping (GTW) transforms the monaural input into two channels (left and right) by simulating interaural time differences (ITD) based on the relative positions of the sound source and the listener's ears. GTW computes a time delay for each ear channel, and the warped signals are linearly interpolated to produce the initial binaural channels.
    2. In stage 2, amplitude scaling (AS) enhances the spatial realism of the warped signals by simulating the interaural level difference (ILD) according to the inverse-square law. Human perception of sound spatiality relies on both ITD and ILD, with the latter dominant for high-frequency sounds. The amplitude of each channel is scaled using the Euclidean distance from the source to the corresponding ear.
    3. In stage 3, the warped and scaled signals are iteratively refined by a pretrained denoising vocoder, WaveFit. This vocoder leverages log-mel spectrogram features and denoising diffusion probabilistic models (DDPMs) to generate clean binaural waveforms. Iterative application of the vocoder mitigates acoustic artifacts and ensures high-quality binaural output.
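    The first two stages above can be sketched with simple geometry. This is an illustrative reconstruction based on the description in this article, not the paper's code; the sample rate, sound speed, and ear positions are assumed values:

    ```python
    import numpy as np

    SPEED_OF_SOUND = 343.0  # m/s, assumed
    SR = 16000              # sample rate in Hz, assumed

    def warp_and_scale(mono, src, ear_l, ear_r):
        """Stages 1-2: geometric time warping (ITD) + amplitude scaling (ILD)."""
        d_l = np.linalg.norm(src - ear_l)  # Euclidean source-to-ear distances
        d_r = np.linalg.norm(src - ear_r)
        t = np.arange(len(mono), dtype=float)
        channels = []
        for d in (d_l, d_r):
            # Stage 1 (GTW): delay the channel by its propagation time,
            # reading the mono signal at fractional indices via linear interpolation.
            delay = d / SPEED_OF_SOUND * SR
            channels.append(np.interp(t - delay, t, mono, left=0.0))
        left, right = channels
        # Stage 2 (AS): inverse-square-law energy falloff -> 1/d amplitude scaling.
        return np.stack([left / d_l, right / d_r])

    mono = np.sin(2 * np.pi * 440 * np.arange(SR) / SR)  # 1 s of 440 Hz tone
    src = np.array([-1.0, 0.5, 0.0])                     # source to the listener's left
    ear_l = np.array([-0.09, 0.0, 0.0])                  # ~18 cm head width, assumed
    ear_r = np.array([0.09, 0.0, 0.0])
    binaural = warp_and_scale(mono, src, ear_l, ear_r)
    # Nearer (left) ear gets the shorter delay and the larger amplitude.
    ```

    In the full ZeroBAS pipeline these warped, scaled channels would then be passed through the WaveFit vocoder (stage 3), which is omitted here since it requires the pretrained model.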

    In evaluations, ZeroBAS was tested on two datasets (results in Tables 1 and 2): the Binaural Speech dataset and the newly introduced TUT Mono-to-Binaural dataset. The latter was designed to test the generalization capabilities of mono-to-binaural synthesis methods in diverse acoustic environments. In objective evaluations, ZeroBAS demonstrated significant improvements over DSP baselines and approached the performance of supervised methods despite not being trained on binaural data. Notably, ZeroBAS achieved superior results on the out-of-distribution TUT dataset, highlighting its robustness across varied conditions.

    Subjective evaluations further confirmed the efficacy of ZeroBAS. Mean Opinion Score (MOS) assessments showed that human listeners rated ZeroBAS’s outputs as slightly more natural than those of supervised methods. In MUSHRA evaluations, ZeroBAS achieved comparable spatial quality to supervised models, with listeners unable to discern statistically significant differences.

    Even though this method is quite remarkable, it does have some limitations. ZeroBAS struggles to directly process phase information because the vocoder lacks positional conditioning, and it relies on general models instead of environment-specific ones. Despite these constraints, its ability to generalize effectively highlights the potential of zero-shot learning in binaural audio synthesis.

    In conclusion, ZeroBAS offers a fascinating, room-agnostic approach to binaural speech synthesis that achieves perceptual quality comparable to supervised methods without requiring binaural training data. Its robust performance across diverse acoustic environments makes it a promising candidate for real-world applications in AR, VR, and immersive audio systems.



    The post Google AI Introduces ZeroBAS: A Neural Method to Synthesize Binaural Audio from Monaural Audio Recordings and Positional Information without Training on Any Binaural Data appeared first on MarkTechPost.
