Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      How AI further empowers value stream management

      June 27, 2025

      12 Top ReactJS Development Companies in 2025

      June 27, 2025

      Not sure where to go with AI? Here’s your roadmap.

      June 27, 2025

      This week in AI dev tools: A2A donated to Linux Foundation, OpenAI adds Deep Research to API, and more (June 27, 2025)

      June 27, 2025

      The next big HDMI leap has arrived – here’s how these 16K cables will shake things up

      June 27, 2025

      Here’s how you can still trade in any phone at Verizon to get an iPhone, iPad, and Apple Watch free

      June 27, 2025

      Anthropic has a plan to combat AI-triggered job losses predicted by its CEO

      June 27, 2025

      Forget Google and Microsoft: OpenAI may be building the ultimate work suite of apps and services

      June 27, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Say hello to ECMAScript 2025

      June 27, 2025
      Recent

      Say hello to ECMAScript 2025

      June 27, 2025

      Ecma International approves ECMAScript 2025: What’s new?

      June 27, 2025

      Building Together: PRFT Colleagues Volunteer with Atlanta Habitat for Humanity

      June 27, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Fix Elden Ring Nightreign Connection Errors And Server Login Failure PC

      June 27, 2025
      Recent

      Fix Elden Ring Nightreign Connection Errors And Server Login Failure PC

      June 27, 2025

      Fix Now EAC Error 20006 in Elden Ring: Nightreign [6 Easy Tricks]

      June 27, 2025

      Fix Now Elden Ring Nightreign EAC Error 30005 (CreateFile Failed)

      June 27, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»NVIDIA AI Introduces Audio-SDS: A Unified Diffusion-Based Framework for Prompt-Guided Audio Synthesis and Source Separation without Specialized Datasets

    NVIDIA AI Introduces Audio-SDS: A Unified Diffusion-Based Framework for Prompt-Guided Audio Synthesis and Source Separation without Specialized Datasets

    May 12, 2025

    Audio diffusion models have achieved high-quality speech, music, and Foley sound synthesis, yet they predominantly excel at sample generation rather than parameter optimization. Tasks like physically informed impact sound generation or prompt-driven source separation require models that can adjust explicit, interpretable parameters under structural constraints. Score Distillation Sampling (SDS)—which has powered text-to-3D and image editing by backpropagating through pretrained diffusion priors—has not yet been applied to audio. Adapting SDS to audio diffusion allows optimizing parametric audio representations without assembling large task-specific datasets, bridging modern generative models with parameterized synthesis workflows.

    Classic audio techniques—such as frequency modulation (FM) synthesis, which uses operator-modulated oscillators to craft rich timbres, and physically grounded impact-sound simulators—provide compact, interpretable parameter spaces. Similarly, source separation has evolved from matrix factorization to neural and text-guided methods for isolating components like vocals or instruments. By integrating SDS updates with pretrained audio diffusion models, one can leverage learned generative priors to guide the optimization of FM parameters, impact-sound simulators, or separation masks directly from high-level prompts, uniting signal-processing interpretability with the flexibility of modern diffusion-based generation. 

    Researchers from NVIDIA and MIT introduce Audio-SDS, an extension of SDS for text-conditioned audio diffusion models. Audio-SDS leverages a single pretrained model to perform various audio tasks without requiring specialized datasets. Distilling generative priors into parametric audio representations facilitates tasks like impact sound simulation, FM synthesis parameter calibration, and source separation. The framework combines data-driven priors with explicit parameter control, producing perceptually convincing results. Key improvements include a stable decoder-based SDS, multistep denoising, and a multiscale spectrogram approach for better high-frequency detail and realism. 

    The study discusses applying SDS to audio diffusion models. Inspired by DreamFusion, SDS generates stereo audio through a rendering function, improving performance by bypassing encoder gradients and focusing instead on the decoded audio. The methodology is enhanced by three modifications: avoiding encoder instability, emphasizing spectrogram features to highlight high-frequency details, and using multi-step denoising for better stability. Applications of Audio-SDS include FM synthesizers, impact sound synthesis, and source separation. These tasks show how SDS adapts to different audio domains without retraining, ensuring that synthesized audio aligns with textual prompts while maintaining high fidelity. 

    The performance of the Audio-SDS framework is demonstrated across three tasks: FM synthesis, impact synthesis, and source separation. The experiments are designed to test the framework’s effectiveness using both subjective (listening tests) and objective metrics such as the CLAP score, distance to ground truth, and Signal-to-Distortion Ratio (SDR). Pretrained models, such as the Stable Audio Open checkpoint, are used for these tasks. The results show significant audio synthesis and separation improvements, with clear alignment to text prompts. 

    In conclusion, the study introduces Audio-SDS, a method that extends SDS to text-conditioned audio diffusion models. Using a single pretrained model, Audio-SDS enables a variety of tasks, such as simulating physically informed impact sounds, adjusting FM synthesis parameters, and performing source separation based on prompts. The approach unifies data-driven priors with user-defined representations, eliminating the need for large, domain-specific datasets. While there are challenges in model coverage, latent encoding artifacts, and optimization sensitivity, Audio-SDS demonstrates the potential of distillation-based methods for multimodal research, particularly in audio-related tasks. 


    Check out the Paper and Project Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 90k+ ML SubReddit.

    Here’s a brief overview of what we’re building at Marktechpost:

    • ML News Community – r/machinelearningnews (92k+ members)
    • Newsletter– airesearchinsights.com/(30k+ subscribers)
    • miniCON AI Events – minicon.marktechpost.com
    • AI Reports & Magazines – magazine.marktechpost.com
    • AI Dev & Research News – marktechpost.com (1M+ monthly readers)
    • Partner with us

    The post NVIDIA AI Introduces Audio-SDS: A Unified Diffusion-Based Framework for Prompt-Guided Audio Synthesis and Source Separation without Specialized Datasets appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleEngineering Smarter Data Pipelines with Autonomous AI
    Next Article ZealousWeb LLC

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    June 27, 2025
    Machine Learning

    AWS costs estimation using Amazon Q CLI and AWS Cost Analysis MCP

    June 27, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    Free Email Signature Generator

    Web Development

    ‘No aggressive monetization’ — Nexus Mods’ new ownership responds to worried members

    News & Updates

    Google’s new free AI agent brings Gemini right to your command line – here’s how to try it

    News & Updates

    New UX/UI Tools I’m Loving! – Microsoft UX Certificate, Figma Updates, OpenAI Academy & Mor

    Web Development

    Highlights

    Microsoft may cut more jobs, and middle managers are in the crosshairs News & Updates

    Microsoft may cut more jobs, and middle managers are in the crosshairs

    April 10, 2025

    A new report suggests Microsoft could announce more job cuts soon, targeting middle managers to…

    Surface Pro 12-inch vs. Surface Pro 11: Which 2-in-1 Copilot+ PC is better for you?

    May 6, 2025

    CVE-2025-47727 – Delta Electronics CNCSoft File Path Traversal Remote Code Execution Vulnerability

    June 4, 2025

    CVE-2024-46899 – Hitachi Ops Center Analyzer Viewpoint OVF Authentication Credentials Leakage Vulnerability

    April 22, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.