
    Tango 2: The New Frontier in Text-to-Audio Synthesis and Its Superior Performance Metrics

    April 18, 2024

    With the introduction of powerful generative artificial intelligence models such as ChatGPT, Gemini, and Bard, demand for AI-generated content is rising across many industries, especially multimedia. Meeting this demand requires effective text-to-audio, text-to-image, and text-to-video models that can quickly produce high-quality content or prototypes. It is also essential that the outputs of these models faithfully reflect their input prompts.

    Direct preference optimisation (DPO), a supervised fine-tuning approach, has recently become a viable and reliable alternative to Reinforcement Learning from Human Feedback (RLHF) for aligning Large Language Model (LLM) responses with human preferences. The method has since been adapted to diffusion models so that their denoised outputs align with human preferences.
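    As a rough illustration of the idea, the standard DPO objective for a single preference pair can be sketched as below. The function name, argument names, and the default `beta` are illustrative assumptions, not values taken from the Tango 2 study; the inputs are the log-probabilities that the model being trained and a frozen reference model assign to the preferred and the undesired response.

```python
import math


def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (preferred, undesired) pair.

    All names and the default beta are hypothetical, chosen for illustration.
    """
    # How much more the policy favours each response than the reference does.
    gain_w = policy_logp_w - ref_logp_w
    gain_l = policy_logp_l - ref_logp_l
    margin = gain_w - gain_l

    # loss = -log(sigmoid(beta * margin)), written as a numerically
    # stable softplus of z = -beta * margin.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

    The loss falls below log 2 once the policy favours the preferred response more strongly than the reference model does, and rises above it in the opposite case.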

    In a recent study, a team of researchers applied this Diffusion-DPO approach to improve the semantic alignment between a text-to-audio model’s output audio and its input prompt. They used the Diffusion-DPO loss to fine-tune Tango, a publicly available text-to-audio latent diffusion model, on a synthesized preference dataset. This dataset, called Audio-Alpaca, contains a variety of audio prompts, each paired with preferred and undesired audio samples.

    The preferred audios faithfully capture their corresponding text descriptions, while the undesired audios have defects such as missing concepts, incorrect temporal order, or excessive noise. The undesired audios are produced by perturbing the descriptions and by using adversarial filtering to identify low-quality audios, as measured by CLAP-score.

    To handle the noisy preference pairs that automatic synthesis produces, the team selected a subset of the data for DPO fine-tuning using criteria based on CLAP-score differentials. This guarantees a minimum separation between the preferred and undesired audio in each pair, as well as a minimum proximity to the input prompt.
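    The selection step might look roughly like the sketch below. It assumes each pair already carries a CLAP-score for its preferred and undesired audio; the `PreferencePair` type, the field names, and the threshold values are hypothetical and are not the ones used for Audio-Alpaca.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    # Hypothetical record; the scores are assumed to be precomputed.
    prompt: str
    winner_clap: float  # CLAP-score of the preferred audio
    loser_clap: float   # CLAP-score of the undesired audio


def filter_pairs(pairs, min_winner_score, min_gap):
    """Keep pairs whose preferred audio is close enough to the prompt
    and clearly separated from the undesired audio."""
    return [
        p for p in pairs
        if p.winner_clap >= min_winner_score
        and (p.winner_clap - p.loser_clap) >= min_gap
    ]


pairs = [
    PreferencePair("a dog barks, then a car passes", 0.60, 0.30),
    PreferencePair("rain falling on a tin roof", 0.60, 0.58),  # gap too small
    PreferencePair("a crowd cheering", 0.30, 0.05),            # winner too far from prompt
]
kept = filter_pairs(pairs, min_winner_score=0.45, min_gap=0.10)
```

    With these made-up scores, only the first pair survives both thresholds.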

    According to the team’s experimental results, fine-tuning Tango on the pruned Audio-Alpaca dataset yields Tango 2, which outperforms both Tango and AudioLDM2 in human and objective evaluations. Exposure to the contrast between good and bad audio outputs during DPO fine-tuning lets Tango 2 map the semantics of the input prompt into the audio space more accurately. Tango 2 achieves these notable improvements even though its synthetic preference data is created from the same dataset that Tango used.
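    To make the contrast between good and bad outputs concrete, here is a simplified sketch of a Diffusion-DPO style loss for one preference pair. It assumes the general published form of Diffusion-DPO, with the timestep weighting folded into a single constant `beta`; it is not the training code behind Tango 2, and all names are hypothetical. Noise vectors are plain lists of floats to keep the example self-contained.

```python
import math


def sq_err(pred, target):
    """Squared L2 error between a predicted and a true noise vector."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def diffusion_dpo_loss(noise_w, noise_l, pol_w, pol_l, ref_w, ref_l, beta=1.0):
    """Simplified Diffusion-DPO loss for one (preferred, undesired) pair.

    noise_*: true noise added to the preferred (w) / undesired (l) latent
    pol_*:   noise predicted by the model being fine-tuned
    ref_*:   noise predicted by the frozen reference model
    """
    # Negative when the policy denoises a sample better than the reference.
    diff_w = sq_err(pol_w, noise_w) - sq_err(ref_w, noise_w)
    diff_l = sq_err(pol_l, noise_l) - sq_err(ref_l, noise_l)

    # loss = -log(sigmoid(-beta * (diff_w - diff_l))), as a stable softplus.
    z = beta * (diff_w - diff_l)
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

    The loss is lowest when the fine-tuned model improves its denoising of the preferred audio relative to the reference while getting relatively worse on the undesired audio.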

    The team has summarized their primary contributions as follows.

    The study presents a low-cost technique for semi-automatically producing a preference dataset for text-to-audio generation. The resulting dataset links each prompt to preferred and undesired audio outputs, which supports preference-based model training.

    The preference dataset, Audio-Alpaca, has been made available to the research community. It can be used for benchmarking and for further research as text-to-audio generation methods develop.

    Tango 2 outperforms both Tango and AudioLDM2 on objective and subjective measures, even though it does not use any out-of-distribution text-audio pairs beyond Tango’s dataset. This demonstrates how effectively the proposed methodology improves model performance.

    Tango 2’s performance demonstrates that Diffusion-DPO is applicable to audio generation and highlights its potential for improving text-to-audio models.

    Check out the Paper and Project. All credit for this research goes to the researchers of this project.

    The post Tango 2: The New Frontier in Text-to-Audio Synthesis and Its Superior Performance Metrics appeared first on MarkTechPost.
