
    Camb AI Releases MARS5 TTS: A Novel Open Source Text to Speech Model for Insane Prosody

    June 26, 2024

    The Camb AI team has released MARS5 TTS, an open-source text-to-speech model. It offers fine-grained prosodic control and voice cloning from less than 5 seconds of reference audio. The system uses a two-stage architecture consisting of a 750M-parameter autoregressive (AR) model and a 450M-parameter non-autoregressive (NAR) model. MARS5 tokenizes its text input with byte-pair encoding (BPE), so punctuation in the text can be used to steer pauses and stops in the generated speech.

    The model follows a two-stage AR-NAR pipeline. In the first stage, an autoregressive transformer generates coarse (L0) EnCodec speech features from the input text and the reference audio. In the second stage, these features, together with the text and the reference, are refined by a multinomial Denoising Diffusion Probabilistic Model (DDPM), which produces the remaining EnCodec codebook values. Finally, a vocoder converts the DDPM output into the final audio.
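
The data flow described above can be sketched with stand-in stubs. This is not the MARS5 implementation: `ar_stage`, `nar_stage`, `vocoder`, and `synthesize` are hypothetical names, the models are replaced by random sampling, and the codebook count (8), codebook size (1024), and 320 samples per frame are assumed values chosen for illustration. The sketch only shows the shape of what each stage hands to the next.

```python
import random

N_CODEBOOKS = 8       # assumption: codebooks per frame (L0 plus 7 finer levels)
CODEBOOK_SIZE = 1024  # assumption: entries per codebook
SAMPLES_PER_FRAME = 320  # assumption: audio samples decoded from one frame


def ar_stage(text, ref_codes, n_frames, rng):
    """Stand-in for the AR model: emits one coarse (L0) code per frame."""
    l0 = []
    for _ in range(n_frames):
        # A real AR model would condition each step on the text, the
        # reference audio, and the codes generated so far.
        l0.append(rng.randrange(CODEBOOK_SIZE))
    return l0


def nar_stage(text, ref_codes, l0, rng):
    """Stand-in for the NAR model: fills the remaining codebooks for all
    frames in one pass, keeping the L0 code of each frame unchanged."""
    return [
        [c0] + [rng.randrange(CODEBOOK_SIZE) for _ in range(N_CODEBOOKS - 1)]
        for c0 in l0
    ]


def vocoder(codes):
    """Stand-in for the vocoder: returns silence of the matching length."""
    return [0.0] * (len(codes) * SAMPLES_PER_FRAME)


def synthesize(text, ref_codes, n_frames=75, seed=0):
    """Run the three stages in order and return every intermediate result."""
    rng = random.Random(seed)
    l0 = ar_stage(text, ref_codes, n_frames, rng)
    codes = nar_stage(text, ref_codes, l0, rng)
    audio = vocoder(codes)
    return l0, codes, audio
```

Calling `synthesize("Hello there.", ref_codes=[])` returns the coarse codes, the full per-frame code matrix, and a placeholder waveform, which makes the hand-off between stages easy to inspect.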

    The AR component of MARS5 predicts the L0 coarse tokens, the NAR DDPM model refines them, and the vocoder turns the refined output into audio. Because the model is trained on raw audio paired with byte-pair-encoded text, prosody can be steered through punctuation and capitalization. For instance, adding commas introduces pauses, while capitalizing words emphasizes them, which gives a natural way to guide the prosody of the generated output.
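
As a small illustration of steering prosody through the text itself, the following sketch prepares an input string by capitalizing words to emphasize and inserting a comma where a pause is wanted. The helpers `emphasize` and `add_pause` are hypothetical and only manipulate text; how strongly a given model responds to these cues is not something this sketch can show.

```python
import re


def emphasize(text, words):
    """Upper-case whole-word, case-insensitive matches of the given words."""
    if not words:
        return text
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: m.group(0).upper(), text)


def add_pause(text, after_word):
    """Insert a comma after the first whole-word match of `after_word`,
    skipping matches that are already followed by punctuation."""
    pattern = re.compile(
        r"\b(" + re.escape(after_word) + r")\b(?![,.;:!?])", re.IGNORECASE
    )
    return pattern.sub(r"\1,", text, count=1)


plain = "Well I did not say that"
marked = emphasize(add_pause(plain, "well"), ["not"])
print(marked)
```

The marked-up string is what would be passed to the model as input text in place of the plain one.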

    Compared with general-purpose language models such as GPT and Gemini, MARS5 is specialized for text-to-speech synthesis. While GPT and Gemini are primarily designed for text generation and understanding, MARS5 is optimized for producing high-quality, controllable speech output. Its use of a DDPM in the NAR stage and its support for prosodic control through text formatting set it apart in speech synthesis.

    MARS5 demonstrates strong results in voice cloning and prosodic control. The system supports two inference modes: a fast “shallow clone” that does not require the reference audio’s transcript, and a slower but higher-quality “deep clone” that uses that transcript. With just 5 seconds of audio and a text snippet, MARS5 can generate speech for diverse and challenging scenarios, including sports commentary and anime voiceovers.

    To use MARS5, provide a reference audio file between 2 and 12 seconds long; samples of about 6 seconds yield the best results. The system accepts text input with punctuation and capitalization for prosodic control. Users can perform a “deep clone” for higher quality by also providing the reference audio’s transcript, though this takes longer. MARS5’s ability to handle complex prosodic scenarios makes it suitable for applications in entertainment, education, and accessibility.
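
The input requirements above can be checked before any model is loaded. The sketch below is a hypothetical pre-flight helper, not part of MARS5: it reads the duration of a WAV file with Python's standard `wave` module, rejects references outside the 2 to 12 second range, and selects "deep" or "shallow" cloning depending on whether a transcript is supplied. The function names and the returned dictionary layout are assumptions made for illustration.

```python
import wave

MIN_REF_SECONDS = 2.0   # lower bound stated in the article
MAX_REF_SECONDS = 12.0  # upper bound stated in the article


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def plan_clone(duration_s, ref_transcript=None):
    """Validate the reference length and choose the cloning mode.

    A non-empty transcript selects the slower "deep" clone; otherwise the
    faster "shallow" clone is used.
    """
    if not (MIN_REF_SECONDS <= duration_s <= MAX_REF_SECONDS):
        raise ValueError(
            f"reference audio is {duration_s:.2f}s; expected between "
            f"{MIN_REF_SECONDS:.0f}s and {MAX_REF_SECONDS:.0f}s"
        )
    transcript = (ref_transcript or "").strip()
    return {
        "mode": "deep" if transcript else "shallow",
        "ref_transcript": transcript,
    }
```

For example, `plan_clone(wav_duration_seconds("ref.wav"), "the words spoken in the clip")` would select the deep clone for a valid reference, where `ref.wav` is a placeholder path.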

    MARS5 TTS represents a significant advance in open-source text-to-speech technology. Its architecture, which pairs an AR model with a DDPM-based NAR model, gives fine control over speech synthesis. The system’s ability to clone voices from minimal input and generate high-quality, prosodically rich speech makes it a valuable tool for developers and researchers working in artificial intelligence and speech technology.

    The post Camb AI Releases MARS5 TTS: A Novel Open Source Text to Speech Model for Insane Prosody appeared first on MarkTechPost.
