    Zyphra Introduces the Beta Release of Zonos: A Highly Expressive TTS Model with High Fidelity Voice Cloning

    February 10, 2025

    Text-to-speech (TTS) technology has made significant strides in recent years, but challenges remain in creating natural, expressive, and high-fidelity speech synthesis. Many TTS systems struggle to replicate the nuances of human speech, such as intonation, emotion, and accent, often resulting in artificial-sounding voices. Additionally, precise voice cloning remains difficult, limiting the ability to generate personalized or diverse speech outputs. These challenges have driven continued research into more sophisticated TTS models capable of producing real-time, expressive, and realistic speech.

    Zyphra has introduced the beta release of Zonos-v0.1, featuring two real-time TTS models with high-fidelity voice cloning. The release includes a 1.6 billion-parameter transformer model and a similarly sized hybrid model, both available under the Apache 2.0 license. This open-source initiative seeks to advance TTS research by making high-quality speech synthesis technology more accessible to developers and researchers.

    The Zonos-v0.1 models are trained on approximately 200,000 hours of speech data, encompassing both neutral and expressive speech patterns. While the primary dataset consists of English-language content, significant portions of Chinese, Japanese, French, Spanish, and German speech have been incorporated, allowing for multilingual support. The models generate lifelike speech from text prompts using either speaker embeddings or audio prefixes. They can perform voice cloning with as little as 5 to 30 seconds of sample speech and offer controls over parameters such as speaking rate, pitch variation, audio quality, and emotions like sadness, fear, anger, happiness, and surprise. The synthesized speech is produced at a 44 kHz sample rate, ensuring high audio fidelity.
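The cloning and control inputs described above can be sanity-checked before any model is called. The sketch below is only an illustration: the function names, the reading of "44 kHz" as 44,100 Hz, and the treatment of 30 seconds as an upper bound are all assumptions, and none of this is the Zonos API.

```python
# Hypothetical pre-flight checks for a voice-cloning request.
# Nothing here calls Zonos; names and limits are illustrative assumptions.

# Emotion labels named in the article.
EMOTIONS = ("sadness", "fear", "anger", "happiness", "surprise")

# Assumption: "44 kHz" is read as the common 44,100 Hz rate.
OUTPUT_SAMPLE_RATE = 44_100


def clip_seconds(num_samples: int, sample_rate: int) -> float:
    """Duration of a mono clip in seconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def clip_is_usable(num_samples: int, sample_rate: int,
                   min_s: float = 5.0, max_s: float = 30.0) -> bool:
    """True when the reference clip lasts between min_s and max_s seconds."""
    return min_s <= clip_seconds(num_samples, sample_rate) <= max_s


def normalize_emotions(weights: dict[str, float]) -> dict[str, float]:
    """Scale non-negative emotion weights so that they sum to 1."""
    unknown = set(weights) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotions: {sorted(unknown)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    # Emotions that were not mentioned get a weight of 0.
    return {name: weights.get(name, 0.0) / total for name in EMOTIONS}
```

For example, `clip_is_usable(10 * OUTPUT_SAMPLE_RATE, OUTPUT_SAMPLE_RATE)` accepts a ten-second clip, and `normalize_emotions({"happiness": 3.0, "surprise": 1.0})` yields a mix that is three quarters happiness and one quarter surprise.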

    Zonos-v0.1 includes several key features:

    • Zero-shot TTS with Voice Cloning: Users can generate speech by providing a short speaker sample alongside text input, making it possible to synthesize voices with minimal data.
    • Audio Prefix Inputs: By incorporating an audio prefix, the models can better match speaker characteristics and even reproduce specific speaking styles, such as whispering.
    • Multilingual Support: The system supports multiple languages, including English, Japanese, Chinese, French, and German, increasing its versatility for global applications.
    • Audio Quality and Emotion Control: Users can fine-tune aspects such as pitch, frequency range, and emotional tone to create more expressive and natural speech outputs.
    • Efficient Performance: The models generate audio roughly twice as fast as real time on an RTX 4090, which makes them suitable for real-time applications.
    • User-friendly Interface: A Gradio-based WebUI simplifies speech generation, making it accessible to a broader range of users.
    • Straightforward Deployment: A provided Docker setup handles installation and deployment, which simplifies integration into existing workflows.

    These features make Zonos-v0.1 a flexible tool for various TTS applications, from content creation to accessibility tools.
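The performance figure in the feature list is easy to turn into arithmetic. The following is a minimal sketch with hypothetical helper names; it treats "twice real-time" as two seconds of audio produced per second of compute, and actual throughput will depend on hardware and settings.

```python
def realtime_speedup(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of compute (higher is faster)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def generation_time(audio_seconds: float, speedup: float) -> float:
    """Estimated compute time for a clip at a given speed-up over real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# At twice real time, 30 seconds of speech takes about 15 seconds to generate.
estimate = generation_time(30.0, 2.0)
```

A speed-up above 1 means audio is produced faster than it plays back, which is the condition a streaming application needs.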

    Early evaluations suggest that Zonos-v0.1 delivers high-quality speech generation, often comparable to or exceeding leading proprietary systems. While objective audio evaluation remains complex, comparisons with other models—including proprietary solutions such as ElevenLabs and Cartesia, as well as open-source alternatives like FishSpeech-v1.5—highlight Zonos’s ability to produce clear, natural, and expressive speech. The hybrid model, in particular, offers reduced latency and lower memory usage compared to the transformer variant, benefiting from its Mamba2-based architecture, which minimizes reliance on attention mechanisms.
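One general reason an architecture that relies less on attention can use less memory is that an attention layer caches keys and values for every generated position, while a state-space layer carries a fixed-size state. The sketch below illustrates that scaling only. The layer counts and dimensions are made-up placeholders, not the Zonos configuration, and the article does not describe how the hybrid model divides its layers.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Memory for cached keys and values across all attention layers.

    Each layer stores two tensors (keys and values) of shape
    seq_len x n_kv_heads x head_dim, so the total grows linearly with seq_len.
    """
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_value


def fixed_state_bytes(n_layers: int, state_values_per_layer: int,
                      bytes_per_value: int = 2) -> int:
    """Memory for a recurrent state whose size does not depend on seq_len."""
    return n_layers * state_values_per_layer * bytes_per_value


# Placeholder dimensions chosen only to show the trend.
short = kv_cache_bytes(seq_len=1_000, n_layers=2, n_kv_heads=4, head_dim=8)
long = kv_cache_bytes(seq_len=2_000, n_layers=2, n_kv_heads=4, head_dim=8)
state = fixed_state_bytes(n_layers=2, state_values_per_layer=4_096)
```

Doubling the sequence length doubles the cache, while the recurrent state stays the same size. A hybrid that replaces some attention layers with state-space layers would therefore cache less as outputs get longer.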

    The beta release of Zonos-v0.1 represents an important step forward in open-source TTS development. By providing a high-fidelity, expressive, and real-time speech synthesis tool under an accessible license, Zyphra offers developers and researchers a powerful resource for advancing TTS applications. Its combination of voice cloning, multilingual support, and fine-grained audio control makes it a versatile addition to the field, with potential applications in assistive technologies, content creation, and beyond.


    Check out the Technical details, GitHub Page, Zyphra/Zonos-v0.1-transformer and Zyphra/Zonos-v0.1-hybrid. All credit for this research goes to the researchers of this project.


    The post Zyphra Introduces the Beta Release of Zonos: A Highly Expressive TTS Model with High Fidelity Voice Cloning appeared first on MarkTechPost.
