
    Kyutai Releases Hibiki: A 2.7B Real-Time Speech-to-Speech and Speech-to-Text Translation with Near-Human Quality and Voice Transfer

    February 9, 2025

    Real-time speech translation is difficult because it must combine speech recognition, machine translation, and text-to-speech synthesis with minimal delay. Traditional cascaded approaches often compound errors from one stage to the next, fail to retain speaker identity, and process audio slowly, making them less suitable for real-time applications such as live interpretation. Existing simultaneous translation models also struggle to balance accuracy and latency, relying on complex inference mechanisms that are difficult to scale. A further barrier is the lack of large-scale, well-aligned speech datasets, which limits the ability to train models that generate contextually accurate and natural translations with minimal delay.

    Kyutai has developed Hibiki, a 2.7-billion-parameter decoder-only model designed for real-time speech-to-speech (S2ST) and speech-to-text (S2TT) translation. Operating at a 12.5 Hz frame rate with a 2.2 kbps bitrate, Hibiki currently supports French-to-English translation and is designed to preserve the speaker's voice characteristics in the translated output. A distilled version, Hibiki-M (1.7B parameters), is optimized for real-time performance on smartphones, making on-device translation practical.
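The frame rate and bitrate quoted above are linked by simple arithmetic. The sketch below assumes, without confirmation from this article, that the codec emits 16 codebook tokens per frame and that each codebook has 2048 entries (11 bits per token). Under those assumptions the numbers reproduce the stated 2.2 kbps.

```python
import math


def codec_bitrate_bps(frame_rate_hz: float, n_codebooks: int, codebook_size: int) -> float:
    """Bitrate of a token-based audio codec in bits per second."""
    # Each token picks one entry out of `codebook_size`, so it carries log2(size) bits.
    bits_per_token = math.log2(codebook_size)
    return frame_rate_hz * n_codebooks * bits_per_token


# Assumed values (not stated in the article): 16 codebooks of 2048 entries each.
hibiki_bps = codec_bitrate_bps(frame_rate_hz=12.5, n_codebooks=16, codebook_size=2048)
print(hibiki_bps)  # 2200.0 bits per second, i.e. 2.2 kbps
```

Halving the assumed number of codebooks would halve the bitrate, which is one way a smaller on-device variant could reduce its audio token budget.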

    Technical Approach and Benefits

    Hibiki’s decoder-only architecture enables simultaneous speech processing using a multistream language model that predicts both text and audio tokens. It employs a neural audio codec (Mimi) to compress audio while maintaining fidelity, ensuring efficient translation generation. A key aspect of its design is contextual alignment, a method that leverages a text translation model’s perplexity to determine optimal timing for generating speech, allowing Hibiki to adjust translation delays dynamically while maintaining coherence. Additionally, Hibiki supports batch inference, processing up to 320 sequences in parallel on H100 GPUs, making it viable for large-scale applications. The model is trained on 7M hours of English audio, 450K hours of French, and 40K hours of synthetic parallel data, contributing to its robustness across varied speech patterns.
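To make the contextual alignment idea more concrete, here is a minimal sketch, not Kyutai's implementation. It assumes a table `logp[j][i]` holding the log-probability a text translation model assigns to target word `j` after seeing the first `i` source words. Each target word is aligned to the source word whose arrival raises that log-probability the most. The toy numbers, the function name, and the monotonic clamp are all illustrative assumptions.

```python
from typing import List


def contextual_alignment(logp: List[List[float]]) -> List[int]:
    """Align each target word to a 1-based source word index.

    logp[j][i] is a hypothetical log-probability of target word j given the
    previous target words and the first i source words (i = 0 .. n_source).
    """
    alignment: List[int] = []
    previous = 0
    for row in logp:
        # Gain in log-probability when source word i becomes visible.
        gains = [row[i] - row[i - 1] for i in range(1, len(row))]
        best = max(range(len(gains)), key=gains.__getitem__) + 1
        # Simplifying assumption: output is spoken in order, so a word cannot
        # be emitted earlier than the word before it.
        best = max(best, previous)
        alignment.append(best)
        previous = best
    return alignment


# Toy example: source "je mange une pomme", target "I eat an apple".
TOY_LOGP = [
    [-3.0, -0.5, -0.4, -0.4, -0.4],  # "I" becomes likely after "je"
    [-5.0, -4.8, -0.6, -0.5, -0.5],  # "eat" after "mange"
    [-2.0, -2.0, -1.9, -0.3, -0.3],  # "an" after "une"
    [-6.0, -6.0, -5.5, -5.0, -0.2],  # "apple" after "pomme"
]

print(contextual_alignment(TOY_LOGP))  # [1, 2, 3, 4]
```

In a training pipeline, alignments like these could be used to decide how long each target word must wait before it is spoken, which is how the delay adapts to the content being translated.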

    Performance and Evaluation

    Hibiki has demonstrated strong performance in translation quality and speaker fidelity. It achieves an ASR-BLEU score of 30.5, surpassing existing baselines, including offline models. Human evaluators rate its naturalness at 3.73/5, approaching the 4.12/5 score of professional human interpreters. The model also performs well on speaker similarity, scoring 0.52 compared to 0.43 for Seamless. Compared to Seamless and StreamSpeech, Hibiki consistently delivers higher translation quality and better voice transfer while maintaining competitive latency. The distilled Hibiki-M variant, though slightly lower in speaker similarity, remains effective for real-time on-device use.

    Conclusion

    Hibiki provides a practical approach to real-time speech translation, integrating contextual alignment, efficient compression, and real-time inference to improve translation quality while preserving natural speech characteristics. By offering an open-source release under a permissive CC-BY license, Hibiki has the potential to contribute significantly to advancements in multilingual communication.

    • Hibiki 2B for PyTorch (bf16): kyutai/hibiki-2b-pytorch-bf16
    • Hibiki 1B for PyTorch (bf16): kyutai/hibiki-1b-pytorch-bf16
    • Hibiki 2B for MLX (bf16): kyutai/hibiki-2b-mlx-bf16
    • Hibiki 1B for MLX (bf16): kyutai/hibiki-1b-mlx-bf16
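The checkpoints listed above can be fetched from Hugging Face. The sketch below maps a size and backend to the repository names from the list and downloads a snapshot with `snapshot_download` from the `huggingface_hub` package. The helper names `hibiki_repo_id` and `download_hibiki` are hypothetical, and loading and running the model is left to Kyutai's own code, which is not shown here.

```python
# Repository IDs copied from the list above.
HIBIKI_REPOS = {
    ("2b", "pytorch"): "kyutai/hibiki-2b-pytorch-bf16",
    ("1b", "pytorch"): "kyutai/hibiki-1b-pytorch-bf16",
    ("2b", "mlx"): "kyutai/hibiki-2b-mlx-bf16",
    ("1b", "mlx"): "kyutai/hibiki-1b-mlx-bf16",
}


def hibiki_repo_id(size: str, backend: str) -> str:
    """Return the Hugging Face repo ID for a size ('1b'/'2b') and backend ('pytorch'/'mlx')."""
    key = (size.lower(), backend.lower())
    if key not in HIBIKI_REPOS:
        raise ValueError(f"No Hibiki checkpoint listed for size={size!r}, backend={backend!r}")
    return HIBIKI_REPOS[key]


def download_hibiki(size: str = "2b", backend: str = "pytorch") -> str:
    """Download the checkpoint files and return the local directory path."""
    # Imported lazily so the lookup helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=hibiki_repo_id(size, backend))
```

For example, `download_hibiki("1b", "mlx")` would fetch the smaller MLX checkpoint, assuming network access and that `huggingface_hub` is installed.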

    Check out the Paper, Models on Hugging Face, GitHub Page and Colab Notebook. All credit for this research goes to the researchers of this project.

    The post Kyutai Releases Hibiki: A 2.7B Real-Time Speech-to-Speech and Speech-to-Text Translation with Near-Human Quality and Voice Transfer appeared first on MarkTechPost.
