
    MMS Zero-shot Released: A New AI Model to Transcribe the Speech of Almost Any Language Using Only a Small Amount of Unlabeled Text in the New Language

    August 2, 2024

    Speech recognition is a rapidly evolving field that enables machines to understand and transcribe human speech across various languages. This technology is vital for virtual assistants, automated transcription services, and language translation applications. Despite significant advancements, the challenge of covering all languages, particularly low-resource ones, remains substantial.

    A major obstacle in speech recognition is the lack of labeled data for most languages, which makes it difficult to build accurate models. Traditional approaches rely heavily on large datasets of transcribed speech, which exist for only some of the world’s languages. This limitation significantly hinders the development of universal speech recognition systems. Moreover, existing methods often require complex linguistic rules or large amounts of audio and text data, which is impractical for many low-resource languages.

    Existing methods for speech recognition involve either supervised learning with extensive labeled data or unsupervised learning requiring both audio and text data. However, these methods fall short for many low-resource languages, where the required data is scarce. Zero-shot approaches have emerged, aiming to recognize new languages without direct training on labeled data from those languages. These approaches struggle with phoneme mapping accuracy, especially when the phonemizer performs poorly for unseen languages, which results in high error rates.

    Researchers from Monash University and Meta FAIR introduced MMS Zero-shot, a simpler and more effective approach to zero-shot speech recognition. This method leverages romanization and an acoustic model trained on 1,078 languages, significantly more than previous models. The research demonstrates substantial improvements in character error rate (CER) for unseen languages. This novel approach sidesteps the complexity of language-specific phonemizers by standardizing text to a common Latin script through romanization.
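    To make the idea of romanization concrete, here is a minimal sketch. It is not uroman and not the authors' code: `toy_romanize` and `TOY_TABLE` are hypothetical names, and the table covers only a handful of Cyrillic letters. The point is only that text in different scripts ends up in one shared Latin alphabet.

```python
import unicodedata

# Hypothetical, deliberately tiny transliteration table (a few Cyrillic letters).
# A real romanizer such as uroman covers far more scripts and rules.
TOY_TABLE = {"п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t", "м": "m"}


def toy_romanize(text: str) -> str:
    """Map text to lowercase ASCII letters and spaces, dropping anything unmapped."""
    out = []
    for ch in unicodedata.normalize("NFKD", text.lower()):
        if unicodedata.combining(ch):
            continue  # drop diacritics that NFKD split off from their base letter
        ch = TOY_TABLE.get(ch, ch)  # transliterate if the table knows the letter
        if ch.isascii() and (ch.isalpha() or ch == " "):
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(toy_romanize("привет"))
    print(toy_romanize("Café Müller"))
```

    Because every language is reduced to the same small output alphabet, a single acoustic model can share its output layer across all of them, which is the property the approach relies on.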

    The proposed method involves training an acoustic model on a romanized version of the text from 1,078 languages. This model outputs romanized text during inference, which is then mapped to words using a simple lexicon. The romanization process standardizes diverse writing systems into a common Latin script, simplifying the model’s task and improving accuracy. The acoustic model is fine-tuned on labeled data from languages with available transcripts, ensuring it can generalize to unseen languages. The method also incorporates a lexicon and, optionally, a language model to enhance decoding accuracy during inference.
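    The lexicon step can be sketched as follows. This is an illustration under stated assumptions, not the released implementation: the lexicon is assumed to be built by romanizing each word of a small unlabeled text sample, `romanize` is a stand-in that only strips diacritics, and `build_lexicon` and `decode` are hypothetical helper names. Real decoding would score candidates (optionally with a language model) rather than take the first entry.

```python
import unicodedata
from collections import defaultdict


def romanize(word: str) -> str:
    """Stand-in romanizer (assumption): lowercase and strip diacritics only."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def build_lexicon(unlabeled_text: str) -> dict[str, list[str]]:
    """Map each romanized form to the original-script words that produce it."""
    lexicon: dict[str, list[str]] = defaultdict(list)
    for word in unlabeled_text.split():
        key = romanize(word)
        if word not in lexicon[key]:
            lexicon[key].append(word)
    return dict(lexicon)


def decode(romanized_output: str, lexicon: dict[str, list[str]]) -> list[str]:
    """Replace each romanized token with its first lexicon entry; keep unknown tokens."""
    return [lexicon.get(tok, [tok])[0] for tok in romanized_output.split()]


if __name__ == "__main__":
    lex = build_lexicon("el niño comió piña")
    print(decode("el nino comio pina", lex))
```

    Several words can share one romanized form, which is why each lexicon entry holds a list; choosing among the candidates is where a language model would help.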

    The MMS Zero-shot method reduces the average CER by 46% relative to previous models on 100 unseen languages. The resulting CER is only 2.5 times higher than that of in-domain supervised baselines. This improvement is substantial considering the method requires no labeled data for the evaluation languages. The research shows that a romanization-based approach can be more accurate than traditional phoneme-based methods, which often struggle with unseen languages. For instance, the model achieves an average CER of 32.3% on the MMS test set, 29.8% on the FLEURS test set, and 36.4% on the CommonVoice test set, showing consistent performance across different datasets.
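    For readers unfamiliar with the metric, the sketch below shows how a character error rate and a relative reduction are commonly computed. It is a generic illustration, not the paper's evaluation script: the function names are hypothetical, and CER is taken as character-level Levenshtein distance divided by the reference length.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` improves on `baseline`."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    print(cer("hello", "hallo"))
    # Made-up numbers for illustration only, not results from the paper.
    print(relative_reduction(0.50, 0.27))
```

    A relative reduction is measured against the baseline's own error, so with a purely hypothetical baseline CER of 50%, a 46% relative reduction would correspond to a CER of 27%, not 4%.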

    In conclusion, the research addresses the critical problem of speech recognition for low-resource languages by introducing a novel zero-shot approach. With its extensive language training and romanization technique, the MMS Zero-shot method offers a promising solution to the data scarcity challenge, advancing the field towards more inclusive and universal speech recognition systems. This approach by Monash University and Meta FAIR researchers paves the way for more accurate and accessible speech recognition technologies, potentially transforming applications across various domains where language diversity is a significant barrier. Integrating a simple lexicon and using a universal romanizer like uroman further enhance the method’s applicability and accuracy, making it an important step forward in the field.


    The post MMS Zero-shot Released: A New AI Model to Transcribe the Speech of Almost Any Language Using Only a Small Amount of Unlabeled Text in the New Language appeared first on MarkTechPost.
