
    Researchers from Dataocean AI and Tsinghua University Introduce Dolphin: A Multilingual Automatic Speech Recognition (ASR) Model Optimized for Eastern Languages and Dialects

    April 3, 2025

    Automatic speech recognition (ASR) has advanced significantly, yet accuracy still varies widely across languages. Prominent systems such as OpenAI’s Whisper perform markedly worse on Eastern languages than on Western ones. The gap is a practical problem in multilingual regions with many dialects and linguistic variations, and it motivates multilingual ASR systems built specifically for Eastern languages.

    Researchers from Dataocean AI and Tsinghua University have introduced Dolphin, a multilingual ASR model that extends the Whisper architecture to cover a broader range of Eastern languages and dialects. Dolphin addresses limitations of current multilingual ASR models by training on a combination of proprietary and publicly available datasets. The model supports 40 Eastern languages from East Asia, South Asia, Southeast Asia, and the Middle East, as well as 22 Chinese dialects.

    Dolphin uses a hybrid ASR approach that combines Connectionist Temporal Classification (CTC) with attention-based decoding. Its architecture pairs an E-Branchformer encoder with a Transformer decoder, which improves the model’s ability to capture complex linguistic patterns across diverse languages. Dolphin also uses a two-level language tokenization system that separates general language codes from region-specific dialect tokens. This improves recognition accuracy, particularly for dialect-rich languages such as Chinese. In addition, a 4× subsampling layer shortens the input sequence, which speeds up computation and training without compromising recognition accuracy.
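The three mechanisms described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not Dolphin's implementation: the function names, the `<zh>`/`<CN>` token spelling, the ceiling-division length formula, and the default CTC weight of 0.3 are all assumptions chosen for the example.

```python
from typing import List


def subsampled_length(num_frames: int, factor: int = 4) -> int:
    """Approximate encoder sequence length after subsampling.

    Assumption: ceiling division. Real convolutional front-ends may
    differ by a few frames depending on kernel size and padding.
    """
    if num_frames < 0 or factor <= 0:
        raise ValueError("num_frames must be >= 0 and factor must be > 0")
    return -(-num_frames // factor)


def language_prefix(lang: str, region: str) -> List[str]:
    """Build a two-level prefix: a language token, then a region/dialect token.

    The angle-bracket spelling is hypothetical and only illustrates the idea
    of keeping the general language code separate from the dialect token.
    """
    return [f"<{lang}>", f"<{region}>"]


def hybrid_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate CTC and attention losses, as in joint CTC/attention training.

    ctc_weight = 0.3 is an assumed value, not one reported in the article.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


if __name__ == "__main__":
    # 1,000 input frames shrink to a quarter of that after 4x subsampling.
    print(subsampled_length(1000))
    # Mandarin as spoken in mainland China, using the hypothetical token spelling.
    print(language_prefix("zh", "CN"))
    print(hybrid_loss(ctc_loss=2.0, attention_loss=1.0))
```

Keeping the language and region tokens separate means that a dialect shares its language token with every other dialect of that language, while the second token carries the regional distinction.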

    Experimental evaluations show that Dolphin is substantially more accurate than Whisper models on multilingual speech recognition. For instance, the Dolphin small model reduced the word error rate (WER) by approximately 24.5% compared to the base model, with further incremental improvements in the medium and large variants. The Dolphin base model attained an average WER of 31.8%, outperforming Whisper’s large-v3 model, which recorded an average WER of 52.3% on the same evaluation benchmarks. Evaluations on dialect-focused datasets, including KeSpeech, confirmed that Dolphin handles intricate linguistic variations consistently, and that accuracy improves as model size increases.
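For readers unfamiliar with the metric, the sketch below shows how WER is conventionally computed (word-level edit distance divided by the reference length) and how a relative reduction follows from two WER values. The function names and the sample sentences are hypothetical. Splitting on whitespace is an assumption that does not hold for languages written without spaces, where character-level scoring is common.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Average WER figures quoted in the article: Whisper large-v3 vs. Dolphin base.
    print(relative_reduction(52.3, 31.8))
```

Applied to the averages quoted above, going from 52.3% to 31.8% is a drop of 20.5 percentage points, or a relative reduction of about 39%.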

    The research team released the Dolphin base and small models publicly under the Apache 2.0 license, along with the associated inference code. Dolphin was trained on about 212,000 hours of audio, of which about 74,000 hours come from open datasets such as Common Voice, ReazonSpeech, and GigaSpeech2. The use of open data supports robustness and reproducibility.

    In summary, Dolphin is a significant advance in multilingual ASR. It addresses existing limitations in Eastern language and dialect recognition through careful data integration, a refined architecture, and an open-source release. The work sets a benchmark for future multilingual ASR research and advances linguistic inclusivity and system generalization.



    The post Researchers from Dataocean AI and Tsinghua University Introduces Dolphin: A Multilingual Automatic Speech Recognition ASR Model Optimized for Eastern Languages and Dialects appeared first on MarkTechPost.
