    Researchers from Dataocean AI and Tsinghua University Introduce Dolphin: A Multilingual Automatic Speech Recognition (ASR) Model Optimized for Eastern Languages and Dialects

    April 3, 2025

    Automatic speech recognition (ASR) technologies have advanced significantly, yet notable disparities remain in their ability to accurately recognize diverse languages. Prominent ASR systems, such as OpenAI’s Whisper, exhibit pronounced performance gaps when processing Eastern languages compared to Western counterparts. This discrepancy presents tangible challenges in multilingual regions, particularly those characterized by numerous dialects and linguistic variations, underscoring the necessity for sophisticated multilingual ASR systems tailored specifically to Eastern languages.

    Researchers from Dataocean AI and Tsinghua University have introduced Dolphin, a comprehensive multilingual automatic speech recognition model built on an extended Whisper architecture and optimized for a broader range of Eastern languages and dialects. Dolphin addresses key limitations of current multilingual ASR models by training on a combination of proprietary and publicly available datasets. The model supports 40 Eastern languages spanning East Asia, South Asia, Southeast Asia, and the Middle East, as well as 22 distinct Chinese dialects.

    Dolphin uses a hybrid ASR approach that combines Connectionist Temporal Classification (CTC) with attention-based decoding. Its architecture pairs an E-Branchformer encoder with a Transformer decoder, which substantially improves the model's ability to capture complex linguistic patterns across diverse languages. Dolphin also uses a two-level language tokenization scheme that separates general language codes from region-specific dialect tokens; this improves both recognition accuracy and the model's ability to tell dialects apart, particularly for dialect-rich languages such as Chinese. Finally, a 4× subsampling layer shortens input sequences, speeding up computation and making training more efficient without compromising recognition accuracy.
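The three mechanisms described above can be illustrated with a toy, dependency-free Python sketch; it is not Dolphin's code. `ctc_greedy_decode` shows the standard CTC best-path rule (collapse repeated labels, then drop blanks). `hybrid_loss` shows the usual weighted interpolation of CTC and attention losses; the 0.3 CTC weight is an assumed ESPnet-style default, not a value reported for Dolphin. `subsampled_length` assumes the 4× reduction comes from two kernel-3, stride-2 convolutions without padding. `prompt_tokens` uses a hypothetical `<lang>`/`<region>` token format purely to illustrate the idea of separate language and dialect tokens.

```python
from __future__ import annotations

BLANK = 0  # assumption: the CTC blank symbol has id 0


def ctc_greedy_decode(frame_ids: list[int], blank: int = BLANK) -> list[int]:
    """CTC best-path rule: merge consecutive repeats, then remove blanks."""
    out: list[int] = []
    prev: int | None = None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out


def hybrid_loss(ctc_loss: float, att_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate CTC and attention losses (0.3 is an assumed default weight)."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def subsampled_length(num_frames: int) -> int:
    """Frames left after two unpadded conv layers (kernel 3, stride 2): ~4x fewer."""
    after_first = (num_frames - 3) // 2 + 1
    after_second = (after_first - 3) // 2 + 1
    return max(0, after_second)


def prompt_tokens(lang: str, region: str | None = None) -> list[str]:
    """Hypothetical two-level prompt: a language token, optionally a region/dialect token."""
    tokens = [f"<{lang}>"]
    if region is not None:
        tokens.append(f"<{region}>")
    return tokens


if __name__ == "__main__":
    print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0]))  # repeats split by a blank survive
    print(subsampled_length(1000))
    print(hybrid_loss(2.0, 1.0))
    print(prompt_tokens("zh", "SHANGHAI"))
```

Under these assumptions, 1,000 input frames (about 10 seconds at a typical 10 ms frame shift) shrink to 249 encoder frames. Because self-attention cost grows quadratically with sequence length, this kind of reduction is where most of the speedup from subsampling comes from.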

    Experimental evaluations show marked improvements in Dolphin's multilingual recognition accuracy relative to Whisper. Within the Dolphin family, the small model reduced the Word Error Rate (WER) by approximately 24.5% compared with the base model, and the medium and large variants brought further incremental gains. The Dolphin base model itself achieved an average WER of 31.8%, well below the 52.3% average WER recorded by Whisper large-v3 on the same evaluation benchmarks. Evaluations on dialect-focused datasets, including KeSpeech, confirmed that Dolphin handles complex dialectal variation consistently, with performance improving as model size increases.
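WER is the word-level edit distance between a reference transcript and the model output (substitutions plus deletions plus insertions), divided by the number of reference words. The plain-Python sketch below is an illustration, not the authors' evaluation script. It computes WER and the relative reduction implied by the two averages quoted above. The article does not say whether its 24.5% figure is relative or absolute, so `relative_reduction` is only applied here to the reported 52.3% and 31.8% averages.

```python
from __future__ import annotations


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over token lists, using a rolling DP row."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to hyp[:j]
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r from the reference
                cur[j - 1] + 1,          # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (cost 0 on a match)
            )
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance divided by the reference word count."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(f"{relative_reduction(52.3, 31.8):.1%}")
```

With the reported averages, `relative_reduction(52.3, 31.8)` is about 0.392, roughly a 39% relative WER reduction for Dolphin base over Whisper large-v3. For languages written without spaces, such as Chinese, evaluations commonly score characters instead (CER); the same `edit_distance` works on `list(text)`.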

    The research team publicly released the Dolphin base and small models under the Apache 2.0 license, together with the associated inference code. Dolphin was trained on 21.2 million hours of audio, of which 7.4 million hours come from open datasets such as Common Voice, ReazonSpeech, and GigaSpeech2, a choice intended to improve robustness and replicability.
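As a quick check on the data mix quoted above, simple arithmetic on the article's figures gives the split below. Treating the non-open remainder as the proprietary portion is an inference from the article's mention of proprietary data, not a figure it states.

```latex
\begin{aligned}
21.2\,\text{M h} - 7.4\,\text{M h} &= 13.8\,\text{M h} \quad \text{(non-open data)} \\
\frac{7.4}{21.2} &\approx 0.349 \approx 34.9\%\ \text{(open data)}
\end{aligned}
```

In other words, roughly a third of the training audio comes from public corpora, which is the portion others can obtain when attempting to reproduce the results.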

    In summary, Dolphin represents a significant advance in multilingual ASR, systematically addressing persistent limitations in Eastern language and dialect recognition through careful data integration, a refined architecture, and a commitment to open-source release. The work sets a strong benchmark for future multilingual ASR research and advances linguistic inclusivity and system generalization.


    See the paper and the Dolphin-small-model and Dolphin-base-model releases. All credit for this research goes to the researchers of this project.

    This article originally appeared on MarkTechPost.
