    Researchers from Dataocean AI and Tsinghua University Introduce Dolphin: A Multilingual Automatic Speech Recognition (ASR) Model Optimized for Eastern Languages and Dialects

    April 3, 2025

    Automatic speech recognition (ASR) technologies have advanced significantly, yet notable disparities remain in their ability to accurately recognize diverse languages. Prominent ASR systems, such as OpenAI’s Whisper, exhibit pronounced performance gaps when processing Eastern languages compared to Western counterparts. This discrepancy presents tangible challenges in multilingual regions, particularly those characterized by numerous dialects and linguistic variations, underscoring the necessity for sophisticated multilingual ASR systems tailored specifically to Eastern languages.

    Researchers from Dataocean AI and Tsinghua University have introduced Dolphin, a comprehensive multilingual automatic speech recognition model built upon an extended Whisper architecture, optimized to accommodate a broader spectrum of Eastern languages and dialects. Dolphin effectively addresses key limitations identified in current multilingual ASR models by integrating both proprietary datasets and publicly accessible datasets. The model proficiently supports 40 Eastern languages from East Asia, South Asia, Southeast Asia, and the Middle East, as well as 22 distinct dialects of Chinese.

    Dolphin employs a hybrid ASR approach combining Connectionist Temporal Classification (CTC) with attention-based mechanisms. Its architecture incorporates an E-Branchformer encoder and a Transformer decoder, substantially enhancing the model’s capability to interpret complex linguistic patterns across diverse languages. Dolphin also utilizes a dual-level language tokenization system, distinguishing general language codes from region-specific dialect tokens. This mechanism improves recognition accuracy and resolution, particularly for dialect-intensive languages such as Chinese. Additionally, Dolphin incorporates a 4× subsampling layer to efficiently reduce input sequence lengths, enhancing computational speed and training effectiveness without compromising recognition accuracy.
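
    The three mechanisms described above can be sketched in a few lines. This is a minimal illustration, not Dolphin's actual code: the function names are hypothetical, the `<language><region>` token spelling is an assumed format, the length formula ignores convolution kernel and padding effects, and the CTC weight of 0.3 is only a commonly used default for joint CTC/attention training, not a value reported in this article.

```python
"""Illustrative helpers only; none of these names come from the Dolphin codebase."""
from math import ceil


def subsampled_length(num_frames: int, factor: int = 4) -> int:
    """Approximate encoder sequence length after subsampling by `factor`.

    Assumption: plain ceiling division. Real convolutional front ends can
    drop a few extra frames depending on kernel size and padding.
    """
    if num_frames < 0 or factor < 1:
        raise ValueError("num_frames must be >= 0 and factor >= 1")
    return ceil(num_frames / factor)


def language_prefix(language: str, region: str) -> list[str]:
    """Build a two-level prefix: a general language token, then a region or dialect token.

    The angle-bracket spelling is an assumed format used for illustration.
    """
    if not language or not region:
        raise ValueError("both language and region are required")
    return [f"<{language}>", f"<{region}>"]


def hybrid_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate the CTC loss and the attention-decoder loss.

    ctc_weight = 0.3 is an assumption, not a setting taken from the article.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


if __name__ == "__main__":
    # 1000 input frames shrink to 250 encoder steps under 4x subsampling.
    print(subsampled_length(1000))
    # Same language token, different region token for a dialect.
    print(language_prefix("zh", "CN"))
    print(language_prefix("zh", "SHANGHAI"))
    print(hybrid_loss(ctc_loss=2.0, attention_loss=1.0))
```

    Keeping the language and region tokens separate means dialects of one language share the first token and differ only in the second, which is the distinction the dual-level scheme is meant to capture.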

    Experimental evaluations demonstrate Dolphin’s marked improvements in multilingual speech recognition accuracy relative to Whisper models. For instance, the Dolphin small model reduced the Word Error Rate (WER) by approximately 24.5% compared to the base model, with further incremental improvements observed in medium and large variants. Specifically, the Dolphin base model attained an average WER of 31.8%, notably outperforming Whisper’s large-v3 model, which recorded an average WER of 52.3% across the same evaluation benchmarks. Assessments conducted on dialect-focused datasets, including KeSpeech, confirmed Dolphin’s capability to consistently handle intricate linguistic variations, with performance enhancements correlating positively with increased model size.
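
    For readers unfamiliar with the metric, the sketch below shows how WER is typically computed and how two WER figures can be compared. It assumes whitespace tokenization (character-level scoring is often preferred for languages such as Chinese) and treats the comparison as a relative reduction; the function names are illustrative and are not taken from the Dolphin evaluation code.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which `new_wer` improves on `baseline_wer`."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(wer("a b c d", "a x c"))
    # The averages quoted above: 52.3% (Whisper large-v3) vs 31.8% (Dolphin base).
    print(f"{relative_reduction(52.3, 31.8):.1%}")
```

    On the averages quoted above, the gap of 20.5 percentage points corresponds to a relative WER reduction of roughly 39%.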

    The research team released the Dolphin base and small models publicly under the Apache 2.0 license, along with the associated inference code. Dolphin was trained on roughly 212,000 hours of audio, about 74,000 hours of which come from open datasets such as Common Voice, ReazonSpeech, and GigaSpeech2, which supports robustness and replicability.

    In summary, Dolphin is a significant advance in multilingual ASR. It addresses known weaknesses in Eastern language and dialect recognition by combining proprietary and open data, a refined architecture, and openly released models and code. The work sets a reference point for future multilingual ASR research and for broader language coverage.


    Check out the Paper, Dolphin-small-model and Dolphin-base-model. All credit for this research goes to the researchers of this project.

    This post first appeared on MarkTechPost.
