    Whisper WebGPU: Real-Time in-Browser Speech Recognition with OpenAI Whisper

    June 8, 2024

    Real-time speech recognition that runs directly in a web browser has long been a sought-after milestone. Whisper WebGPU, built by a Hugging Face engineer known as ‘Xenova’, uses OpenAI’s Whisper model to deliver it. The project marks a notable shift in how users can interact with AI-driven web applications.

    At the core of Whisper WebGPU is Whisper-base, a 73-million-parameter speech recognition model optimized for web inference. At roughly 200 MB, the model is small enough for real-time use in the browser. After the first download it is cached, so later visits skip the download and start quickly.
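
    The download-once, reuse-afterwards behavior can be illustrated with a small memoizing loader. This is a minimal sketch, not the demo's actual code: `ModelCache`, `fakeDownload`, and the key are hypothetical names, and an in-memory `Map` stands in for the persistent browser storage (for example the Cache API) that a real page would rely on.

```typescript
// Hypothetical helper: runs an expensive async load at most once per key.
// A Map stands in for persistent browser storage in this sketch.
class ModelCache<T> {
  private readonly entries = new Map<string, Promise<T>>();

  // Returns the stored promise if present; otherwise starts the load once.
  getOrLoad(key: string, loader: () => Promise<T>): Promise<T> {
    let entry = this.entries.get(key);
    if (entry === undefined) {
      entry = loader().catch((err) => {
        // Forget failed loads so a later call can retry.
        this.entries.delete(key);
        throw err;
      });
      this.entries.set(key, entry);
    }
    return entry;
  }
}

// Usage: the second request reuses the first download.
let downloadCount = 0;
const fakeDownload = async (): Promise<Uint8Array> => {
  downloadCount += 1; // stands in for fetching the model weights
  return new Uint8Array(4);
};

const modelCache = new ModelCache<Uint8Array>();
const firstRequest = modelCache.getOrLoad("whisper-base", fakeDownload);
const secondRequest = modelCache.getOrLoad("whisper-base", fakeDownload);
```

    Storing the promise rather than the resolved value means that two requests made while the download is still in flight share a single transfer.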

    The key innovation of Whisper WebGPU is that it runs entirely within the user’s browser. Using Hugging Face Transformers.js and ONNX Runtime Web, the model performs all computation locally, so no audio has to be sent to a server. This improves privacy and lets the application work offline: after the initial model load, users can disconnect from the internet and still use Whisper’s speech recognition.
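
    A minimal sketch of what local transcription can look like in application code follows. To keep the block self-contained, the Transformers.js `pipeline` function is passed in as a parameter and described with simplified types. The `Transcriber` and `PipelineFactory` types, the `transcribeLocally` helper, and the model id `onnx-community/whisper-base` are assumptions made for illustration, not details taken from the article.

```typescript
// Simplified structural types for this sketch; the library's own types are richer.
type Transcriber = (audio: Float32Array) => Promise<{ text: string }>;
type PipelineFactory = (
  task: string,
  model: string,
  options?: { device?: string },
) => Promise<Transcriber>;

// Creates a speech recognition pipeline on the WebGPU device and runs it
// on raw audio samples. Nothing in this function sends audio to a server.
async function transcribeLocally(
  pipeline: PipelineFactory,
  audio: Float32Array,
): Promise<string> {
  const transcriber = await pipeline(
    "automatic-speech-recognition",
    "onnx-community/whisper-base", // assumed model id
    { device: "webgpu" },
  );
  const result = await transcriber(audio);
  return result.text.trim();
}
```

    In an application, the factory argument would be the `pipeline` function exported by Transformers.js, and `audio` would hold 16 kHz mono samples captured from the microphone.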

    Whisper WebGPU also stands out for its use of ONNX (Open Neural Network Exchange) weights. ONNX is an open-source format for AI models that lets models trained in different frameworks be shared and run with common tooling. Xenova structures repositories with the ONNX weights in a dedicated subfolder named ‘onnx’, which sets a precedent for future web-ready models. This layout is a temporary solution that is expected to evolve as WebML (Web Machine Learning) technology matures and integrations become more streamlined.
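
    To make the ‘onnx’ subfolder convention concrete, the sketch below builds the download URL for a weight file stored under that subfolder on the Hugging Face Hub. The helper name `onnxWeightUrl`, the repository id `Xenova/whisper-base`, and the file name `encoder_model.onnx` are illustrative assumptions; actual file names depend on how a given model was exported.

```typescript
// Assumed repository layout for a web-ready model:
//   <model-id>/
//     config.json
//     tokenizer.json
//     onnx/
//       encoder_model.onnx
//       decoder_model_merged.onnx

// Builds the Hub download URL for a weight file inside the `onnx/` subfolder.
function onnxWeightUrl(
  modelId: string,
  fileName: string,
  revision: string = "main",
): string {
  return `https://huggingface.co/${modelId}/resolve/${revision}/onnx/${fileName}`;
}

const encoderUrl = onnxWeightUrl("Xenova/whisper-base", "encoder_model.onnx");
```

    Keeping the web weights in their own subfolder lets the same repository hold the original framework checkpoints alongside the ONNX export.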

    For developers looking to make their own models web-ready, Xenova recommends converting them to ONNX with Hugging Face Optimum. This ensures compatibility with ONNX Runtime Web and matches the repository structure demonstrated by Whisper WebGPU, which makes adoption and integration easier.

    Whisper WebGPU is not only about on-device processing; it is also versatile. The model supports multilingual transcription across 100 languages, which makes it broadly useful for transcription, translation, and accessibility applications that need real-time results on the web.

    The implications are broad. Imagine a web application that transcribes meetings in real time, provides instant translations during international video calls, or accepts voice commands to control web interfaces, all without the latency or privacy concerns associated with server-based processing.
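
    One common way to feed a live audio stream to a model like Whisper, which expects 16 kHz audio and processes it in 30-second windows, is to keep only the most recent samples and re-run transcription on that window. The sketch below shows such a rolling buffer. `RollingAudioWindow` is a hypothetical helper written for this example and is not necessarily how the Whisper WebGPU demo handles its audio.

```typescript
const SAMPLE_RATE = 16000; // Whisper expects 16 kHz mono audio
const WINDOW_SECONDS = 30; // Whisper processes audio in 30-second windows

// Hypothetical helper: keeps only the most recent `maxSamples` samples.
class RollingAudioWindow {
  private samples: Float32Array = new Float32Array(0);
  private readonly maxSamples: number;

  constructor(maxSamples: number = SAMPLE_RATE * WINDOW_SECONDS) {
    this.maxSamples = maxSamples;
  }

  // Appends a new chunk and drops the oldest samples beyond the window.
  push(chunk: Float32Array): void {
    const merged = new Float32Array(this.samples.length + chunk.length);
    merged.set(this.samples, 0);
    merged.set(chunk, this.samples.length);
    const start = Math.max(0, merged.length - this.maxSamples);
    this.samples = merged.subarray(start);
  }

  // Returns a copy of the current window, ready to hand to a transcriber.
  snapshot(): Float32Array {
    return this.samples.slice();
  }
}
```

    Each time the microphone delivers a new chunk, the application would call `push` and then transcribe `snapshot()`, so the displayed text tracks the latest 30 seconds of speech.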

    Whisper WebGPU is a significant step toward making AI more widely accessible. By running advanced speech recognition directly in the browser, it lowers the barrier to entry for developers and end users alike. Developers no longer need to maintain server infrastructure for inference or manage the data privacy risks of cloud processing. Instead, they can use Whisper WebGPU to build responsive, secure, and efficient AI-driven applications.

    In conclusion, Whisper WebGPU by Xenova changes how AI can be delivered and used on the web. Its real-time, in-browser speech recognition, support for 100 languages, and foundation on ONNX and Transformers.js set a new standard for web-based AI applications.

    The post Whisper WebGPU: Real-Time in-Browser Speech Recognition with OpenAI Whisper appeared first on MarkTechPost.