Speech recognition in the browser using Web Speech API

Speech recognition has become an increasingly popular feature in modern web applications. With the Web Speech API, developers can easily incorporate speech-to-text functionality into their web projects. This API provides the tools needed to perform real-time transcription directly in the browser, allowing users to control your app with voice commands or simply dictate text.

In this blog post, you'll learn how to set up speech recognition using the Web Speech API. We'll create a simple web page that lets users record their speech and convert it into text. Here is a screenshot of the final app:

Final app: Speech Recognition in your browser using the Web Speech API

Before we set up the app, let's learn about the Web Speech API and how it works.

What is the Web Speech API?

The Web Speech API is a web technology that allows developers to add voice capabilities to their applications. It supports two key functions: speech recognition (turning spoken words into text) and speech synthesis (turning text into spoken words). This enables users to interact with websites using their voice, enhancing accessibility and user experience.

The Web Speech API consists of two parts:

SpeechRecognition: Captures audio from the user's microphone and passes it to a speech recognition engine, which processes the speech and returns the transcribed text to the browser. In Chrome, for example, the audio is sent to Google's cloud-based speech recognition service, so recognition typically needs a network connection; other browsers may use a different engine. Results arrive in real time, allowing for continuous transcription or voice command execution as the user speaks. Here's a minimal code example of the SpeechRecognition interface:

// Get the constructor (Chrome and Safari expose it as webkitSpeechRecognition)
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
// Set up a SpeechRecognition object
const recognition = new SpeechRecognition();
// Handle the result in a callback (onResult is a function you define)
recognition.addEventListener("result", onResult);
// Start recording, and later stop it (for example, on a button click)
recognition.start();
recognition.stop();

SpeechSynthesis: This part of the API takes text provided by the application and converts it into spoken words using the voices available to the browser. The exact voices and languages depend on the user's device, operating system, and browser. Many voices run locally without an internet connection, but some browsers also offer network-based voices; a voice's localService property indicates which kind it is.
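The SpeechRecognition snippet has no SpeechSynthesis counterpart, so here is a minimal sketch of speaking a string, assuming a browser that exposes `speechSynthesis` and `SpeechSynthesisUtterance` on `window`. The `speakText` helper and its `lang` and `win` parameters are hypothetical; passing the global object in as `win` lets the function return `false` instead of throwing where synthesis is unavailable.

```javascript
// Hypothetical helper (not part of the Web Speech API): speak `text` aloud.
// Returns true if the utterance was queued, false if synthesis is unavailable
// (for example in Node.js or a browser without speechSynthesis).
function speakText(text, lang = "en-US", win = globalThis) {
  const synth = win.speechSynthesis;
  const Utterance = win.SpeechSynthesisUtterance;
  if (!synth || !Utterance) {
    return false;
  }
  const utterance = new Utterance(text);
  // Request a language; the browser chooses a voice for it.
  utterance.lang = lang;
  // speak() adds the utterance to the browser's speech queue.
  synth.speak(utterance);
  return true;
}
```

Because `speak()` adds to a queue, calling `speakText("Hello")` and then `speakText("World")` in a browser plays the two strings one after another rather than interrupting each other.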

The Web Speech API abstracts these complex processes, so developers can easily integrate voice features without needing specialized infrastructure or machine learning expertise.

Prerequisites

Let's walk through each step of setting up the Web Speech API on a website; by the end, you'll have a fully functional speech recognition web app.

To follow along with this guide, you need:

A basic understanding of HTML, JavaScript, and CSS.
A modern browser (like Chrome) that supports the Web Speech API.

The full code is also available on GitHub here.

Step 1: Set up the Project Structure

First, create a folder for your project, and inside it, add three files:

index.html: To define the structure of your web page.
speech-api.js: To handle speech recognition using JavaScript.
style.css: To style the web page.

Step 2: Write the HTML File

We'll start by writing the HTML code that will display the speech recognition UI. The page should contain a button for starting and stopping the recording, and a section for displaying the transcription results.

Add the following code to index.html:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Web Speech API example</title>
    <link rel="stylesheet" href="./style.css" />
  </head>
  <body>
    <h1>Web Speech API example</h1>
    <p>Click the button and start speaking</p>
    <button id="recording-button">Start recording</button>
    <div id="transcription-result"></div>
    <p id="error-message" hidden aria-hidden="true">
      Button was removed<br>Your browser doesn't support Speech Recognition with the Web Speech API
    </p>
    <script src="speech-api.js"></script>
  </body>
</html>

This HTML sets up a simple layout with a button that will trigger speech recognition and a div to display the transcription results. If the browser doesn't support the Web Speech API, an error message appears instead. The error message is hidden initially and can be made visible through JavaScript.

At the bottom of the body, we include a script that points to the speech-api.js file with the Web Speech API logic.

Step 3: Implement Speech Recognition API logic

Now, we'll move on to writing the JavaScript code to handle speech recognition. Create the speech-api.js file and add the following code:
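A minimal sketch of speech-api.js, written to match the behavior described in the explanation that follows and the element IDs in index.html. The helper names `getSpeechRecognition` and `collectSegments`, the `isRecording` flag, and the `end` listener that resets the button label are illustrative choices of this sketch, not something the API prescribes. Results are rendered with `textContent` rather than HTML strings so transcribed text is never parsed as markup.

```javascript
// speech-api.js: a sketch assuming the IDs recording-button,
// transcription-result and error-message from index.html.

// Chrome and Safari expose the constructor with a "webkit" prefix.
function getSpeechRecognition(win) {
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// Turn a SpeechRecognitionResultList into plain { text, isFinal } segments.
// event.results holds every result of the current session, so re-rendering
// the whole list keeps interim text up to date.
function collectSegments(results) {
  const segments = [];
  for (let i = 0; i < results.length; i++) {
    // results[i][0] is the most likely alternative for this result.
    segments.push({ text: results[i][0].transcript, isFinal: results[i].isFinal });
  }
  return segments;
}

function init() {
  const button = document.getElementById("recording-button");
  const transcriptionResult = document.getElementById("transcription-result");
  const errorMessage = document.getElementById("error-message");
  const Recognition = getSpeechRecognition(window);

  // Checking browser support: remove the button and reveal the error message.
  if (!Recognition) {
    button.remove();
    errorMessage.hidden = false;
    errorMessage.setAttribute("aria-hidden", "false");
    return;
  }

  const recognition = new Recognition();
  recognition.continuous = true; // keep listening until stop() is called
  recognition.interimResults = true; // deliver live, not-yet-final text
  let isRecording = false;

  // Rebuild the transcript; final results get the .final class.
  function onResult(event) {
    const spans = collectSegments(event.results).map(({ text, isFinal }) => {
      const span = document.createElement("span");
      if (isFinal) span.className = "final";
      span.textContent = text;
      return span;
    });
    transcriptionResult.replaceChildren(...spans);
  }

  // Toggle recording; the "end" listener below resets the button.
  function onClick() {
    if (isRecording) {
      recognition.stop();
    } else {
      recognition.start();
      isRecording = true;
      button.textContent = "Stop recording";
    }
  }

  // The session can also end on its own, so sync the UI on "end".
  recognition.addEventListener("end", () => {
    isRecording = false;
    button.textContent = "Start recording";
  });
  recognition.addEventListener("result", onResult);
  button.addEventListener("click", onClick);
}

// Only run in a browser page; the guard also lets the helpers load in Node.
if (typeof document !== "undefined") {
  init();
}
```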

Explanation of the JavaScript Code

Checking browser support: We first check whether the browser supports the SpeechRecognition API. If not, we hide the recording button and display an error message.

Setting up Speech Recognition: We initialize the SpeechRecognition object and set continuous to true so that the API keeps listening to the user's speech until it's manually stopped. interimResults is also set to true so that users see a live transcription as they speak, instead of text appearing only when the end of a sentence is detected.

Handling the speech event: The onResult function is triggered whenever speech recognition detects spoken words. It iterates over the recognized results and updates the transcriptionResult div with the spoken text. Final results (those the engine will no longer revise) are styled differently using the .final class.

Handling the button click: The onClick function toggles the recording state. If speech recognition is active, it stops the recognition; otherwise, it starts listening for speech.

Step 4: Style the Web Page

Next, let's add some styles to make the page a bit more visually appealing. Create the style.css file and add the following styles:

html,
body {
  font-family: Arial, sans-serif;
  text-align: center;
}
#transcription-result {
  font-size: 18px;
  color: #5e5e5e;
}
#transcription-result .final {
  color: #000;
}
#error-message {
  color: #ff0000;
}
button {
  font-size: 20px;
  font-weight: 200;
  color: #fff;
  background: #2f2ff2;
  width: 220px;
  border-radius: 20px;
  margin-top: 2em;
  margin-bottom: 2em;
  padding: 1em;
  cursor: pointer;
}
button:hover,
button:focus {
  background: #2f70f2;
}

This CSS file ensures the button is easily clickable and the transcription result is clearly visible. Interim results are shown in gray, and the .final class turns final results black, so you'll notice the gray text change to black whenever the recognizer finalizes a phrase, typically at a pause or the end of a sentence.

Step 5: Test the Web App

Once everything is in place, open the index.html file in a browser that supports the Web Speech API (such as Google Chrome). You should see a button labeled “Start recording”. When you click it, the browser will prompt you to grant permission to use the microphone.

After you allow the browser access, the app will start transcribing any spoken words into text and display them on the screen. The transcription continues until you click the button again to stop recording, although the browser may also end the session on its own, for example after a long silence.
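If the user declines the microphone prompt, or recognition fails for another reason, the SpeechRecognition object fires an `error` event whose `error` property holds a code such as "not-allowed", "audio-capture", or "no-speech". The sketch below shows one way to surface those codes in the page's existing #error-message paragraph. The helper names `describeRecognitionError` and `attachErrorHandler` and all message texts are hypothetical, and only some of the codes defined by the specification are handled explicitly.

```javascript
// Hypothetical helpers (not from the article): map SpeechRecognitionErrorEvent
// codes to user-facing messages and show them in the #error-message paragraph.
// The codes come from the Web Speech API; the message wording is illustrative.
function describeRecognitionError(code) {
  switch (code) {
    case "not-allowed":
      return "Microphone access was blocked. Allow it in your browser's site settings and try again.";
    case "service-not-allowed":
      return "Your browser does not allow the speech recognition service on this page.";
    case "audio-capture":
      return "No microphone could be found.";
    case "no-speech":
      return "No speech was detected. Please try again.";
    case "network":
      return "The speech recognition service could not be reached.";
    default:
      return `Speech recognition error: ${code}`;
  }
}

// recognition: a SpeechRecognition instance; errorElement: the #error-message <p>.
function attachErrorHandler(recognition, errorElement) {
  recognition.addEventListener("error", (event) => {
    errorElement.textContent = describeRecognitionError(event.error);
    // Undo both the hidden attribute and aria-hidden set in index.html.
    errorElement.hidden = false;
    errorElement.setAttribute("aria-hidden", "false");
  });
}

// Usage in speech-api.js, after creating `recognition`:
// attachErrorHandler(recognition, document.getElementById("error-message"));
```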

Conclusion

You've learned what the Web Speech API is and how to use it. With just a few lines of code, you can add speech recognition to your web projects. Check out the official documentation to learn more.

If you're looking for an alternative with more features and higher transcription accuracy, we also recommend trying out the AssemblyAI JavaScript SDK.

To learn more about how you can analyze audio files with AI and get inspired to build more Speech AI features into your app, check out more of our blog, like this article on Adding Punctuation, Casing, and Formatting to your transcriptions, or this guide on Summarizing audio with LLMs in Node.js.
