This guide will teach you how to transcribe YouTube videos with Node.js and AssemblyAI. After creating the transcript, you’ll learn how to generate SRT subtitles, and lastly, you’ll use LeMUR to prompt the video using a Large Language Model (LLM).
Step 1. Set up your development environment
First, install Node.js 18 or higher on your system.
Next, create a new project folder, change directories to it, and initialize a new Node.js project:
mkdir transcribe-youtube-video
cd transcribe-youtube-video
npm init -y
Open the package.json file and add "type": "module" to the list of properties:
{
  …
  "type": "module",
  …
}
This will tell Node.js to use the ES Module syntax for exporting and importing modules, and not to use the old CommonJS syntax.
Then, install the necessary NPM modules:
assemblyai installs the AssemblyAI JavaScript SDK, which makes it easier to interact with the AssemblyAI API.
youtube-dl-exec wraps the yt-dlp CLI tool, which lets you retrieve information about YouTube videos and download them.
tsx lets you execute TypeScript code without additional setup.
npm install --save assemblyai youtube-dl-exec tsx
You must also install Python 3.7 or above on your system as python3, because it is required by youtube-dl-exec.
Next, you need an AssemblyAI API key that you can find on your dashboard. If you don’t have an AssemblyAI account, first sign up for free. Once you’ve copied your API key, configure it as the ASSEMBLYAI_API_KEY environment variable on your machine:
# Mac/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>
# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
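If the variable is missing, the script only fails later, when the API call is made. One way to fail earlier is a small check like the sketch below. It is not part of the original application, and requireEnv is a hypothetical helper name, not an AssemblyAI SDK function.

```typescript
// Hypothetical helper (not part of the AssemblyAI SDK): reads a variable from
// an environment-like object and throws a descriptive error if it is missing
// or empty.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Environment variable ${name} is not set`);
  }
  return value;
}

// Example with an explicit object, so the snippet runs without a real key.
// In the tutorial script, the second argument would be process.env.
const exampleKey = requireEnv("ASSEMBLYAI_API_KEY", {
  ASSEMBLYAI_API_KEY: "example-key",
});
console.log("API key length:", exampleKey.length);
```

Passing the environment in as a parameter keeps the helper easy to try out without touching your real shell configuration.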
You can find the full source code of this application in this GitHub repository.
Step 2. Retrieve the audio of a YouTube video
To transcribe a video with AssemblyAI, you need either a public URL to the video file or to upload the video file to AssemblyAI. However, only the audio track is needed to generate a transcript, so you can also use a public URL to the audio track or upload the audio to AssemblyAI.
YouTube stores the audio and the video of a YouTube video in separate files, which you can retrieve in different formats and quality. The easiest way to retrieve the formats is using the yt-dlp CLI tool. The youtube-dl-exec module you installed wraps the yt-dlp CLI tool so you can retrieve this information from Node.js.
Create a file called index.ts and add the following code:
import { youtubeDl } from "youtube-dl-exec";
const youtubeVideoUrl = "https://www.youtube.com/watch?v=wtolixa9XTg";
console.log("Retrieving audio URL from YouTube video");
const videoInfo = await youtubeDl(youtubeVideoUrl, {
  dumpSingleJson: true,
  preferFreeFormats: true,
  addHeader: ["referer:youtube.com", "user-agent:googlebot"],
});
const audioUrl = videoInfo.formats.reverse().find(
  (format) => format.resolution === "audio only" && format.ext === "m4a",
)?.url;
if (!audioUrl) {
  throw new Error("No audio only format found");
}
console.log("Audio URL retrieved successfully");
console.log("Audio URL:", audioUrl);
This script retrieves all the information about the YouTube video and stores it in the videoInfo variable.
The formats property lists all the available formats, both video and audio, ordered from worst to best quality. The script reverses the formats array so the best quality comes first, then looks for the first “audio only” format with the m4a extension and takes that format’s url property.
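The selection logic can be tried in isolation with made-up data. The sketch below is a variation on the tutorial's code, not a replacement for it: pickAudioUrl and FormatInfo are hypothetical names, the sample URLs are invented, and the fallback to any audio-only format when no m4a exists is an added assumption that the original script does not make. Unlike the original, it copies the array before reversing instead of mutating it.

```typescript
// Minimal shape of the fields this sketch reads from each format entry.
interface FormatInfo {
  url: string;
  ext: string;
  resolution: string;
}

// Returns the URL of the best audio-only format, assuming the input is ordered
// from worst to best quality. Prefers m4a and falls back to any audio-only
// format (the fallback is an assumption, not part of the tutorial script).
function pickAudioUrl(formats: FormatInfo[]): string | undefined {
  const audioOnly = formats.filter((f) => f.resolution === "audio only");
  const bestFirst = [...audioOnly].reverse(); // copy, then reverse
  const preferred = bestFirst.find((f) => f.ext === "m4a") ?? bestFirst[0];
  return preferred?.url;
}

// Made-up sample data for illustration only.
const sampleFormats: FormatInfo[] = [
  { url: "https://example.com/low.m4a", ext: "m4a", resolution: "audio only" },
  { url: "https://example.com/high.webm", ext: "webm", resolution: "audio only" },
  { url: "https://example.com/high.m4a", ext: "m4a", resolution: "audio only" },
  { url: "https://example.com/video.mp4", ext: "mp4", resolution: "1920x1080" },
];

console.log(pickAudioUrl(sampleFormats)); // https://example.com/high.m4a
```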
Now that you have the audio URL of the YouTube video, you can transcribe the audio using AssemblyAI.
At the top of index.ts, import the AssemblyAI class from the assemblyai module:
import { AssemblyAI } from "assemblyai";
Then append the following code at the end of the index.ts file:
console.log("Transcribing audio");
const aaiClient = new AssemblyAI({
  apiKey: process.env.ASSEMBLYAI_API_KEY!,
});
const transcript = await aaiClient.transcripts.transcribe({
  // can also accept videos and local files
  audio: audioUrl,
});
The code sends the audio to AssemblyAI for transcription. If the transcription is successful, the transcript object will be populated with the transcript text and many additional properties. However, you should verify whether an error occurred and log the error.
Add the following code to check if an error occurred:
if (transcript.status === "error") {
  throw new Error("Transcription failed: " + transcript.error);
}
console.log("Transcription complete");
Step 3. Save the transcript and subtitles
Now that you have a transcript, you can save the transcript text to a file. Add the following import which you’ll need to save files to disk.
import { writeFile } from "fs/promises";
Then add the following code to save the transcript to disk.
console.log("Saving transcript to file");
await writeFile("./transcript.txt", transcript.text!);
console.log("Transcript saved to file transcript.txt");
You can also generate SRT subtitles from the transcript and save them to disk like this:
console.log("Retrieving transcript as SRT subtitles");
const subtitles = await aaiClient.transcripts.subtitles(transcript.id, "srt");
await writeFile("./subtitles.srt", subtitles);
console.log("Subtitles saved to file subtitles.srt");
WebVTT Subtitle Format
WebVTT (Web Video Text Tracks) is another popular and widely supported subtitle format. To generate WebVTT, replace "srt" with "vtt" and save the file with the .vtt extension.
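One visible difference between the two formats is the cue timestamp: SRT separates seconds and milliseconds with a comma, while WebVTT uses a period. The sketch below illustrates that difference only. AssemblyAI generates the subtitle files for you, so this is not needed in the script, and formatTimestamp is a hypothetical helper name.

```typescript
type SubtitleFormat = "srt" | "vtt";

// Hypothetical helper: formats a millisecond offset as a subtitle timestamp.
// SRT uses "HH:MM:SS,mmm"; WebVTT uses "HH:MM:SS.mmm".
function formatTimestamp(ms: number, format: SubtitleFormat): string {
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  const millis = ms % 1000;
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const separator = format === "srt" ? "," : ".";
  return (
    `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}` +
    `${separator}${pad(millis, 3)}`
  );
}

console.log(formatTimestamp(3723045, "srt")); // 01:02:03,045
console.log(formatTimestamp(3723045, "vtt")); // 01:02:03.045
```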
Step 4. Run the script
To run the script, go back to your shell and run:
npx tsx index.ts
After a little while, the transcript.txt and subtitles.srt files will appear in the current directory. Longer videos take longer to process.
Bonus: Prompt a YouTube video using LeMUR
AssemblyAI makes it very easy to build generative AI features using our LLM framework called LeMUR.
You can write a prompt to tell the LLM what to do with a given transcript and the LLM will generate a response.
For example, you can write a prompt that tells LeMUR to summarize the video using bullet points.
console.log("Prompting LeMUR to summarize the video");
const prompt = "Summarize this video using bullet points";
const lemurResponse = await aaiClient.lemur.task({
  transcript_ids: [transcript.id],
  prompt,
  final_model: "default",
});
console.log(prompt + ": " + lemurResponse.response);
You can find the various supported models listed in the LeMUR documentation.
If you add this code and run the script again, you’ll get a generated summary that looks like this:
Here is a bullet point summary of the key points from the video:
– Lay the math foundation with Khan Academy courses on basics like linear algebra, calculus, statistics etc. Come back later to fill gaps.
– Learn Python – do a beginner and intermediate level course to get a solid base. Python skills are essential.
– Learn key machine learning Python libraries like NumPy, Pandas, Matplotlib. Follow a crash course for each.
– Do Andrew Ng’s machine learning specialization course on Coursera. Recently updated to include Python and libraries like NumPy, Scikit-learn, TensorFlow.
– Implement some algorithms from scratch in Python to better understand concepts. An updated ML from scratch course will be released.
– Do Kaggle’s intro and intermediate ML courses to learn more data preparation with Pandas.
– Practice on Kaggle with competitions and datasets. Helps build portfolio and CV. Focus on learning over winning.
– Specialize as per industry requirements in CV, NLP etc. Look at job descriptions. Consider learning MLOps.
– Start a blog to write tutorials and share your projects. Helps cement knowledge and build CV.
– Useful books referenced: Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow, Machine Learning Yearning by Andrew Ng.
Next steps
In this tutorial, you learned how to retrieve the audio file from a YouTube video, how to transcribe the audio file and generate subtitles, and finally, how to summarize the YouTube video using LeMUR.
Check out our Audio Intelligence models and LeMUR to add even more capabilities to your audio and video applications.
Alternatively, check out our blog or YouTube channel for educational content on AI and Machine Learning, or join us on Twitter or Discord to stay in the loop when we release new content.