Hey 👋, this weekly update contains the latest info on our new product features, tutorials, and our community.
New Language Support for Speaker Diarization
AssemblyAI’s Speaker Diarization model now supports five additional languages: Chinese 🇨🇳, Hindi 🇮🇳, Japanese 🇯🇵, Korean 🇰🇷, and Vietnamese 🇻🇳. This feature is available in both our Best and Nano tiers.
The Speaker Diarization model detects multiple speakers in an audio file and identifies what each speaker said. To start building with this feature, simply set speaker_labels to true in your transcription configuration. For more examples, check out our documentation.
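As a quick illustration, here is a minimal sketch of what that configuration and the diarized result look like. The request body targets AssemblyAI's transcript endpoint (the audio_url is a placeholder), and the response dict is a trimmed, hypothetical example of the shape you get back: each utterance carries a speaker label alongside its text.

```python
# Sketch of a transcription request with diarization enabled.
# The audio_url is a placeholder; speaker_labels=True is the one
# setting that turns on Speaker Diarization.
request = {
    "audio_url": "https://example.com/meeting.mp3",
    "speaker_labels": True,
}

# Trimmed, illustrative example of the diarized response shape:
# each utterance is attributed to a speaker label ("A", "B", ...).
response = {
    "utterances": [
        {"speaker": "A", "text": "Hello, thanks for joining."},
        {"speaker": "B", "text": "Happy to be here."},
    ]
}

# Print a speaker-attributed transcript line by line.
for u in response["utterances"]:
    print(f"Speaker {u['speaker']}: {u['text']}")
```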
Fresh From Our Blog
Automatically determine video sections with AI using Python: Learn how to detect video sections automatically, generate section titles with LLMs, and format the results as YouTube chapters. Read more>>
Filter profanity from audio files using Python: Learn how to filter profanity out of audio and video files with fewer than 10 lines of code in this tutorial. Read more>>
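Behind the tutorial's brevity is the fact that profanity filtering is a single configuration option. A minimal sketch of the request body, assuming the standard transcript endpoint (the audio_url is a placeholder):

```python
# Sketch: profanity filtering is one flag in the transcription request.
# filter_profanity=True masks profane words in the returned transcript.
request = {
    "audio_url": "https://example.com/podcast.mp3",  # placeholder URL
    "filter_profanity": True,
}
```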
How to use audio data in LlamaIndex with Python: Discover how to incorporate audio files into LlamaIndex and build an LLM-powered query engine in this step-by-step tutorial. Read more>>
Our Trending YouTube Tutorials
Coding an AI Voice Bot from Scratch: Real-Time Conversation with Python: Learn how to build a real-time AI voice assistant using Python that can handle incoming calls, transcribe speech, generate intelligent responses, and provide a human-like conversational experience. Perfect for call centers, customer support, and virtual receptionist applications.
How to use @postman to test LLMs with audio data (Transcribe and Understand): Learn how to transcribe audio and video files using AssemblyAI, and how to apply LeMUR, AssemblyAI’s framework for using Large Language Models on spoken data, without writing any code.
Build A Talking AI with LLAMA 3 (Python tutorial): This tutorial shows you how to build a talking AI using AssemblyAI for real-time transcription, LLAMA 3 (run with Ollama) as the language model, and ElevenLabs for text-to-speech.