Get started using Claude 3.5 Sonnet with audio data

Claude 3.5 Sonnet, recently announced by Anthropic, sets new industry benchmarks for many LLM tasks. It excels in tasks ranging from complex coding to nuanced literary analysis, showcasing exceptional context awareness and creativity.

In this tutorial, you’ll learn how to use Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku with audio or video files in Python.

Pipeline for applying Claude 3 models to audio data

Here are a few example use cases you can use this pipeline for:

- Creating summaries of long podcasts or YouTube videos
- Asking any questions about the audio content
- Generating action items from meetings

How does it work?

Since language models only work with text data, you first have to transcribe the audio data. Multimodal models can overcome this, but they are still in the early stages of development.

To handle the transcription step, we use LeMUR, AssemblyAI’s framework for applying LLMs to speech data. With LeMUR, you don’t need to stitch together several different services: you can combine industry-leading Speech AI models and LLMs in just a few lines of code.

This is made possible through a collaboration between AssemblyAI and Anthropic. You can access all Claude 3 models through the AssemblyAI platform at no additional cost.

Set up the SDK

To get started, install the AssemblyAI Python SDK, which includes all LeMUR functionality.

pip install assemblyai

Then, import the package and set your API key. You can get a free API key from AssemblyAI.

import assemblyai as aai
aai.settings.api_key = "YOUR_API_KEY"

💡 Want to try out the code immediately? Use this Google Colab.
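Hardcoding the key is fine for a quick test, but a common alternative is to read it from an environment variable. Here is a minimal sketch, assuming the key is stored in a variable named ASSEMBLYAI_API_KEY. Both that name and the load_api_key helper are illustrative and not part of the original tutorial.

```python
import os


def load_api_key(env_var: str = "ASSEMBLYAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption for this sketch; use whatever
    name your own setup defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage, once the assemblyai package is installed:
#   import assemblyai as aai
#   aai.settings.api_key = load_api_key()
```

This keeps the key out of notebooks and version control, and fails early with a readable message when the variable is missing.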

Transcribe an audio or video file

Next, transcribe an audio or video file by setting up a Transcriber and calling the transcribe() function. You can pass in any local file or publicly accessible URL.

Here we use an episode of Lenny’s Podcast featuring Dalton Caldwell from Y Combinator.

audio_url = "https://storage.googleapis.com/aai-web-samples/lennyspodcast-daltoncaldwell-ycstartups.m4a"

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(audio_url)

print(transcript.text)

Output:

Seeing everything people apply to YC, with people all kind of have the same idea…
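Transcription can fail, for example when the URL is unreachable, so it can help to check for an error before passing the transcript to an LLM. The sketch below uses a hypothetical helper, transcribe_or_raise. It assumes that a failed job is reported through an error attribute on the returned transcript object rather than raised as an exception, which is worth verifying against the SDK version you use.

```python
def transcribe_or_raise(transcriber, audio_source: str):
    """Transcribe audio_source and raise if the job reports an error.

    `transcriber` is expected to be an aai.Transcriber() instance.
    Assumption: a failed transcript exposes a non-empty `error` attribute.
    """
    transcript = transcriber.transcribe(audio_source)
    if getattr(transcript, "error", None):
        raise RuntimeError(f"Transcription failed: {transcript.error}")
    return transcript


# Usage with the objects from this tutorial:
#   transcript = transcribe_or_raise(aai.Transcriber(), audio_url)
#   print(transcript.text)
```

Passing the transcriber in as an argument keeps the helper easy to test with a stand-in object, without a network call.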

Use Claude 3.5 Sonnet with audio data

Claude 3.5 Sonnet is Anthropic’s most intelligent model to date, outperforming Claude 3 Opus on a wide range of evaluations while remaining cheaper.

To use Claude 3.5 Sonnet, call transcript.lemur.task(), a flexible endpoint that accepts any prompt and automatically adds the transcript as additional context for the model.

Specify aai.LemurModel.claude3_5_sonnet as the final_model when calling the LLM. Here’s an example of a simple summarization prompt:

prompt = "Provide a brief summary of the transcript."

result = transcript.lemur.task(
    prompt, final_model=aai.LemurModel.claude3_5_sonnet
)

print(result.response)

Output:

Here’s a brief summary of the transcript:

The transcript covers two main topics:

1. Advice for startup founders:
Dalton and Lenny discuss the importance of giving simple, pragmatic advice to founders. They talk about perseverance, knowing when to pivot or give up, and avoiding “tar pit ideas” that seem appealing but are consistently unsuccessful.

2. Dalton’s early experiences in Silicon Valley:
Dalton shares his experiences from the early 2000s, including interactions with notable figures like Reid Hoffman, Sam Altman, and Sean Parker before they became famous. He discusses his own startup journey, including selling a company to MySpace and starting Pick Please, which competed with Instagram in the photo-sharing space.

The conversation provides insights into the startup world, both from an advisory perspective and through personal anecdotes from Silicon Valley’s early days.
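The same call covers the other use cases listed at the start, such as generating action items from a meeting. A minimal sketch: build_action_items_prompt, ask, and the prompt wording are illustrative assumptions, while transcript.lemur.task() and final_model are the calls shown above.

```python
def build_action_items_prompt(max_items: int = 5) -> str:
    """Build a prompt asking for a short list of action items.

    The wording is only an example; adjust it to your meeting format.
    """
    return (
        f"List up to {max_items} action items from this meeting. "
        "Use bullet points and name the owner of each item if one is mentioned."
    )


def ask(transcript, prompt: str, model) -> str:
    """Send a prompt via transcript.lemur.task() and return the response text."""
    result = transcript.lemur.task(prompt, final_model=model)
    return result.response.strip()


# Usage with the objects from this tutorial:
#   prompt = build_action_items_prompt()
#   print(ask(transcript, prompt, aai.LemurModel.claude3_5_sonnet))
```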

Use Claude 3 Opus with audio data

Claude 3 Opus is good at handling complex analysis, longer tasks with many steps, and higher-order math and coding tasks.

To use Opus, specify aai.LemurModel.claude3_opus for the model when calling the LLM. Here’s an example of a prompt to extract certain information from the transcript:

prompt = "Extract all advice Dalton gives in this podcast episode. Use bullet points."

result = transcript.lemur.task(
    prompt, final_model=aai.LemurModel.claude3_opus
)

print(result.response)

Output:

Based on the transcript summaries, here are the main pieces of advice Dalton gives:

– Give founders simple, pragmatic advice like “sell shit, make money” and “don’t die.” Even elite athletes need reminders of fundamentals from their coaches.
– When advising struggling founders, consider if it is still fun for them and if the problem is fixable. Persevering founders often truly love their customers and products.
– Discourage founders from continuing solely to avoid failure.
– When discussing pivots, emphasize moving closer to the founder’s personal expertise and experience.
– Be aware of “tar pit ideas” that attract many founders but are consistently unsuccessful, like social coordination apps.
– Signs it may be time for founders to consider giving up include being out of growth ideas or disliking the work. However, most founders feel hopeless at some point but continue through willpower alone. Numerous success stories nearly failed but the founders refused to accept it.

Use Claude 3 Haiku with audio data

Claude 3 Haiku is the fastest and cheapest model, great for executing lightweight actions.

To use Haiku, specify aai.LemurModel.claude3_haiku for the model when calling the LLM. Here’s an example of a simple prompt that asks a question about the content:

prompt = "What are tar pit ideas?"

result = transcript.lemur.task(
    prompt, final_model=aai.LemurModel.claude3_haiku
)

print(result.response)

Output:

Based on the transcript summary, “tar pit ideas” refer to ideas that attract many founders but are consistently unsuccessful. The example given is social coordination apps, which the summary states “seem appealing but people have worked on them for decades without success.”

The key points about “tar pit ideas” from the summary are:

1. They are ideas that seem appealing to many founders, but are consistently unsuccessful over time.
2. The example provided is social coordination apps, which have been worked on for decades without success.
3. The implication is that these types of ideas, despite their initial appeal, end up being traps or “tar pits” that founders get stuck in without achieving success.

So in essence, “tar pit ideas” are concepts or business ideas that appear promising on the surface but have proven to be very difficult to execute successfully over the long term, trapping founders who pursue them.
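Because the three sections above differ only in the final_model argument, you can loop over the models to compare their answers to a single prompt. The helper below, compare_models, is a hypothetical convenience and not part of the SDK. Each model in the mapping triggers its own LeMUR request.

```python
def compare_models(transcript, prompt: str, models: dict) -> dict:
    """Run one prompt against several models and collect the responses.

    `models` maps a label of your choice to a model identifier, e.g.
    {"sonnet": aai.LemurModel.claude3_5_sonnet}.
    """
    responses = {}
    for label, model in models.items():
        result = transcript.lemur.task(prompt, final_model=model)
        responses[label] = result.response
    return responses


# Usage with the objects from this tutorial:
#   answers = compare_models(
#       transcript,
#       "What are tar pit ideas?",
#       {
#           "sonnet": aai.LemurModel.claude3_5_sonnet,
#           "opus": aai.LemurModel.claude3_opus,
#           "haiku": aai.LemurModel.claude3_haiku,
#       },
#   )
#   for label, text in answers.items():
#       print(label, text, sep="\n", end="\n\n")
```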

Learn more about prompt engineering

And that’s how easily you can apply Claude 3 models to audio data with AssemblyAI and the LeMUR framework! I hope you enjoyed the quick guide! To get the most out of LeMUR and the Claude 3 models, see the following resources:

- LeMUR prompt engineering guide
- Anthropic prompt engineering docs
- Cookbook for advanced structured Q&A
- Cookbook for advanced customized summaries
