
    Transcribe audio and video files with Python and Universal-1

    April 9, 2024

Our recently announced speech model Universal-1 sets a new standard for automated speech recognition (ASR) accuracy. Universal-1 demonstrates near-human accuracy, even with accented speech, background noise, and difficult phrases like flight numbers and email addresses. The model is now accessible through the same web API as our previous ASR models.

    Along with Universal-1, we’ve also introduced two new pricing tiers: Best and Nano. The Best tier of Universal-1 is designed for the highest accuracy possible. Nano is our new cost-effective tier with support for 99 different languages.

    This tutorial will explain how to quickly transcribe audio or video files in Python applications using the Best and Nano tiers with our Speech-to-Text API.

    Install the AssemblyAI Python SDK

    The easiest way to start transcribing audio is by using one of our official SDKs.

    Install the AssemblyAI Python SDK with the following command:

pip install --upgrade assemblyai

Sign up for a new account or log in to your existing AssemblyAI account, then copy the API key from your account dashboard. We will need this key to authorize our API calls in the Python script.
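
If you prefer not to hard-code the key in the script, you can load it from an environment variable instead. This is a minimal sketch, assuming you have exported the key under a name of your choosing (ASSEMBLYAI_API_KEY is used here purely as an illustrative name):

import os

import assemblyai as aai

# assumption: the key was exported beforehand, e.g. export ASSEMBLYAI_API_KEY=...
aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
transcriber = aai.Transcriber()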

    Transcribe an audio file using Universal-1

To start transcribing an audio file from a URL using the Best tier, create a new file named transcribe.py and import the SDK in your Python code:

    import assemblyai as aai

    Configure a new authenticated SDK client with the API key found in your account dashboard.

aai.settings.api_key = "YOUR_API_KEY"
transcriber = aai.Transcriber()

    By default, all transcriptions use the Best tier, so you’ll always get the highest accuracy without any extra configuration.
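
If you prefer to be explicit, you can also select the Best tier through a TranscriptionConfig, mirroring the Nano example later in this post; passing speech_model="best" is an assumption drawn from the "nano" setting shown below rather than something this tutorial requires:

# optional: request the Best tier explicitly instead of relying on the default
config = aai.TranscriptionConfig(speech_model="best")
transcriber = aai.Transcriber(config=config)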

    Continue by specifying either an audio or video file URL, or a local file with the following code:

# you can use an audio file located at a publicly-accessible URL
audio_file = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"

# or you can pass the path of a file on your local file system
audio_file = "/Users/matt/Downloads/5_common_sports_injuries.mp3"

# "audio_file" holds either the remote URL or the local file path
transcript = transcriber.transcribe(audio_file)

if transcript.error:
    print(transcript.error)
else:
    print(transcript.text)

    On the command line, run the script with the following command:

    python transcribe.py

You should now see the transcription produced by Universal-1 printed to your terminal. The same code can be used to transcribe audio and video files in your Python applications.
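
Video files go through the same call, since the service also accepts common video formats and transcribes their audio track. Here is a small sketch that transcribes a hypothetical local video file and writes the text to disk; the .mp4 path and output filename are placeholders:

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"
transcriber = aai.Transcriber()

# placeholder path to a local video file
video_file = "/Users/matt/Downloads/team_meeting.mp4"
transcript = transcriber.transcribe(video_file)

if transcript.error:
    print(transcript.error)
else:
    # save the transcript text next to the video
    with open("team_meeting.txt", "w") as f:
        f.write(transcript.text)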

    Nano—a cost-effective alternative

Switching between Best and Nano requires a small tweak to the TranscriptionConfig that is passed into the Transcriber object. To use Nano, set the speech_model parameter to "nano" when instantiating the TranscriptionConfig object:

config = aai.TranscriptionConfig(speech_model="nano")
transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe(audio_file)

    Here is what the completed script with both Best and Nano options looks like:

    import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"
    transcriber = aai.Transcriber()

    # you can use an audio file located at a publicly-accessible URL
audio_file = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"

    # this code will run the “Best” tier
    transcript = transcriber.transcribe(audio_file)

if transcript.error:
    print(transcript.error)
else:
    print("Best tier output:")
    print(transcript.text)

    # this is how you can run Nano by setting the speech_model parameter
config = aai.TranscriptionConfig(speech_model="nano")
    transcriber = aai.Transcriber(config=config)
    transcript = transcriber.transcribe(audio_file)

if transcript.error:
    print(transcript.error)
else:
    print("\nNano tier output:")
    print(transcript.text)

    When you run the above script you should see output like the following (note that this output is abbreviated after “37th minute”):

    Best tier output:
    Runner’s knee runner’s knee is a condition characterized by pain behind or around the kneecap. It is caused by overuse, muscle imbalance and inadequate stretching. Symptoms include pain under or around the kneecap, pain when walking sprained ankle one nil here in the 37th minute…

    Nano tier output:
    Runner’s knee runner’s knee is a condition characterized by pain behind or around the kneecap. It is caused by overuse, muscle imbalance and inadequate stretching. Symptoms include pain under or around the kneecap, pain when walking sprained ankle one nil here in the 37th minute…

    Best, Nano and More with Audio Intelligence

We just transcribed audio with Universal-1 using both the Best and Nano pricing tiers.

Beyond transcription, AssemblyAI offers many more features to explore, such as:

• Entity detection to automatically identify and categorize key information (see the sketch below).
• Content moderation for detecting inappropriate content in audio files to ensure that your content is safe for all audiences.
• PII redaction to minimize sensitive information about individuals by automatically identifying and removing it from your transcript.
• LeMUR for applying Large Language Models (LLMs) to audio data in a single line of code.
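
As a rough illustration of how these features are switched on, here is a minimal sketch that enables entity detection through the TranscriptionConfig; the entity_detection flag and the transcript.entities fields are assumptions based on the Python SDK rather than something covered in this post:

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"
audio_file = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"

# assumption: entity detection is toggled with a TranscriptionConfig flag
config = aai.TranscriptionConfig(entity_detection=True)
transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe(audio_file)

# each detected entity exposes its text and a category label
for entity in transcript.entities:
    print(entity.text, entity.entity_type)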

    You can also learn more about our approach to creating superhuman Speech AI models on our Research page.
