
    Content moderation on audio files with Python

    May 27, 2024

    With a growing percentage of human communication happening online, ensuring that this communication is content-appropriate for a given platform is critical to maintaining the integrity and safety of online spaces. Content moderation is essential in these efforts, helping to detect and manage inappropriate or sensitive material in media files.

    In this tutorial, we’ll learn how you can use Python and state-of-the-art AI models to automatically perform content moderation on audio files at scale with just a few lines of code. 

    We’ll use this example file:

Canadian Wildfires (audio file, 281 seconds)

Below is an excerpt from the output, showing a section of the file that discusses the sensitive topic of health issues, along with a severity score for the content and a confidence estimate for the prediction. The timestamps for the relevant section are also displayed:

    So what is it in this haze that makes it harmful? And I’m assuming it is harmful. It is. It is. The levels outside right now in Baltimore are considered unhealthy. And most of that is due to what’s called particulate matter, which are tiny particles, microscopic, smaller than the width of your hair, that can get into your lungs and impact your respiratory system, your cardiovascular system, and even your neurological, your brain. What makes this particularly harmful?
    Timestamp: 56.3s – 85.4s
    Label: health_issues – Confidence: 94% – Severity: 88%

    Step 1: Set up your environment

Before we start coding, you'll need to make sure your environment is properly configured. First, ensure Python is installed on your computer; if it isn't, you can download and install it from the official Python website.

    Next, install the assemblyai Python package, which allows us to submit files to AssemblyAI for rapid content moderation. Install the package with pip by running the following command in your terminal or command prompt:

    pip install assemblyai

    After installing the assemblyai package, you’ll need to set your API key as an environment variable. Your AssemblyAI API key is a unique identifier that allows you access to AssemblyAI’s AI models. You can get an API key for free here, or copy it from your dashboard if you already have one.

Once you've copied your API key, set it as an environment variable. On macOS and Linux, run the following in your terminal:

    export ASSEMBLYAI_API_KEY=YOUR_KEY_HERE

    For Windows users, use the Command Prompt to execute:

    set ASSEMBLYAI_API_KEY=YOUR_KEY_HERE
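Before moving on, it can help to confirm that the variable is actually visible to Python. A minimal sketch (the `api_key_configured` helper is our own illustration, not part of the assemblyai SDK):

```python
import os

def api_key_configured() -> bool:
    """Return True when an AssemblyAI API key is present in the environment."""
    return bool(os.environ.get("ASSEMBLYAI_API_KEY"))

print("API key configured:", api_key_configured())
```

If this prints False, re-run the export/set command in the same terminal session you will use to run the script.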

    Step 2: Transcribe the file with content moderation

    Now that your environment is set up, the next step is to transcribe your audio file and apply content moderation, allowing you to detect potentially sensitive or inappropriate content within the file.

First, create a file called main.py, import the assemblyai package, and specify the location of the audio file you would like to use. This location can be either a local file path or a publicly accessible download URL. If you don't want to use your own file, you can keep the default example specified below:

import assemblyai as aai

audio_url = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

    Before we transcribe the audio file, we need to specify the configuration for the transcription. Create an aai.TranscriptionConfig object and enable content moderation via content_safety=True. This setting instructs AssemblyAI to analyze the audio for any content that may be considered sensitive during the transcription. You can check out the AssemblyAI docs to see other available models you can enable through the TranscriptionConfig. Add the following line to main.py:

    config = aai.TranscriptionConfig(content_safety=True)

    Next, pass this config into an aai.Transcriber object, and then pass the audio file into the Transcriber’s transcribe method. This submits the audio file for transcription according to the settings defined in the TranscriptionConfig. Add the following lines to main.py:

    transcriber = aai.Transcriber(config=config)

    transcript = transcriber.transcribe(audio_url)

The resulting transcript is an aai.Transcript object which contains, among other things, information about any potentially sensitive segments in the file. Let's take a look at what's returned now.

    Step 3: Print the result

    After transcribing the audio file and analyzing it for sensitive content, we can print the Content Moderation results to see what information is returned. You can then include some logic in your application to automatically handle sensitive content according to your content policies.

    All of the content moderation information for the transcript is found in the transcript.content_safety object. The results attribute of this object contains a list of objects, one for each section in the audio file that the Content Moderation model flagged as sensitive content.

Below we iterate through each element in this list and print the text for the corresponding section in the file, as well as the timestamps for the beginning and end of the section. Then we print information for each of the content moderation labels assigned to the section. Each label specifies a different type of sensitive content detected in the given section, along with a confidence score and a severity rating for that type of content in that particular section. Add the following lines to main.py:

# Get the parts of the transcript which were flagged as sensitive.
for result in transcript.content_safety.results:
    print(result.text)
    print(f"Timestamp: {result.timestamp.start/1000:.1f}s – {result.timestamp.end/1000:.1f}s")

    # Get category, confidence, and severity for each label.
    for label in result.labels:
        print(f"Label: {label.label} – Confidence: {label.confidence*100:.0f}% – Severity: {label.severity*100:.0f}%")
    print()

    Here is one of the items that will be output when we run the script:

    Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines from Maine to Maryland to Minnesota are gray and smoggy. And in some places, the air quality warnings include the warning to stay inside. We wanted to better understand what’s happening here and why. So he called Peter DiCarlo, an associate professor in the department of Environmental Health and Engineering at Johns Hopkins University. Good morning. Professor. Good morning.
    Timestamp: 0.2s – 28.8s
    Label: disasters – Confidence: 81% – Severity: 39%

We can see that this section was identified, with 81% confidence, as touching on the sensitive topic of disasters, with a severity of 39%.

You can use these results to identify sections of audio that are considered sensitive according to some internal criterion. In the code block below, we require that the product of a label's confidence and severity exceed a threshold for the section to be reported, with the intent of surfacing only sections that are both confidently detected and reasonably severe.

THRESHOLD = 0.7

# Get the parts of the transcript which were flagged as sensitive.
for result in transcript.content_safety.results:
    # Skip sections where no label clears the confidence-severity threshold.
    if not any(label.confidence * label.severity > THRESHOLD for label in result.labels):
        continue
    print(result.text)
    print(f"    Timestamps: {result.timestamp.start/1000:0.1f}s – {result.timestamp.end/1000:0.1f}s")

    # Get category, confidence, and severity for each label above the threshold.
    for label in result.labels:
        if label.confidence * label.severity > THRESHOLD:
            print(f"    Label: {label.label} (Confidence: {label.confidence:.02f}, Severity: {label.severity:.02f})")
    print()

    Here is the full output of this code block:

    So what is it in this haze that makes it harmful? And I’m assuming it is harmful. It is. It is. The levels outside right now in Baltimore are considered unhealthy. And most of that is due to what’s called particulate matter, which are tiny particles, microscopic, smaller than the width of your hair, that can get into your lungs and impact your respiratory system, your cardiovascular system, and even your neurological, your brain. What makes this particularly harmful?
        Timestamps: 56.3s – 85.4s
        Label: health_issues (Confidence: 0.94, Severity: 0.88)
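The threshold criterion itself is plain arithmetic, so it can be sanity-checked without calling the API. Here is a small sketch using the confidence and severity values from the outputs above (the `flagged` helper is our own illustration, not part of the assemblyai SDK):

```python
THRESHOLD = 0.7

def flagged(confidence: float, severity: float, threshold: float = THRESHOLD) -> bool:
    """A label is reported only when confidence * severity clears the threshold."""
    return confidence * severity > threshold

# Values from the two sections shown earlier in this tutorial.
print(flagged(0.94, 0.88))  # health_issues: product is about 0.83, above 0.7
print(flagged(0.81, 0.39))  # disasters: product is about 0.32, filtered out
```

This explains why only the health_issues section survives the filter: the disasters section is detected with high confidence but its severity is too low for the product to clear 0.7.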

    Summarizing overall findings

    Furthermore, you can summarize the overall findings of the Content Moderation model to get a broader view of the audio content’s nature. Add the following lines to main.py:

# Get the confidence of the most common labels in relation to the entire audio file.
for label, confidence in transcript.content_safety.summary.items():
    print(f"{confidence * 100:.2f}% confident that the audio contains {label}")

print()

When you run the script, you will see this output, which indicates that the Content Moderation model is highly confident that this audio file as a whole concerns disasters and health issues.

    98.88% confident that the audio contains disasters
    90.83% confident that the audio contains health_issues
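This summary is also a natural place to enforce a file-level content policy rather than a per-section one. A minimal sketch over a plain dict shaped like transcript.content_safety.summary (the blocklist, the label names in it, and the `violates_policy` helper are illustrative assumptions, not part of the assemblyai SDK):

```python
# Shaped like transcript.content_safety.summary: label -> confidence in [0, 1].
summary = {"disasters": 0.9888, "health_issues": 0.9083}

# Example policy: topics this platform does not allow. Hypothetical label names.
BLOCKED_TOPICS = {"hate_speech", "weapons"}

def violates_policy(summary: dict, blocked: set, min_confidence: float = 0.5) -> bool:
    """Flag the file if any blocked topic is detected with sufficient confidence."""
    return any(conf >= min_confidence
               for label, conf in summary.items()
               if label in blocked)

print(violates_policy(summary, BLOCKED_TOPICS))  # neither detected label is blocked
```

For this file the check passes, since disasters and health_issues are sensitive but not on the example blocklist.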

Additionally, you can get a finer-grained breakdown of these issues by accessing a severity score summary. This breakdown also considers the audio file as a whole, but for each label that applies to the file it gives a probability distribution across three discrete severity levels: low, medium, and high.

# Get the overall severity of the most common labels in relation to the entire audio file.
for label, severity_confidence in transcript.content_safety.severity_score_summary.items():
    print(f"{severity_confidence.low * 100:.2f}% confident that the audio contains low-severity {label}")
    print(f"{severity_confidence.medium * 100:.2f}% confident that the audio contains medium-severity {label}")
    print(f"{severity_confidence.high * 100:.2f}% confident that the audio contains high-severity {label}")

    Here is the output:

    53.14% confident that the audio contains low-severity disasters
    46.86% confident that the audio contains medium-severity disasters
    0.00% confident that the audio contains high-severity disasters
    20.70% confident that the audio contains low-severity health_issues
    46.23% confident that the audio contains medium-severity health_issues
33.07% confident that the audio contains high-severity health_issues

    We can see that, for each distinct label that applies to the entire file, the probabilities for low, medium, and high sum to 100%.
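That invariant is easy to verify from the printed numbers themselves. A quick sketch using the severity distributions from the output above, with plain dicts standing in for the severity_score_summary values:

```python
# Severity distributions copied from the output above, as plain dicts.
severity_summary = {
    "disasters": {"low": 0.5314, "medium": 0.4686, "high": 0.0},
    "health_issues": {"low": 0.2070, "medium": 0.4623, "high": 0.3307},
}

for label, dist in severity_summary.items():
    # The low/medium/high probabilities for each label form a distribution.
    total = sum(dist.values())
    print(f"{label}: {total:.4f}")
```

Both totals come out to 1.0000, confirming that each label's low, medium, and high scores partition the model's confidence for that topic.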

    Run python main.py in the terminal in which you set your AssemblyAI API key as an environment variable to see all of these outputs printed to the console.

    Final words

    In this tutorial, you learned how to perform a Content Moderation analysis of an audio file using AI. With the results printed and analyzed, you can make informed decisions to ensure your audio content aligns with your organization’s safety and content standards.

If you want to learn more about how to analyze audio and video files with AI, check out more of our blog, like this article on filtering profanity from audio files with Python. Alternatively, feel free to check out our YouTube channel for educational videos on AI and AI-adjacent projects, like this video on how to automatically extract phone call insights using LLMs and Python.
