Announcing New Language Support for PII Text Redaction and Expanding Entity Detection

At Assembly, we’re focused on ensuring that you can extract maximum value and insights from voice data, while also keeping privacy and security at the forefront to keep you and your end users safe. Today, we’re announcing updates to our PII Text Redaction and Entity Detection features to give you more power and control in protecting sensitive information.

What’s New

PII Text Redaction now available in 47 additional languages16 new entity types added to Entity Detection for a total of 44 types available

PII Redaction: Safeguarding Sensitive Information Across Languages

Our latest update brings expanded language support to our PII Text Redaction feature, now available in 47 additional languages.Â

This enhancement ensures that your Personally Identifiable Information (PII) â€” any information that can be used to identify a person â€” is safeguarded regardless of location or language, making robust privacy measures more accessible.

With this feature, you can:

Securely handle customer service calls containing personal informationSafely share and analyze user-generated content in media applicationsProtect participant privacy in market research studies and surveys

With PII Redaction, you can identify and remove personal data such as addresses, phone numbers, and credit card details from your transcripts. You can accomplish this in two ways:

Text Redaction: Generates a transcript with PII removed. Example:Â “You can reach me at [phone_number]” or “You can reach me at ###.â€Audio Redaction: “Beeps out” sensitive information in your audio file.

With a few extra lines of code, you can quickly redact PII from your transcripts.

import assemblyai as aai

aai.settings.api_key = “YOUR API KEY”

audio_url = “https://github.com/AssemblyAI-Community/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3”

config = aai.TranscriptionConfig(speaker_labels=True).set_redact_pii(
policies=[
aai.PIIRedactionPolicy.person_name,
aai.PIIRedactionPolicy.organization,
aai.PIIRedactionPolicy.occupation,
],
substitution=aai.PIISubstitutionPolicy.hash,
)

transcript = aai.Transcriber().transcribe(audio_url, config)

for utterance in transcript.utterances:
print(f”Speaker {utterance.speaker}: {utterance.text}”)

print(transcript.text)

Example output

Speaker A: Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines from Maine to Maryland to Minnesota are gray and smoggy. And in some places, the air quality warnings include the warning to stay inside. We wanted to better understand what’s happening here and why. So we called ##### #######, an ######### ######### in the ########## ## ############# ###### ### ########### at ##### ####### ##########. Good morning, #########.

Speaker B: Good morning.

Speaker A: So what is it about the conditions right now that have caused this round of wildfires to affect so many people so far away?

Speaker B: Well, there’s a couple of things. The season has been pretty dry already, and then the fact that we’re getting hit in the US is because there’s a couple weather systems that…

Plus, the PII models achieve 99%+ precision, accuracy, and recall in major languages, including English, French, German, Italian, Portuguese, Spanish, Korean, Hindi, Russian, Tagalog, and Ukrainian.Â

This ensures reliable protection of sensitive information. For EU-based operations, we support PII Text Redaction in 13 languages, meeting regional data residency requirements.

Expanding Entity Detection

Extracting meaningful insights from large volumes of audio data can be time-consuming and resource-intensive. We enhanced our Entity Detection with 16 new entity types for a total of 44 different entities so you can extract more value from your audio data.

You can automatically identify and categorize key information in your transcripts, providingÂ detailed entity lists and timestamps.Â

Here are a few examples of what you can detect:

Names of peopleOrganizationsAddressesPhone numbersMedical dataSocial security numbers

Here’s how you can enable Entity Detection within your app:

import assemblyai as aai

aai.settings.api_key = “YOUR API KEY”

audio_url = “https://github.com/AssemblyAI-Community/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3”

config = aai.TranscriptionConfig(entity_detection=True)

transcript = aai.Transcriber().transcribe(audio_url, config)

for entity in transcript.entities:
print(entity.text)
print(entity.entity_type)
print(f”Timestamp: {entity.start} – {entity.end}n”)

You’ll receive precise entity data, such as:

Canada
location
Timestamp: 2548 – 3130

the US
location
Timestamp: 5498 – 6350

…

By eliminating manual review, you can efficiently search and categorize audio content across multiple languages and markets. This opens up new possibilities for in-depth analysis and data-driven decision making.

With Entity Detection, you’ll be able to run key use cases like:

Rapidly analyze call center interactions to improve customer serviceCategorize and search media content more effectivelyExtract key trends and patterns from market research data

Entity Detection delivers reliable results with 99% accuracy in major languages. It also supports EU data residency for 13 languages, helping you maintain regional compliance requirements. Quickly unlock the full potential of your audio data, gaining deeper insights into customer interactions and market trends across a broad range of contexts.

Frequently Asked Questions

I want to store and process my data in the EU. Will the expanded PII Text Redaction and Entity Detection languages be supported by EU Data Residency?

Yes, 13 languages in our “Best ASR” offering will be supported by EU Data Residency: English, Finnish, French, German, Italian, Korean, Polish, Portuguese, Russian, Spanish, Turkish, Ukrainian, and Vietnamese.

What is the quality of PII Text Redaction and Entity Detection across languages?

The highest quality PII Text Redaction and Entity Detection is found in English, French, German, Italian, Portuguese, Spanish, Korean, Hindi, Dutch, Japanese, Mandarin, Russian, Tagalog, and Ukrainian. These languages have fully trained corpora with verified 99%+ precision, accuracy, and recall results.

PII Redaction and Entity Detection in other supported languages perform well, but the training corpus is actively being improved to bring it up to the same level as the other languages.

How secure is my data when using AssemblyAI’s PII Redaction and Entity Detection?

AssemblyAI prioritizes data security with enterprise-grade encryption both in transit and at rest. We adhere to stringent data security practices to ensure your sensitive information is protected. Additionally, users can request the deletion of their data at any time, and these requests are handled promptly.

Start building with new security features today.

Source: Read MoreÂ

Sunshine And March Vibes (2025 Wallpapers Edition)

The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

How To Fix Largest Contentful Paint Issues With Subpart Analysis

How To Prevent WordPress SQL Injection Attacks

Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

Bing Search APIs to be “decommissioned completely” as Microsoft urges developers to use its Azure agentic AI alternative

Microsoft might kill the Surface Laptop Studio as production is quietly halted

Minecraft licensing robbed us of this controversial NFL schedule release video

The power of generators

The power of generators

Simplify Factory Associations with Laravel’s UseFactory Attribute

This Week in Laravel: React Native, PhpStorm Junie, and more

Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

Bing Search APIs to be “decommissioned completely” as Microsoft urges developers to use its Azure agentic AI alternative

Microsoft might kill the Surface Laptop Studio as production is quietly halted

Announcing New Language Support for PII Text Redaction and Expanding Entity Detection

What’s New

PII Redaction: Safeguarding Sensitive Information Across Languages

Example output

Expanding Entity Detection

Frequently Asked Questions

I want to store and process my data in the EU. Will the expanded PII Text Redaction and Entity Detection languages be supported by EU Data Residency?

What is the quality of PII Text Redaction and Entity Detection across languages?

How secure is my data when using AssemblyAI’s PII Redaction and Entity Detection?

Nmap 7.96 Launches with Lightning-Fast DNS and 612 Scripts

CVE-2025-40906 – MongoDB BSON Serialization BSON::XS Multiple Vulnerabilities

Optimizing Factory Data Creation with Laravel’s recycle Method

SPRITE (Spatial Propagation and Reinforcement of Imputed Transcript Expression): Enhancing Spatial Gene Expression Predictions and Downstream Analyses Through Meta-Algorithmic Integration

WARNING: Expiring Root Certificate May Disable Firefox Add-Ons, Security Features, and DRM Playback

What killed innovation?

Rajeev’s Viral Success: How Opportunities Today Took His Business to the Next Level?

Quantum Framework (QFw): A Flexible Framework for Hybrid HPC and Quantum Computing

Tap into Your PHP Potential with Free Projects at PHPGurukul

Use the AWS InfluxDB migration script to migrate your InfluxDB OSS 2.x data to Amazon Timestream for InfluxDB

Announcing New Language Support for PII Text Redaction and Expanding Entity Detection

What’s New

PII Redaction: Safeguarding Sensitive Information Across Languages

Example output

Expanding Entity Detection

Frequently Asked Questions

I want to store and process my data in the EU. Will the expanded PII Text Redaction and Entity Detection languages be supported by EU Data Residency?

What is the quality of PII Text Redaction and Entity Detection across languages?

How secure is my data when using AssemblyAI’s PII Redaction and Entity Detection?

Related Posts