Automatic speech-to-text punctuation, casing, and ITN to boost transcript readability

What if you received a raw transcript that looked like this?

if you picture a sound meter with a needle that bounces up and down 
every time there's a sound the tone is supposed to put the needle 
perfectly at this one spot on the meter with a black numbers end and 
the red part of the meter begins there's like a zero at that spot 
marking this is where you want to be and the tone is just supposed to 
rest there rock solid but this particular day with this particular 
recording we put it on and keith and i watched the meter as the needle 
first dipped below the zero then climbed above the zero and then 
floated sort of tentatively to the spot that it was supposed to be at 
the zero and rested there

It's legible, but it takes real effort to read: your mind naturally wants to add punctuation, casing, line breaks, and so on to make sense of one long string of text.

Compare the transcript above to this:

If you picture a sound meter with a needle that bounces up and down 
every time there's a sound, the tone is supposed to put the needle 
perfectly at this one spot on the meter with a black numbers end, and 
the red part of the meter begins there's like a zero at that spot 
marking, this is where you want to be. And the tone is just supposed to 
rest there rock solid. But this particular day, with this particular 
recording, we put it on, and Keith and I watched the meter as the 
needle first dipped below the zero, then climbed above the zero, and 
then floated sort of tentatively to the spot that it was supposed to be 
at the zero and rested there.

See how much easier it is to read? This is because common punctuation and casing have been automatically applied to the transcription text.

Speech-to-text automatic punctuation and casing at AssemblyAI

When you transcribe an audio or video file with the AssemblyAI Speech-to-Text API, your transcript is automatically passed through our Automatic Punctuation and Casing Model.

Instead of a long chunk of text, your transcript includes appropriately placed punctuation, such as commas, periods, and question marks, along with correctly capitalized proper nouns, acronyms, and more. This improves readability and increases the overall usefulness of your transcript, especially for customer-facing use cases.

What is automatic punctuation and casing for speech-to-text?

Punctuation refers to the commas, periods, question marks, exclamation marks, and other marks that must be added to the raw transcript text.

Casing refers to two different categories:

Proper Nouns
Special Scenarios, e.g., acronyms and abbreviations like NASA or NY Times.
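As a rough illustration of the two casing categories, here is a toy rule-based truecaser in Python. It is not how AssemblyAI's model works: the lexicons `PROPER_NOUNS` and `SPECIAL_CASES` and the function `truecase` are hypothetical, and the sketch assumes punctuation has already been restored so sentence boundaries are known.

```python
# Toy truecaser illustrating the two casing categories described above.
# Assumptions (hypothetical): casing comes from small hand-written lexicons,
# and the input already has sentence-final punctuation restored.
# A production system would learn casing from context instead.

PROPER_NOUNS = {"keith": "Keith", "i": "I"}   # category 1: proper nouns
SPECIAL_CASES = {"nasa": "NASA", "ny": "NY"}  # category 2: acronyms and abbreviations


def truecase(text: str) -> str:
    """Apply lexicon-based casing, then capitalize the first word of each sentence."""
    out = []
    capitalize_next = True  # the first word of the text starts a sentence
    for word in text.split():
        # Split off trailing punctuation so "nasa." is looked up as "nasa".
        core = word.rstrip(".,?!")
        tail = word[len(core):]
        lower = core.lower()
        cased = SPECIAL_CASES.get(lower) or PROPER_NOUNS.get(lower) or lower
        if capitalize_next and cased:
            cased = cased[0].upper() + cased[1:]
        out.append(cased + tail)
        # Sentence-final punctuation means the next word starts a new sentence.
        capitalize_next = tail.endswith((".", "?", "!"))
    return " ".join(out)


print(truecase("keith and i read about nasa. it was great."))
# → Keith and I read about NASA. It was great.
```

A fixed lexicon can't tell "apple" the fruit from Apple the company, which is one reason casing is usually predicted from context rather than looked up.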

What is Inverse Text Normalization (ITN)?

Inverse Text Normalization, or ITN, is a rule-based system (built on a finite-state transducer, or FST) that further improves the readability of a transcript.

Essentially, ITN translates the spoken form of text (which is what the speech-to-text model outputs) into its written form. For example, the raw transcript might contain:

february fourth twenty twenty two (spoken form)

The ITN model converts this to:

february 4th 2022 (written form)
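Since ITN is described above as being built on a finite-state transducer, here is a toy deterministic transducer in Python to make the idea concrete. Each arc reads one spoken word and writes part of the written form, and a final state may append one last output when the input ends. The state names, the `ARCS` table, and `transduce` are illustrative assumptions that only cover the numbers one to ninety-nine; this is not AssemblyAI's grammar.

```python
# Minimal deterministic finite-state transducer (FST) for spoken numbers 1-99.
# Assumption: an illustrative toy, not a production ITN grammar.
# Each arc maps (state, input word) -> (next state, output string).

UNITS = {w: str(n) for n, w in enumerate(
    ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"], 1)}
TEENS = {w: str(n) for n, w in enumerate(
    ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
     "sixteen", "seventeen", "eighteen", "nineteen"], 10)}
TENS = {w: str(n) for n, w in enumerate(
    ["twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"], 2)}

ARCS = {}
for word, digit in UNITS.items():
    ARCS[("start", word)] = ("done", digit)  # "seven" -> "7"
    ARCS[("tens", word)] = ("done", digit)   # unit digit after a tens word
for word, digits in TEENS.items():
    ARCS[("start", word)] = ("done", digits)  # "twelve" -> "12"
for word, digit in TENS.items():
    ARCS[("start", word)] = ("tens", digit)   # "twenty" -> "2", then wait for a unit

# Output emitted when the input ends in a final state ("twenty" alone -> "20").
FINAL_OUTPUT = {"done": "", "tens": "0"}


def transduce(words):
    """Run the FST over a list of words; return the written form, or None if rejected."""
    state, out = "start", []
    for word in words:
        arc = ARCS.get((state, word))
        if arc is None:
            return None  # no transition: not a number this FST accepts
        state, emitted = arc
        out.append(emitted)
    if state not in FINAL_OUTPUT:
        return None  # ended in a non-final state (e.g. empty input)
    return "".join(out) + FINAL_OUTPUT[state]


print(transduce("twenty two".split()))  # → 22
print(transduce("twenty".split()))      # → 20
print(transduce("twelve".split()))      # → 12
```

A production grammar would cover far more classes (dates, currency, emails) than this toy.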

ITN helps ensure that items such as email addresses, credit card numbers, social security numbers, and dates appear in their proper written format.
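The "february fourth twenty twenty two" example can be reproduced with a deliberately small rule-based converter. This sketch swaps the FST for plain greedy dictionary lookups, handles only `<month> <ordinal day> [<year spoken as two pairs>]`, and uses hypothetical helper names (`read_cardinal`, `read_ordinal`, `itn_dates`).

```python
# Toy inverse text normalization (ITN) for spoken dates.
# Assumption: greedy dictionary lookups stand in for an FST grammar; only
# "<month> <ordinal day> [<year as two spoken pairs>]" is rewritten.

MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
            "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
            "eleventh": 11, "twelfth": 12, "thirteenth": 13, "fourteenth": 14,
            "fifteenth": 15, "sixteenth": 16, "seventeenth": 17,
            "eighteenth": 18, "nineteenth": 19, "twentieth": 20, "thirtieth": 30}


def read_cardinal(tokens, i):
    """Read a number from 1 to 99 starting at tokens[i]; return (value, next_i) or None."""
    if i >= len(tokens):
        return None
    word = tokens[i]
    if word in UNITS:
        return UNITS[word], i + 1
    if word in TEENS:
        return TEENS[word], i + 1
    if word in TENS:
        # "twenty two" -> 22; a bare "twenty" stays 20.
        if i + 1 < len(tokens) and tokens[i + 1] in UNITS:
            return TENS[word] + UNITS[tokens[i + 1]], i + 2
        return TENS[word], i + 1
    return None


def read_ordinal(tokens, i):
    """Read an ordinal day like "fourth" or "twenty first"; return (value, next_i) or None."""
    if i >= len(tokens):
        return None
    word = tokens[i]
    if word in ORDINALS:
        return ORDINALS[word], i + 1
    # Compound ordinal: a tens word followed by "first".."ninth".
    if word in TENS and i + 1 < len(tokens) and ORDINALS.get(tokens[i + 1], 10) < 10:
        return TENS[word] + ORDINALS[tokens[i + 1]], i + 2
    return None


def ordinal_suffix(n):
    """Return the English ordinal suffix: 1 -> "st", 12 -> "th", 22 -> "nd"."""
    if 11 <= n % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def itn_dates(text):
    """Rewrite spoken-form dates in lowercase text into written form."""
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        day = read_ordinal(tokens, i + 1) if tokens[i] in MONTHS else None
        if day is None:
            out.append(tokens[i])  # not a date: copy the token unchanged
            i += 1
            continue
        value, j = day
        written = f"{tokens[i]} {value}{ordinal_suffix(value)}"
        # A year spoken as two pairs: "twenty twenty two" -> 20, 22 -> 2022.
        first = read_cardinal(tokens, j)
        second = read_cardinal(tokens, first[1]) if first else None
        if first and second and first[0] >= 10:
            written += f" {first[0] * 100 + second[0]}"
            j = second[1]
        out.append(written)
        i = j
    return " ".join(out)


print(itn_dates("february fourth twenty twenty two"))  # → february 4th 2022
```

Its blind spots are instructive: `itn_dates("you may first try")` returns `"you may 1st try"`, because the toy has no notion of context. Handling cases like this is part of why real ITN grammars are far larger than this sketch.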

If downstream tasks depend on these inputs, it becomes essential that all dates, numbers, emails, phone numbers, etc. are accurately formatted, or you risk an entire workflow failing to initiate correctly.
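To see why the written form matters downstream, consider a hypothetical scheduling step that needs a real date object. The sketch below (the name `parse_written_date` is ours, not part of any API) uses Python's standard `datetime.strptime`, which accepts the ITN output but rejects the spoken form outright. It assumes an English locale for month names.

```python
# Hypothetical downstream step: turn a transcript date into a date object.
# Assumes an English (e.g. C) locale so %B matches English month names.
import re
from datetime import date, datetime
from typing import Optional


def parse_written_date(text: str) -> Optional[date]:
    """Parse a written-form date like "february 4th 2022"; return None if it fails."""
    # Strip ordinal suffixes ("4th" -> "4") because %d expects digits only.
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text)
    try:
        # strptime matches month names case-insensitively.
        return datetime.strptime(cleaned, "%B %d %Y").date()
    except ValueError:
        return None  # e.g. spoken form that ITN has not converted


print(parse_written_date("february 4th 2022"))                  # → 2022-02-04
print(parse_written_date("february fourth twenty twenty two"))  # → None
```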

Speech-to-text automatic punctuation and casing: improvements in Universal-2

Our latest speech-to-text model, Universal-2, shows even greater improvements in correctly applying text formatting rules such as automatic punctuation and casing.

For example, benchmark tests showed a 15% improvement in transcript structure and a 24% improvement in proper noun recognition, leading to more natural-sounding, accurate transcripts for customer-facing products.

Using automatic punctuation and casing with the AssemblyAI Speech-to-Text API

As stated above, the AssemblyAI Speech-to-Text API automatically punctuates the transcription text and applies correct casing to proper nouns. Numbers are also automatically converted to their written form.

Automatic punctuation and text formatting are enabled by default for optimal speech-to-text results, but you can disable them by setting the punctuate and format_text parameters to false in the transcription config. More details are available in the AssemblyAI docs.
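As a hedged sketch, the request below turns both features off by sending `punctuate` and `format_text` as `false`. It uses only Python's standard library and builds the request without sending it. The endpoint URL and `authorization` header reflect our understanding of AssemblyAI's v2 REST API and should be checked against the docs; the audio URL and API key are placeholders, and `build_transcript_request` is a hypothetical helper.

```python
# Sketch: build (but do not send) a transcription request with
# punctuation and text formatting turned off.
# Assumptions: endpoint and header names as we understand AssemblyAI's
# v2 REST API; the API key and audio URL are placeholders.
import json
import urllib.request

API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(audio_url: str, api_key: str,
                             punctuate: bool = True,
                             format_text: bool = True) -> urllib.request.Request:
    """Return a POST request whose JSON body sets the punctuate/format_text flags."""
    body = {
        "audio_url": audio_url,
        "punctuate": punctuate,      # False -> no automatic punctuation
        "format_text": format_text,  # False -> no casing or written-form numbers
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


req = build_transcript_request("https://example.com/audio.mp3", "YOUR_API_KEY",
                               punctuate=False, format_text=False)
print(req.data.decode("utf-8"))
# → {"audio_url": "https://example.com/audio.mp3", "punctuate": false, "format_text": false}
# Sending it would require a real key: urllib.request.urlopen(req)
```

Leaving both parameters at their defaults keeps the formatted, punctuated output described above.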
