    Automatic speech-to-text punctuation, casing, and ITN to boost transcript readability

    November 19, 2024

    What if you received a raw transcript that looked like this?

    if you picture a sound meter with a needle that bounces up and down 
    every time there's a sound the tone is supposed to put the needle 
    perfectly at this one spot on the meter with a black numbers end and 
    the red part of the meter begins there's like a zero at that spot 
    marking this is where you want to be and the tone is just supposed to 
    rest there rock solid but this particular day with this particular 
    recording we put it on and keith and i watched the meter as the needle 
    first dipped below the zero then climbed above the zero and then 
    floated sort of tentatively to the spot that it was supposed to be at 
    the zero and rested there

    It’s legible, but it takes real effort to read: your mind naturally wants to add punctuation, casing, and line breaks to make sense of the long string of text.

    Compare the transcript above to this:

    If you picture a sound meter with a needle that bounces up and down 
    every time there's a sound, the tone is supposed to put the needle 
    perfectly at this one spot on the meter with a black numbers end, and 
    the red part of the meter begins there's like a zero at that spot 
    marking, this is where you want to be. And the tone is just supposed to 
    rest there rock solid. But this particular day, with this particular 
    recording, we put it on, and Keith and I watched the meter as the 
    needle first dipped below the zero, then climbed above the zero, and 
    then floated sort of tentatively to the spot that it was supposed to be 
    at the zero and rested there.

    See how much easier it is to read? This is because common punctuation and casing have been automatically applied to the transcription text.

    Speech-to-text automatic punctuation and casing at AssemblyAI

    When you transcribe an audio or video file with the AssemblyAI Speech-to-Text API, your transcript is automatically passed through our Automatic Punctuation and Casing Model.

    Instead of a long chunk of text, your transcript has appropriately placed punctuation, such as commas, periods, and question marks, and correctly capitalized proper nouns, acronyms, and more. This improves readability and makes the transcript more useful overall, especially for customer-facing use cases.

    What is automatic punctuation and casing for speech-to-text?

    Punctuation refers to any commas, periods, question marks, exclamation marks, etc. that must be added to a transcription text.

    Casing refers to two different categories:

    1. Proper nouns
    2. Special scenarios, e.g., acronyms and abbreviations such as NASA or NY Times.

    What is Inverse Text Normalization (ITN)?

    Inverse Text Normalization, or ITN, is a rule-based system (based on an FST, or Finite State Transducer) that also increases the readability of a transcript.

    Essentially, ITN translates the spoken form of text (which is the output of the speech-to-text model) into its written form. For example, the raw transcript might contain:

    february fourth twenty twenty two (spoken form)

    The ITN model converts this to:

    february 4th 2022 (written form)

    ITN is helpful to ensure the proper written format of text such as emails, credit card numbers, social security numbers, dates, and more.

    If downstream tasks depend on these inputs, it becomes essential that all dates, numbers, emails, phone numbers, etc. are accurately formatted, or you risk an entire workflow failing to initiate correctly.
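
    To make the spoken-to-written conversion concrete, here is a minimal Python sketch that handles only dates of the form "month, ordinal day, year spoken as two pairs", like the example above. It uses plain lookup tables rather than a real FST, and every name in it (`normalize_dates`, the word tables) is hypothetical; it is not AssemblyAI's implementation.

```python
# Toy inverse text normalization for dates only (illustrative, not production ITN).
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
ORDINAL_UNITS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
                 "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9}
ORDINAL_OTHER = {"tenth": 10, "eleventh": 11, "twelfth": 12, "thirteenth": 13,
                 "fourteenth": 14, "fifteenth": 15, "sixteenth": 16,
                 "seventeenth": 17, "eighteenth": 18, "nineteenth": 19,
                 "twentieth": 20, "thirtieth": 30}


def ordinal_suffix(n):
    """Return the English ordinal suffix for n (1 -> 'st', 4 -> 'th', 11 -> 'th')."""
    if 11 <= n % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def read_two_digit(words, i):
    """Read a spoken number from 10 to 99 at words[i]; return (value, next_index) or None."""
    if i >= len(words):
        return None
    word = words[i]
    if word in TEENS:
        return TEENS[word], i + 1
    if word in TENS:
        if i + 1 < len(words) and words[i + 1] in UNITS:
            return TENS[word] + UNITS[words[i + 1]], i + 2
        return TENS[word], i + 1
    return None


def read_day(words, i):
    """Read a spoken ordinal day (first .. thirty first); return (value, next_index) or None."""
    if i >= len(words):
        return None
    word = words[i]
    if word in ORDINAL_UNITS:
        return ORDINAL_UNITS[word], i + 1
    if word in ORDINAL_OTHER:
        return ORDINAL_OTHER[word], i + 1
    if word in ("twenty", "thirty") and i + 1 < len(words) and words[i + 1] in ORDINAL_UNITS:
        return TENS[word] + ORDINAL_UNITS[words[i + 1]], i + 2
    return None


def normalize_dates(text):
    """Rewrite '<month> <ordinal> <year as two pairs>' into written form; leave the rest as-is."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        if words[i] in MONTHS:
            day = read_day(words, i + 1)
            first = read_two_digit(words, day[1]) if day else None
            second = read_two_digit(words, first[1]) if first else None
            if second:
                year = f"{first[0]}{second[0]:02d}"
                out.append(f"{words[i]} {day[0]}{ordinal_suffix(day[0])} {year}")
                i = second[1]
                continue
        out.append(words[i])
        i += 1
    return " ".join(out)


print(normalize_dates("february fourth twenty twenty two"))
```

    A real ITN system covers far more cases (emails, phone numbers, currency, years like "two thousand five") and compiles its rules into a transducer; this sketch only shows the idea of mapping spoken tokens to a written format.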

    Speech-to-text automatic punctuation and casing — improvements in Universal-2

    Universal-2, our latest speech-to-text model, shows further improvements in correctly applying text formatting rules such as automatic punctuation and casing.

    For example, benchmark tests revealed a 15% improvement in transcript structure and 24% improvement in proper noun recognition, leading to more natural-sounding, accurate transcripts for customer-facing products.

    Using automatic punctuation with transcripts with the AssemblyAI speech-to-text API

    As stated above, the AssemblyAI Speech-to-Text API automatically punctuates the transcription text and capitalizes proper nouns. Numbers are also automatically converted to their written format.

    Automatic punctuation and text formatting are enabled by default for optimal speech-to-text results, but you can disable them by setting the punctuate and format_text parameters to false in the transcription config. More details can be found in the AssemblyAI docs.
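
    As a rough illustration of the two parameters, the sketch below builds a transcription request body with punctuate and format_text set to false, using only the Python standard library. The endpoint URL, the authorization header, and the audio_url field reflect our understanding of the public REST API and should be treated as assumptions to verify against the AssemblyAI docs; the helper names and the example audio URL are hypothetical.

```python
import json
import urllib.request

# Assumption: endpoint for creating a transcript; confirm in the AssemblyAI docs.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(audio_url, punctuate=True, format_text=True):
    """Build the JSON body for a transcription request (hypothetical helper)."""
    return {
        "audio_url": audio_url,
        "punctuate": punctuate,      # False: no automatic punctuation
        "format_text": format_text,  # False: no casing or written-form formatting
    }


def submit(payload, api_key):
    """Send the request; not called here because it needs a valid API key."""
    request = urllib.request.Request(
        TRANSCRIPT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API key is passed in the 'authorization' header.
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Request a raw, unformatted transcript for a placeholder audio URL.
raw_config = build_transcript_request(
    "https://example.com/audio.mp3", punctuate=False, format_text=False
)
print(json.dumps(raw_config, indent=2))
```

    Leaving both parameters out of the request keeps the default behavior described above, with punctuation and formatting applied.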
