AI transcription tools generate harmful hallucinations

Speech-to-text transcribers have become invaluable, but a new study shows that when the AI gets it wrong, the hallucinated text is often harmful.

AI transcription tools have become extremely accurate and have transformed the way doctors keep patient records or how we take minutes of meetings. We know they’re not perfect so we’re unsurprised when the transcription isn’t quite right.

A new study found that when more advanced AI transcribers like OpenAI’s Whisper make mistakes, they don’t simply produce garbled or random text. They hallucinate entire phrases, and those phrases are often distressing.

We know that all AI models hallucinate. When ChatGPT doesn’t know an answer to a question, it will often make something up instead of saying “I don’t know.”

Researchers from Cornell University, the University of Washington, New York University, and the University of Virginia found that even though the Whisper API was better than other tools, it still hallucinated just over 1% of the time.

The more significant finding is that when they analyzed the hallucinated text, they found that “38% of hallucinations include explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority.”

It seems that Whisper doesn’t like awkward silences, so when there were longer pauses in the speech it tended to hallucinate more to fill the gaps.

This tendency to fill pauses becomes a serious problem when transcribing speech from people with aphasia, a language disorder that often leaves a person struggling to find the right words.
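Since the hallucinations tend to cluster around long pauses, one cautious way to use a transcript is to flag for human review any segment that mostly overlaps silence. The following is a minimal sketch of that idea, not the researchers' method and not part of Whisper. The `Segment` shape, the `silent_fraction` and `flag_for_review` helpers, the 0.5 threshold, and the sample data are all hypothetical, and the silence intervals are assumed to come from a separate voice-activity detector.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span with start/end times in seconds (hypothetical shape)."""
    start: float
    end: float
    text: str


def silent_fraction(segment, silences):
    """Return the fraction of the segment's duration that overlaps silence.

    Assumes the (start, end) silence intervals do not overlap each other.
    """
    duration = segment.end - segment.start
    if duration <= 0:
        return 0.0
    overlap = 0.0
    for silence_start, silence_end in silences:
        # Length of the intersection of the two intervals, or 0 if disjoint.
        overlap += max(0.0, min(segment.end, silence_end) - max(segment.start, silence_start))
    return overlap / duration


def flag_for_review(segments, silences, threshold=0.5):
    """Return segments whose text was mostly produced while nobody was speaking."""
    return [s for s in segments if silent_fraction(s, silences) > threshold]


# Illustrative data only: the second segment spans a long pause.
segments = [
    Segment(0.0, 2.0, "He opened the umbrella."),
    Segment(2.0, 6.0, "Text that appeared during a pause."),
]
silences = [(2.5, 6.0)]  # assumed output of a separate voice-activity detector

flagged = flag_for_review(segments, silences)
for seg in flagged:
    print(f"Review {seg.start:.1f}s-{seg.end:.1f}s: {seg.text}")
```

A check like this cannot tell whether flagged text is actually hallucinated. It only narrows down where a human should listen to the audio again.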

Careless Whisper

The paper records the results from experiments with early 2023 versions of Whisper. OpenAI has since improved the tool, but Whisper’s tendency to go to the dark side when hallucinating is interesting.

The researchers classified the harmful hallucinations as follows:

Perpetuation of Violence: Hallucinations that depicted violence, made sexual innuendos, or involved demographic stereotyping.
Inaccurate Associations: Hallucinations that introduced false information, such as incorrect names, fictional relationships, or erroneous health statuses.
False Authority: Hallucinations that impersonated authoritative figures or media, such as YouTubers or newscasters, and often involved directives that could lead to phishing attacks or other forms of deception.

Here are some examples of transcriptions where the words in bold are Whisper’s hallucinated additions.

Whisper’s hallucinated additions to the transcription are shown in bold. Source: arXiv

You can imagine how dangerous these kinds of mistakes could be if the transcriptions are assumed to be accurate when documenting a witness statement, a phone call, or a patient’s medical records.

Why did Whisper take a sentence about a fireman rescuing a cat and add a “blood-soaked stroller” to the scene, or add a “terror knife” to a sentence describing someone opening an umbrella?

OpenAI seems to have fixed the problem but hasn’t given an explanation for why Whisper behaved the way it did. When the researchers tested the newer versions of Whisper they got far fewer problematic hallucinations.

Even a small number of hallucinations in a transcription could have serious consequences.

The paper described a real-world scenario where a tool like Whisper is used to transcribe video interviews of job applicants. The transcriptions are fed into a hiring system that uses a language model to analyze the transcription to find the most suitable candidate.

If an interviewee paused a little too long and Whisper added “terror knife”, “blood-soaked stroller”, or “fondled” to a sentence, it might affect their odds of getting the job.

The researchers said that OpenAI should make people aware that Whisper hallucinates and that it should find out why it generates problematic transcriptions.

They also suggest that newer versions of Whisper should be designed to better accommodate underserved communities, such as people with aphasia and other speech impediments.

The post AI transcription tools generate harmful hallucinations appeared first on DailyAI.
