Tech behemoth OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near “human level robustness and accuracy.” But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.

Experts said that such fabrications are problematic because Whisper is being used in a slew of industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos. More concerning, they said, is a rush by medical centers to utilize Whisper-based tools to transcribe patients' consultations with doctors, despite OpenAI's warnings that the tool should not be used in "high-risk domains." The full extent of the problem is difficult to discern, but researchers and engineers said they frequently have come across Whisper's hallucinations in their work.

A University of Michigan researcher conducting a study of public meetings, for example, said he found hallucinations in eight out of every 10 audio transcriptions he inspected, before he started trying to improve the model. A machine learning engineer said he initially discovered hallucinations in about half of the more than 100 hours of Whisper transcriptions he analyzed.