Speech recognition technology has improved impressively in the past few years. In fact, Microsoft announced last year that its conversational speech recognition system had reached a word error rate of 5.1%, its lowest yet. This technology fueled the rise of many automated transcription services and slowly eroded demand for human transcription. However, is automated transcription really worth the hype?
According to AudioTranscription.org, many modern automatic transcription programs still have error rates as high as 40%, making them almost useless for general purposes. Here are the most common errors and key reasons why automated transcription still requires a human touch:
- The accuracy of automated transcripts depends heavily on the quality of the recording. Static and ambient noise easily drown out dialogue, showing up as misheard or missing words and phrases in the finished transcript. The speaker’s accent, emphasis, and speaking speed also play a large role in transcript accuracy.
- Speech-to-text programs lack the interpretive ability to recognize technical jargon or specialized terms used by different professions. In medical transcription, for instance, homophones (words that sound alike but have different meanings) may be swapped for one another, like “claustrum” and “colostrum”. Some terms may even be transcribed into something completely out of context, like “euthanasia” becoming “youth in asia”.
- The capabilities of automated transcription software are no doubt limited – it cannot pick up the flow of a dialogue or subtle differences in tone. As a result, punctuation errors are immensely common in automated transcripts. Speech-to-text technology cannot yet reliably judge where a period, comma, semicolon, or other punctuation mark belongs. Providers are aware of this shortcoming, which is why some have chosen not to offer automatic punctuation at all.
- Machines generally transcribe every verbal sound caught in a recording, including filler words, side talk, and background conversations, making the finished transcript too cluttered and confusing for practical use.
- While some machine transcription programs can now transcribe and translate multi-language recordings, an article from The Washington Post suggested that translation machines are still prone to mistranslating words, terminology, and sometimes even simple pronouns. This is because speech-to-text software has not gained common sense and cannot fully digest the complexity of language.
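For context on the error-rate figures cited above (5.1% and 40%): transcription accuracy is usually measured as word error rate (WER), the word-level edit distance between the machine's output and a human reference transcript, divided by the reference length. A minimal sketch of that calculation (the example sentences are illustrative, not from any real transcript):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (Levenshtein over words)
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / len(ref)

# One homophone slip in a four-word sentence is already a 25% WER:
print(word_error_rate("the sample contained colostrum",
                      "the sample contained claustrum"))  # 0.25
```

Note that a single confused term, like the homophone pair above, moves a short passage's WER dramatically, which is why even "95% accurate" output can still need line-by-line human review.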
TranscriptionWing adds a human touch
Anyone who has used automated transcription can attest that its accuracy is not always reliable. An automated transcript will only be as good as your audio file. In fact, even Sonix.ai (the most accurate automated transcription service according to Pop Up Podcasting) admits that human editing is critical for correcting transcript imperfections. Needless to say, human participation is still essential to producing the highest-quality transcripts.
After you go digital with your transcriptions, try TranscriptionWing’s voice-to-text cleanup service. Our elite team of human transcription editors can proofread your files, add speaker identification, and insert timestamps so you can use machine-generated transcripts with the utmost confidence.