by Dan Delong
Although the spoken word, recorded as an audio file, requires little storage space, finding content within a long recording would be a lot easier if it were transcribed into text.
About a year ago, I used my phone to record a long conversation with an older relative about her father's family history (her father was my grandfather's brother). At the time, I gave no thought to how this audio file would be used, other than to make it available to her children and possibly to type up some notes for myself.
Since text-to-speech software has gotten so good, I wondered whether the converse would be true; that is, can speech-to-text now equal the accuracy of text-to-speech? After all, Siri, Google Home, and Cortana seem to handle speech-to-speech almost flawlessly. But they all use sophisticated AI algorithms hosted on powerful cloud servers, with no offer of free speech-to-text, as far as I knew.
Number One son suggested opening a Google Doc, turning on Voice Typing, and then playing the recording into the computer microphone in an otherwise quiet room.
He also suggested using YouTube's automatic captioning [https://www.youtube.com/watch?v=Y7FDktLN_f8]. The web interface generates text captions (like those on television), but getting that captioning text into a human-readable file format may be a problem.
Picture 1: YouTube captioning from a video or audio file
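If the captions can be saved at all, turning them into readable text is mostly a matter of discarding the timing information. The sketch below assumes the captions have been exported in the common SubRip (.srt) layout, which is an assumption on my part and not something I have confirmed for YouTube. The function name `srt_to_text` and the sample caption text are made up for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s+-->\s+\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Drop cue numbers and timing lines, then join the caption text.

    Caveat: a caption line made up only of digits is treated as a cue
    number and dropped as well.
    """
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMING.match(line):
            continue
        kept.append(line)
    return " ".join(kept)


# Hypothetical caption text, only to show the input layout.
sample = """1
00:00:01,000 --> 00:00:04,000
We lived on the farm

2
00:00:04,500 --> 00:00:07,000
until I was twelve.
"""

print(srt_to_text(sample))
```

The result is one run of plain text with no speaker labels or punctuation repair, so it would still need proofreading against the recording.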
Dragon NaturallySpeaking has been around for many years, but it requires purchase and, as I recall, a lot of voice training. [Other methods (not free and not tried): IBM Speech to Text, Temi, Braina Pro, Transcribe]
I still tried to find some free, installable software solutions. Considering the poor quality of this audio recording, I felt transcribing it would be a challenge, because both of us were talking, sometimes at the same time and at different distances from the microphone.
[I have since learned that the software might have had a better chance of 'understanding' this audio file had I first 'normalized' the sound level. MP3Gain is a free program for performing this task (http://mp3gain.sourceforge.net/).]
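To show what 'normalizing' means in its simplest form, here is a minimal sketch of peak normalization: every sample is multiplied by the same gain so that the loudest one reaches a chosen level. This is not how MP3Gain itself works (it analyzes perceived loudness without simply rescaling to the peak); the function name `peak_normalize` and the target level of 30000 are my own illustrative choices, and the samples are assumed to be 16-bit signed values.

```python
from array import array


def peak_normalize(samples: array, target_peak: int = 30000) -> array:
    """Scale 16-bit samples so the loudest one reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        # Pure silence (or no samples): nothing to scale.
        return array("h", samples)
    gain = target_peak / peak
    # Clamp to the 16-bit range in case target_peak is set too high.
    return array(
        "h",
        (max(-32768, min(32767, round(s * gain))) for s in samples),
    )


# A quiet, made-up recording: the loudest sample is only 2000.
quiet = array("h", [0, 1000, -2000, 500])
print(list(peak_normalize(quiet)))
```

A single loud cough or door slam sets the peak and leaves the rest of the recording just as quiet as before, which is one reason loudness-based tools are usually preferred for speech.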
Google's Speech Notes (free, online) may have been a good choice. Cortana may be tweakable for this purpose, as well.
One online service wisely offers 30 minutes of AI-enhanced transcription free. It asks for the names of the two speakers, then consumes the audio file for transcription.
None of the options considered above has yet been tried; this particular audio file is probably best left alone, although I did use it again to type up some notes.
Since this interview, my mother's cousin has died. Her children will enjoy hearing her voice, as she recounts stories of her childhood home and memories of her parents.
So, I'm still looking for a convenient transcription method that is both free and accurate.
If I ever record another interview, it will be with two microphones (two channels), or an exchange of one microphone, so that only one person is recorded at a time on the same channel. Such audio files would need to be 'normalized' before automatic transcription.
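The benefit of a two-channel recording is that each speaker can be pulled out and transcribed separately. As a rough sketch, assuming the recording is uncompressed 16-bit stereo with the samples interleaved left, right, left, right (the usual WAV layout), the two voices can be separated like this. The function name `split_stereo` is hypothetical.

```python
from array import array
from typing import Tuple


def split_stereo(interleaved: array) -> Tuple[array, array]:
    """Separate interleaved stereo samples into left and right channels."""
    left = array("h", interleaved[0::2])   # samples 0, 2, 4, ...
    right = array("h", interleaved[1::2])  # samples 1, 3, 5, ...
    return left, right


# Made-up samples: positive values on the left, negative on the right.
stereo = array("h", [1, -1, 2, -2, 3, -3])
left, right = split_stereo(stereo)
print(list(left), list(right))
```

Each microphone will still pick up some of the other speaker, so the separation is only as clean as the recording setup allows.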
Dan's Note: "This is not downloadable, free software. I've been trying to auto-transcribe an old audio file, an interview with an elder relative. This just describes the transcription attempt." Dan can be reached here.
Platform: A Computer
Download Size: N/A
Installed Size: N/A
Download Site: N/A