Recording Fail

I was very excited about my recording setup for my first research trip up to the Chicago area. Turns out, it was an epic fail. Having a “room recording” microphone like the one I described in my last post is fine for just capturing reference audio, but the quality of the recording is NOT good enough to be used for automated transcription.

As I was messing around with the tools and software for this stage of the research, I did some quick math. It took me about 2 hours to transcribe about 30 minutes of audio. If I have 50 interviews by the time I’m done, and each interview is 90 minutes, that’s 75 hours of interviews. Each hour of interview takes about four hours to transcribe (even using a USB transcription foot pedal and typing around 85 wpm), so that’s about 300 hours of work just transcribing the interviews. Let’s say I can somehow do this for 3 hours each day before my fingers rebel and my brain fries (and knowing I have other responsibilities, too); that puts me at 100 days of work. I am still a pastor and weekends are dedicated to my family (Saturdays) and the Church (Sundays). So I’m working on this 5 days per week. That puts me JUST transcribing… not doing ANYTHING else on my dissertation… for… 20 weeks. That’s nearly 5 months of transcribing, and that’s if I never miss a day for a funeral or a crisis counseling appointment, or to invest in a sermon that’s not quite coming together, or to respond to a worldwide pandemic as the coronavirus spreads through community after community.
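For anyone who wants to plug in their own numbers, the back-of-the-envelope math above can be sketched in a few lines of Python. The inputs come straight from the paragraph; the function and variable names are just illustrative.

```python
# Inputs taken from the paragraph above; names are illustrative.
INTERVIEWS = 50
MINUTES_PER_INTERVIEW = 90
TYPING_HOURS_PER_AUDIO_HOUR = 4  # about 2 hours of typing per 30 minutes of audio
HOURS_PER_DAY = 3
DAYS_PER_WEEK = 5


def transcription_workload(interviews, minutes_each, ratio, hours_per_day, days_per_week):
    """Return (audio hours, typing hours, working days, working weeks)."""
    audio_hours = interviews * minutes_each / 60
    work_hours = audio_hours * ratio
    work_days = work_hours / hours_per_day
    work_weeks = work_days / days_per_week
    return audio_hours, work_hours, work_days, work_weeks


audio_hours, work_hours, work_days, work_weeks = transcription_workload(
    INTERVIEWS,
    MINUTES_PER_INTERVIEW,
    TYPING_HOURS_PER_AUDIO_HOUR,
    HOURS_PER_DAY,
    DAYS_PER_WEEK,
)
print(f"Audio to transcribe: {audio_hours:.0f} hours")
print(f"Typing time:         {work_hours:.0f} hours")
print(f"Working days:        {work_days:.0f}")
print(f"Working weeks:       {work_weeks:.0f}")
```

Changing any one input (say, shorter interviews or a faster typing ratio) shows quickly how sensitive the total is.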

So should I pay for transcription? Not if the estimates I see online are any indication. The estimates say that professional transcribers can work at double my speed (1 hour of audio = 2 hours to transcribe). The median rate of pay for transcription services is about $16 to $25 per hour. So for 75 hours of audio, or 150 hours of transcription work, I’d be looking at $2,400 to $3,750. And that’s if all the interviews come in at 90 minutes. If I go over, it’s even more. I do not have thousands of dollars to pay a transcriptionist.

Cue the majestic, cinematic swell: Enter Amazon Web Services. I can upload an mp3 file to S3 and use their AI-based transcription service. The cost? For 90 minutes of transcribed audio (drumroll, please): $2.16. That’s it. This means the total cost for 50 interviews at 90 minutes each (thus 75 hours of interview) is $108.00. Now THAT I can handle. I’ll easily pay a hundred bucks to save me months of work or a couple of thousand dollars. But does it work?
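Here is a rough sketch of the cost comparison, using only the figures quoted in this post. The per-minute AWS rate is back-calculated from the $2.16 figure for 90 minutes, so treat it as an assumption and not as current pricing; the function names are made up for illustration.

```python
# Figures quoted in this post; names are illustrative.
AUDIO_HOURS = 75
PRO_HOURS_PER_AUDIO_HOUR = 2           # professional transcriber's estimated pace
PRO_RATE_LOW, PRO_RATE_HIGH = 16, 25   # dollars per hour of transcription work
AWS_COST_PER_90_MIN = 2.16             # dollars, as quoted above (assumed rate)


def professional_cost(audio_hours, ratio, hourly_rate):
    """Cost of paying a person by the hour of transcription work."""
    return audio_hours * ratio * hourly_rate


def automated_cost(audio_hours, cost_per_90_min):
    """Cost of automated transcription billed per minute of audio."""
    per_minute = cost_per_90_min / 90
    return round(audio_hours * 60 * per_minute, 2)


pro_low = professional_cost(AUDIO_HOURS, PRO_HOURS_PER_AUDIO_HOUR, PRO_RATE_LOW)
pro_high = professional_cost(AUDIO_HOURS, PRO_HOURS_PER_AUDIO_HOUR, PRO_RATE_HIGH)
aws_total = automated_cost(AUDIO_HOURS, AWS_COST_PER_90_MIN)

print(f"Professional: ${pro_low:,.2f} to ${pro_high:,.2f}")
print(f"Automated:    ${aws_total:,.2f}")
```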

Here’s the hitch. AWS transcription works phenomenally well… if you have a HIGH QUALITY audio source file. The audio from my Blue multi-capsule USB mic was decidedly low quality. The mic sat too far away from my subject; when the subject’s voice dropped, AWS couldn’t transcribe it. My recording yielded an accuracy of less than 40%. It takes more time to fix a transcript that is only 40% accurate than it does to just transcribe the audio the old-fashioned way with my foot pedal.
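I eyeballed that accuracy figure, but if you want to put a number on it, one common approach is to hand-correct a short stretch of the transcript and count how many word-level edits (substitutions, insertions, deletions) separate it from the machine output. The sketch below assumes that definition of accuracy; the sample sentences are invented for illustration.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance computed over words instead of characters."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """1 minus word error rate, floored at zero."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript is empty")
    return max(0.0, 1 - word_edit_distance(reference, hypothesis) / ref_len)


# Invented example: the machine transcript dropped one word out of four.
corrected = "the quick brown fox"
machine = "the quick fox"
print(f"Word accuracy: {word_accuracy(corrected, machine):.0%}")
```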

So I tried an experiment. I “re-recorded” the interview. I played the audio on my iPhone through headphones, so I could just repeat everything I heard back into my MacBook Pro through the Blue mic… but this time, the mic was right in front of my face and I took care to speak with clear diction. I did a 5-minute test file and uploaded it to my AWS S3 storage. From there, I submitted it to the AWS transcription algorithms. And what came back was something close to 98% accuracy, and where it wasn’t accurate, I could easily tell what the correct words should have been.
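I did the upload and submission by hand, but the same two steps can be scripted. This is a minimal sketch, assuming the boto3 SDK is installed and AWS credentials are already configured. The bucket, key, file, and job names are hypothetical placeholders, and the two helper functions are my own, not part of boto3.

```python
def build_transcription_request(bucket, key, job_name, language_code="en-US"):
    """Build the keyword arguments for Amazon Transcribe's start_transcription_job."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": "mp3",
        "LanguageCode": language_code,
    }


def upload_and_transcribe(local_path, bucket, key, job_name):
    """Upload an mp3 to S3, then submit it to Amazon Transcribe."""
    import boto3  # imported here so the helper above works without the SDK

    boto3.client("s3").upload_file(local_path, bucket, key)
    request = build_transcription_request(bucket, key, job_name)
    return boto3.client("transcribe").start_transcription_job(**request)


# Hypothetical usage (placeholder names, requires AWS credentials):
# upload_and_transcribe("test-5min.mp3", "my-interview-bucket",
#                       "interviews/test-5min.mp3", "test-5min-job")
```

The job runs asynchronously, so the finished transcript has to be fetched afterwards, either from the AWS console or with the Transcribe client's `get_transcription_job` call.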

So I re-recorded an entire 2-hour interview with the Lead Pastor at the first church I visited in Chicago. Yes, this was a time-heavy investment. But I sped up the audio on my iPhone, so I got the whole interview re-recorded in just over 2 hours. Then I sent it to Amazon and tidied up the transcript that AWS provided. Yes, it still took about 30 minutes to clean up the transcript and format it. But that’s nowhere near the kind of brain-numbing, finger-killing work that transcribing it myself would be.

So the key to everything?

High-quality recordings. From the outset. Whatever it takes to get amazingly clear audio recordings of the interviews. It’s worth a significant investment in equipment if it will save me thousands of dollars in transcription fees and hundreds of hours of work. Stay tuned. I’m off to research some new toys…