Venn Technology


Sending audio files to Google Cloud for transcription via Workato

Written by Chase Friedman
On September 11, 2017
Pretty much everyone knows what Speech-to-Text is these days. You likely use it all the time on your phone to dictate text messages or to whip out a quick email. But what if you need to do the same thing en masse? What if you have a large number of audio recordings and you need the transcripts fast and cheap? You probably don’t have time to farm this out to a dedicated transcription service, and you probably don’t need 100% accuracy or the price tag that comes with it.

We’ve built a connection to Google Cloud Platform via Workato that lets us handle these kinds of bulk requests. For demo purposes our tool handles audio files one at a time, but we can easily scale it to handle anywhere from 10,000 to 50,000 audio files in the same time frame.

A peek into the backend

Here’s a quick step-by-step outline of the process before we get to the meat of the post:

  • We start the process with a trigger from a Wufoo form, the one you fill out on our demo page.
  • We then use a callable recipe to check whether you exist in our database, so we can offer a personalized response.
  • After that, we fire up our error handler and begin the main process:
    • First, we take the audio file and download the contents.
    • We put the contents into Google Cloud Storage using an OAuth2 connection.
    • We tell Google we want a transcript of this audio file (more below).
    • Then we upload the file to Dropbox and send you the link via email.
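
For readers who want to see what the Cloud Storage step looks like outside Workato, here is a minimal sketch. It is not the recipe itself: the bucket name, object name, and token are placeholders, and the helper `build_gcs_upload_request` is a name we made up for illustration. It only builds the request for a simple media upload through the Cloud Storage JSON API, plus the `gs://` URI that the transcription step points at later. Actually sending it is left to your HTTP client of choice.

```python
from urllib.parse import quote


def build_gcs_upload_request(bucket, object_name, access_token, content_type):
    """Describe a simple media upload to the Cloud Storage JSON API.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    url = (
        "https://storage.googleapis.com/upload/storage/v1/b/"
        f"{quote(bucket, safe='')}/o"
        f"?uploadType=media&name={quote(object_name, safe='')}"
    )
    headers = {
        # The OAuth2 access token comes from whatever auth flow you use.
        "Authorization": f"Bearer {access_token}",
        "Content-Type": content_type,
    }
    # This is the location the Speech to Text request refers to later.
    gcs_uri = f"gs://{bucket}/{object_name}"
    return {"method": "POST", "url": url, "headers": headers, "gcs_uri": gcs_uri}


# Placeholder values, assumed for the example only.
upload = build_gcs_upload_request(
    bucket="example-audio-bucket",
    object_name="demo/call 001.flac",
    access_token="ACCESS_TOKEN_PLACEHOLDER",
    content_type="audio/flac",
)
print(upload["url"])
print(upload["gcs_uri"])
```

The object name is percent-encoded in the upload URL but left as-is in the `gs://` URI, which is why the helper returns both.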

It’s a relatively straightforward recipe, but the complexity comes in when you have to send the file off for transcription and receive the results.

Getting the transcription for an audio file via Workato and Google Cloud Platform’s Speech to Text API

In this action we send the audio file over to the Speech to Text API and configure the request to our needs:

  • Where you see “File type” we’re telling Google what type of file it is: for example FLAC, MP3, MPEG4, etc.
  • Where you see “Keywords” we’re giving Google a list of words that are hard to pronounce or that sound like other words. Names of people work too.
  • In the audio section, we’re telling Google where the audio file lives on Google Cloud Storage.
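
To make those three settings concrete, here is a rough sketch of the JSON body that a Speech to Text v1 `speech:recognize` (or `speech:longrunningrecognize` for longer audio) request takes. We are assuming that the “File type” field maps to `encoding` and that “Keywords” maps to the `phrases` of a `speechContexts` entry. The sample rate, language code, phrases, and `gs://` path are made-up example values, and `build_recognize_request` is a hypothetical helper, not part of Workato or Google’s client libraries.

```python
import json


def build_recognize_request(gcs_uri, encoding, sample_rate_hertz, phrases,
                            language_code="en-US"):
    """Build the JSON body for a Speech to Text v1 recognize call.

    Hypothetical helper; it only assembles the payload.
    """
    return {
        "config": {
            "encoding": encoding,                 # the "File type" setting
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
            # The "Keywords" setting: hints for hard or easily confused words.
            "speechContexts": [{"phrases": list(phrases)}],
        },
        # The audio section: where the file lives on Cloud Storage.
        "audio": {"uri": gcs_uri},
    }


# Example values are assumptions for illustration only.
body = build_recognize_request(
    gcs_uri="gs://example-audio-bucket/demo/call-001.flac",
    encoding="FLAC",
    sample_rate_hertz=16000,
    phrases=["Workato", "Wufoo", "Venn Technology"],
)
print(json.dumps(body, indent=2))
```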

The Transcription Results

Getting the results is much easier. We simply tell Workato what we expect to see back in our Response body example. Here we’re grabbing the transcript of the audio file and the confidence score that Google assigned to the transcription. A really low confidence score usually points to problems with the audio in the file.
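
As a sketch of what that looks like in plain code, the snippet below pulls the transcript and confidence out of a response shaped like the Speech to Text v1 recognize response (`results`, each with a list of `alternatives`). The sample response is invented for the example, and `extract_transcript` is a hypothetical helper. Reporting the lowest confidence across segments is just one reasonable choice; an average would work too.

```python
def extract_transcript(response):
    """Join the top alternative of each result and report the lowest confidence."""
    pieces = []
    confidences = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        top = alternatives[0]  # alternatives are ordered best first
        pieces.append(top.get("transcript", "").strip())
        if "confidence" in top:
            confidences.append(top["confidence"])
    transcript = " ".join(p for p in pieces if p)
    lowest = min(confidences) if confidences else None
    return transcript, lowest


# Invented sample response, for illustration only.
sample_response = {
    "results": [
        {"alternatives": [{"transcript": "thanks for calling", "confidence": 0.94}]},
        {"alternatives": [{"transcript": " how can I help you", "confidence": 0.81}]},
    ]
}

transcript, lowest_confidence = extract_transcript(sample_response)
print(transcript)
print(lowest_confidence)
```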

Using the Results

After we get the results we can utilize them in just about any connection we’d like. Here are a couple of things you could do:

  • Send it to Salesforce to “Log a Call” and copy the transcription into the description.
  • Send it to Amazon Mechanical Turk for review and corrections.
  • Transcribe a rolling stream of files from your call center and then pass the transcripts on to Google’s Sentiment Analysis API.
    • You could then even store the results in Salesforce and have notifications go out when the sentiment is really bad for a particular call.
  • Save it to another Cloud Storage provider for auditability.
  • The possibilities are endless…
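
To give the call center idea a bit more shape, here is a small sketch of the sentiment step. It builds the body for the Cloud Natural Language `documents:analyzeSentiment` method and decides whether a call deserves a notification based on the `documentSentiment.score` in the response, which runs from -1.0 (negative) to 1.0 (positive). The -0.5 threshold and both helper names are our own assumptions for the example, and the sample response is invented.

```python
def build_sentiment_request(transcript):
    """Build the JSON body for a documents:analyzeSentiment call."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": transcript},
        "encodingType": "UTF8",
    }


def needs_alert(sentiment_response, threshold=-0.5):
    """Return True when the overall sentiment score is at or below the threshold.

    The threshold is an arbitrary example value; tune it for your own calls.
    """
    score = sentiment_response.get("documentSentiment", {}).get("score")
    return score is not None and score <= threshold


sentiment_body = build_sentiment_request("I have been on hold for an hour")

# Invented sample responses, for illustration only.
angry_call = {"documentSentiment": {"score": -0.8, "magnitude": 1.6}}
happy_call = {"documentSentiment": {"score": 0.6, "magnitude": 0.9}}

print(needs_alert(angry_call))
print(needs_alert(happy_call))
```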

If you have a use case that you think would fit this, reach out and we’ll see if we can help!
