
Podcasting and Audio Storytelling

Save yourself hours of frustration and hassle with this guide for podcasts and other forms of audio storytelling

Audio Transcription Resources

This portion of the LibGuide is intended as a resource to assist with audio transcriptions. It includes links, descriptions, and ratings of various third-party programs that use artificial intelligence (AI) to generate transcripts, and how you can use these transcripts either for the audio itself or in conjunction with the creation of video captions. Some of these services also offer human transcription, at a higher cost. 

There are a variety of uses for transcribing audio in this way:

  • Creating written transcripts of fieldwork interviews or other kinds of research audio
  • Creating transcripts of podcasts or other audio content for hearing-impaired audiences
  • Creating subtitles for a video (especially using the options within Canvas and YouTube)
  • Creating transcripts of class lecture audio (for students with an accommodation via Services for Students with Disabilities)
     

General Things to Keep in Mind

  • Requires Clear Audio - Voices should be recorded close to the microphone so they are heard clearly.
  • Accuracy - Because these are machine transcriptions, expect errors, especially with slang, punctuation, homophones (write/right), and proper nouns.
  • Proofreading - Using any of these platforms will require more editing than a human transcription. Balance out the amount of time this will take vs. transcribing it yourself (or paying someone to do so).
  • Accents - While many of these platforms support multiple languages, they are less reliable with accented variants within a given language.
  • Privacy - If the audio files are not meant to be shared (if you have an IRB and/or are working with sensitive interview data), you should not use these platforms. The majority of these platforms will send your audio to a server (somewhere) and the control you have to remove audio from those servers varies with each platform.

Our ratings are based on tests run with multiple audio interviews from the Michigan Time podcast, which was created by Maggie Cease, former Design Lab Resident.

Please use the table to navigate between the different sections (organized by the specific transcription platform), which will give you greater detail about each platform. If you have any questions, please feel free to email the Shapiro Design Lab at shapirodesignlab@umich.edu.

Audio Transcription Service Comparison Table
Audio Transcription Service | Accuracy | Ease of Use | Cost | Response Time | Editing Time | Notes
Canvas | Okay | Easy | Free | Fast | Medium Slow | Free with Canvas account, intuitive interface, not as accurate as other services
Dictation.io | Bad | Intermediate | Free | Slow | Slow | Free, listens in real time (speaker output can be made into input), so speed of audio should be slow. Must be connected to the internet
Rev.ai | Best | Intermediate/Hard | Low Cost | Fast | Medium Fast | First five hours free, requires command line usage, encouraged for developers
Temi | Good | Easy | Moderate Cost | Fast | Fast | Free one-time usage, API documentation available but also available in user-friendly interface
Trint | Best | Easy | Costly | Fast | Fast | Easily editable, user-friendly interface, harder to delete files
YouTube | Okay | Intermediate | Free | Medium | Medium Slow | Free, takes audio from .mp4 or other video format file (great for captioning videos)

Canvas (MiVideo)

Link to website

Note: Privacy policy regarding uploaded audio files could not be ascertained.

Steps

  1. Log onto Canvas.
  2. Click “Account”, then “Profile”.
  3. Click on “My Media”, then “Add new”, then “Media Upload” (works with audio or video).
  4. Upload file and click “Save”.
  5. Click “Go to my media” and select your file.
  6. Click “Actions” and then “Order captions”, then click the “Order Captions” button that appears. (This will only appear after the file has been uploaded and processed, which could vary given the speed of your internet connection and size of file).
  7. Once it loads, you will get a thank you message. You should see your caption request as “pending” or “in process.”
  8. Canvas will automatically develop captions for your file. Once it’s done, the status will say “complete” and you can click on the “Edit” button to review the transcription.
  9. After the transcript is complete, you can edit the transcription within their interface, export it as a file, or (if you've uploaded a video to transcribe) create captions/subtitles at specific points of the video. 

 

Best Use Cases

  • General spoken audio transcriptions
  • Podcast or interview transcriptions
  • Class recording transcriptions
  • Captioning for videos

Pros

  • Available to anyone with a Canvas account
  • Intuitive interface
  • Can be used to generate captions and subtitles for videos

Cons

  • Not as accurate as other analyzed services

To the Comparison Table


Dictation.io

Link to website

Note: You may need to slow down the playback speed of your audio; otherwise dictation.io may miss a few words. In this case, we use Audacity, a free audio editor, to do so.

Steps

  1. Open your desired file in Audacity.
  2. Double click your track in Audacity to select the entire track, then use Change Tempo (Effects > Change Tempo) to slow down the track by dragging it to -20%.
  3. Download SAR / Voicemeeter Banana or Soundflower. (One of these apps manages your internal audio routing so that the audio signal from Audacity is used as both the default sound output AND input for your computer.)
  4. Open dictation.io in Google Chrome (note that Safari or other browsers might not be able to run dictation.io).
  5. Hit play in Audacity, and immediately hit Start in dictation.io.
  6. You won’t hear the audio, but if you have it set up correctly, the dictation app should start transcribing it as it plays.
  7. Stop the recording on dictation.io when your audio stops and download your transcription in the desired format.
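
Because dictation.io listens in real time, slowing the track in step 2 also lengthens the session. The short Python sketch below works through that arithmetic. It assumes that Audacity's Change Tempo percentage scales playback speed directly (so -20% means the track plays at 80% of its original speed). The function name is made up for illustration.

```python
def slowed_duration(minutes: float, percent_change: float) -> float:
    """Estimate playback length after a tempo change (e.g. -20 for 20% slower).

    Assumption: a change of p percent multiplies playback speed by (1 + p/100).
    """
    factor = 1 + percent_change / 100
    if factor <= 0:
        raise ValueError("percent_change must be greater than -100")
    return minutes / factor


# A 30-minute interview slowed by 20% takes roughly 37.5 minutes to play through.
print(slowed_duration(30, -20))
```

Under that assumption, plan for the real-time transcription to take about 25% longer than the original recording.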
 

Best Use Cases

  • General Spoken Audio Transcription
  • Class Recording Transcriptions

Pros

  • Over 100 languages
  • Accent recognition
  • Speech recognition to write emails and documents

Cons

  • Requires additional setup with Audacity and Soundflower
  • Requires an internet connection and does not run locally
  • You have to keep pausing it so that it can process the audio
  • Does not recognize punctuation
  • Affiliated with Google, so you would have to be mindful of the Google Privacy Policy when it comes to any data you share with these servers (not particularly private).

To the Comparison Table


Rev.ai

Note: At the original time of publication, you could only use Rev.AI via the terminal/command line, and the instructions below reflect this workflow. However, recently they have added the ability to use this tool through the browser, meaning that you can upload audio directly to your account and then download either a JSON or TXT file of the final transcript. 

Link to website

Steps

  1. Sign up for account at www.rev.ai.
  2. Generate API key via SETTINGS, API Key.
  3. Make a folder on your computer and store the MP3 audio file in it.
  4. Open terminal/shell and type cd. Then drag the folder that has the audio to be transcribed into the terminal window. Make sure there is a space between cd and the path to the file, otherwise it won’t work.
  5. In the following command, replace [API-KEY] (between "Bearer" and the closing quotation mark) with your API key from the Rev.ai website, and replace "sample.mp3" with the filename of the audio you want to transcribe. Make sure to keep the @ symbol before the filename.
    1. curl -X POST "https://api.rev.ai/revspeech/v1beta/jobs" -H "Authorization: Bearer [API-KEY]"  -H "Content-Type: multipart/form-data" -F "media=@sample.mp3;type=audio/mp3" -F "options={\"metadata\":\"This is a sample submit jobs option for multipart\"}"
  6. Copy the above code and paste it into the terminal/shell. If it's formatted correctly, it should return a JSON Response like this:
    1. {"id":"00000000","created_on":"2018-12-03T22:27:10.807","name":"sample.mp3","metadata":"This is a sample submit jobs option for multipart","status":"in_progress"}
  7. Copy the value next to “id” in the response and use it to replace the sample ID (uCEuXGDO2abv) in the following command. Also replace [API-KEY] with your API key, as you did in step 5.
    1. curl -X GET "https://api.rev.ai/revspeech/v1beta/jobs/uCEuXGDO2abv/transcript" -H "Authorization: Bearer [API-KEY]" -H "Accept: text/plain"
  8. After 5 to 10 minutes, paste that command into the command line/shell. If the transcription is finished, it will respond with a plain text version of the transcript. If it isn't, it will respond with a message saying the job is still in progress.
  9. Copy the text from the command line/shell window to a new document and edit as needed.
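
Steps 6 to 8 involve copying the job ID by hand, which is easy to get wrong. As a rough illustration only, the Python sketch below reads the "id" out of the JSON response and builds the same GET request as the second curl command. The base URL is copied from the commands in this guide and may have changed since publication. The function names and the "YOUR-API-KEY" placeholder are made up for this example. Nothing is sent over the network unless you uncomment the last lines.

```python
import json
import urllib.request

# Base URL copied from the curl commands in this guide; it may be out of date.
BASE_URL = "https://api.rev.ai/revspeech/v1beta/jobs"


def job_id_from_response(response_text: str) -> str:
    """Pull the "id" value out of the JSON returned when a job is submitted."""
    return json.loads(response_text)["id"]


def transcript_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a plain text transcript."""
    return urllib.request.Request(
        f"{BASE_URL}/{job_id}/transcript",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "text/plain",
        },
        method="GET",
    )


# Sample response taken from step 6 of this guide.
sample = (
    '{"id":"00000000","created_on":"2018-12-03T22:27:10.807",'
    '"name":"sample.mp3",'
    '"metadata":"This is a sample submit jobs option for multipart",'
    '"status":"in_progress"}'
)
request = transcript_request(job_id_from_response(sample), "YOUR-API-KEY")
print(request.full_url)

# To actually fetch the transcript (needs a real key and a finished job):
# with urllib.request.urlopen(request) as reply:
#     print(reply.read().decode("utf-8"))
```

Building the request separately from sending it lets you check the URL and headers before any audio-related data leaves your computer.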

Best Use Cases

  • General Spoken Audio Transcription
  • Transcription for a Podcast Episode
  • Class Recording Transcriptions
  • Video subtitle creation

Pros

  • Separates different speakers
  • Gives time at the beginning of a speaker’s sentence
  • Recognizes when there is audio that it could not understand (“<inaudible>”)
  • Includes grammar and punctuation
  • First five hours free (great if only using it a few times) - 3.5 cents/minute after that
  • Quick turnaround time
  • Code and command examples available on website
  • Good privacy settings (require access code and you can delete jobs)
  • High word fidelity
  • The way your data exists on the server is determined by you (you can choose to purge data with API calls or set up automated deletion policies, but they will not be implemented automatically)

Cons

  • After first five hours, costs $0.035/minute
  • Requires working with an API (via curl, or by installing the Python SDK or Node SDK)
  • Command line/coding knowledge encouraged
  • Mostly intended for developers
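
To get a feel for the pricing listed above (first five hours free, then $0.035 per minute), here is a small Python sketch of the arithmetic. It assumes the free allowance is counted against the total minutes transcribed on an account, and that billing is per minute with no rounding or minimum charges. Actual billing rules may differ, and the names are made up for illustration.

```python
FREE_MINUTES = 5 * 60      # "first five hours free"
RATE_PER_MINUTE = 0.035    # dollars per minute after the free allowance


def estimated_cost(total_minutes: float) -> float:
    """Estimate dollars owed for total_minutes of audio (assumptions above)."""
    billable = max(0.0, total_minutes - FREE_MINUTES)
    return round(billable * RATE_PER_MINUTE, 2)


for minutes in (120, 300, 600):
    print(minutes, "minutes ->", estimated_cost(minutes), "dollars")
```

Under these assumptions, ten hours of audio in total would cost about $10.50, since only the second five hours are billed.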

To the Comparison Table


Temi

Link to Website

Steps

  1. Upload the audio file you want transcribed using the yellow “New Order” button on top right.
    1. Directly upload: .mp3, .mp4, .m4a, .aac, .wma, .avi, .wav, .mov.
    2. Paste URL links: YouTube, Vimeo, Dropbox, Facebook.
  2. Transcribe Audio File.

    1. Once file is uploaded, click the yellow “Checkout” button either at the top right of the page or the bottom right of the page.

    2. After checkout, Temi will begin to produce a transcription for you complete with timestamps and speaker identification.

    3. Transcription time will be dependent on how large your file is, but the transcription is typically ready in 5-10 minutes.

  3. Edit Transcription.

    1. Once Temi is done transcribing your file, it will appear under the “Dashboard” tab (can be accessed from the homepage → click on your username on the top right of the page, it will release a drop down menu where the first option is “Dashboard”) → Click on the “View Transcript” button for your file (this will be under “Status”).

    2. Make any necessary edits.

      1. Add speaker names by clicking on the “Add speaker” option on the left, which shows up again each time Temi identifies a change in speaker. You have the option to change all “Speaker 1” identifications to a specific name (similar to a “find and replace” feature).

      2. Replace any missed words by clicking on them and typing in the correct word.

  4. Export Transcription.

    1. Once you have reviewed the transcript and edited it to your satisfaction, click on the “Download” button on the top right of Temi’s interface and save your file.

    2. Temi can export a variety of file types (.docx, .pdf, .txt, .srt, .vtt) and gives you the option to include speaker names, timestamps, or export only highlighted sections.
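
If you plan to use the .srt export for video captions, it can help to know what the file contains: numbered cues, each with a start and end time written as HH:MM:SS,mmm, followed by the caption text. The Python sketch below builds one such cue. It illustrates the general SubRip layout rather than Temi's exact output, and the function names and sample sentence are made up.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one numbered SRT cue: index, time range, then the caption text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


print(srt_cue(1, 0.0, 2.5, "Welcome to the show."))
```

Knowing this layout makes it easier to spot and fix a stray timing error in a text editor before uploading captions.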

Best Use Cases

  • General spoken audio transcriptions
  • Podcast or interview transcriptions
  • Class recording transcriptions

Pros

  • Intuitive interactive user interface
  • Accent and grammar recognition
  • Find and replace tool
    • Identified Speakers
    • Words
  • Many supported file types
  • Transcription takes 5-10 minutes to complete
  • First file free (regardless of length)
  • Files deleted from UI will be permanently deleted from server

Cons

  • Currently only supports English
  • Costly

To the Comparison Table


Trint

Link to Website

Steps

  1. Upload the audio file you want transcribed using the yellow “Upload” button on top right.
    1. Directly upload: .mp3, .mp4, .m4a, .aac, .wma, .avi, .wav, .mov.
    2. Import from: Box, Dropbox, Google Drive, Cloud Drive, OneDrive, Evernote, Gmail, record video, link/url.
  2. Transcribe Audio File.
    1. Once file is uploaded, Trint will automatically produce a transcription for you complete with timestamps and speaker identification.
    2. Transcription time will be dependent on how large your file is but typically takes less time than the length of your audio/video file (if your file is 45 minutes, the transcription takes less than 45 minutes).
  3. Edit Transcription.
    1. Once Trint is done transcribing your file, it will appear under the “All Trints” tab (should be the home page but if not, click menu on top left and it is the first option) → Click on your file!
    2. Make any necessary edits.
      1. Add speaker names by clicking on the “Add speaker” option on the left, which shows up again each time Trint identifies a change in speaker.
      2. Change timestamps (if a delayed start is desired).
      3. Replace any missed words by clicking on them and typing in the correct word.
      4. Add missed words to Trint’s vocab by selecting it and clicking the “Add to Vocab” button on the top menu.
  4. Export Transcription.
    1. Once you have reviewed the transcript and edited it to your satisfaction, click on the yellow “Export” button on the top left of Trint’s interface and save your file!
    2. Trint can export a variety of file types (.docx, .srt, .vtt, .edl, .html, .xml).

Best Use Cases

  • General spoken audio transcriptions
  • Podcast or interview transcriptions
  • Class recording transcriptions
  • Video subtitle creation

Pros

  • Intuitive interactive user interface
  • Accent and grammar recognition
  • English and 16 other languages
  • Vocab builder tool
  • Find and replace tool
  • Many supported file types

Cons

  • Difficulty with voice differentiation
  • Costly
  • Files deleted from the UI will only be permanently deleted from the server if your account is deleted or support is contacted

To the Comparison Table


YouTube

  • Note: If you have an audio file, you will have to convert it into one of the following formats: .MOV, .MPEG4, .MP4, .AVI, .WMV, .MPEGPS, .FLV, 3GPP, WebM, DNxHR, ProRes, CineForm, HEVC (h265)

Step 1

Converting your mp3 file to mp4 via Adobe Premiere (or MPEG Streamclip if you do not have access to Premiere). You can find Adobe Premiere by signing in through the ‘apps anywhere’ link or on the Library computers across campus.

  1. Import your media into Adobe Premiere and create a new ‘Legacy title’ under ‘File’ Tab.
  2. Edit your title to your desired name by double clicking on the text box available on the main screen.
  3. Click and drag your media from ‘Project:Transcript’ on the left to ‘Timeline’ at the bottom.
  4. Go to the ‘File’ tab and click on the export media option.
  5. In the ‘Export Settings’ window, select ‘H.264’ in the ‘Format options’ and ‘Youtube 1080p HD’ in ‘Preset’.
  6. Select your desired file location and click on ‘Export’.
  7. Your file has been converted to an mp4 format.

Step 2

Uploading your file to YouTube.

  1. Open YouTube, sign in, and click on ‘YouTube studio beta’ under the ‘account’ tab in the top-right corner.
  2. Click on the ‘Upload video’ tab and enter the necessary info.
  3. For private content, make sure your sharing option is set to ‘Private’ and not ‘Public’ before you click on ‘Publish’.
  4. Once published, go to the YouTube studio home page, click on Editor, and then click on ‘Transcriptions’ on the left.
  5. Click on your file and choose the language of transcription you want.
  6. Wait a few minutes for it to transcribe, then go back to your file and look for a ‘subtitles’ tab under the ‘Video manager’ option.
  7. Click on Actions>download and choose your desired format (.vtt, .srt, .sbv) for the transcription file.

Step 3 (Optional)

Editing subtitles

  1. You can edit the subtitles as desired on the YouTube page without having to download them.
  2. To do so, instead of downloading the text, click on the edit tab on the right.

Best Use Cases

  • Creates Subtitles with timelines within minutes
  • General Spoken Audio Transcription
  • Class Recording Transcriptions (especially if with video)

Pros

  • Supports download as text or docx.
  • Easily editable
  • Over 100 languages
  • Shows timelines

Cons

  • Requires converting audio to an mp4 file with video editing software before uploading (e.g., Adobe Premiere Pro, iMovie, Final Cut Pro, or most other video editors)
  • No punctuation or grammar
  • Does not identify different speakers
  • Not great with accents
  • Affiliated with Google, so you would have to be mindful of the Google Privacy Policy when it comes to any data you share with these servers (not particularly private).

To the Comparison Table

If you would like to explore these tools more or need assistance, please feel free to email the Shapiro Design Lab at shapirodesignlab@umich.edu.