
Podcasting and Audio Storytelling

Basics for podcasts and other forms of audio storytelling production.

Accessibility for Audio Content

Regardless of medium, when it comes to accessibility, the goal is to make information available to the widest possible audience by presenting it in a variety of formats or through a platform that allows for easy interoperability with external devices that can transform it into other formats. 

Key accessibility questions include:

Can we find it? - Do I have sufficient, properly structured metadata to make my content discoverable by as many audiences as possible?

Can we use it? - Is my content viewable or actionable by different methods (e.g. not just mouse clicks, but also keystrokes or menus that are compatible with speech commands)?

Can we read it? - Is my content available in multiple formats, or at least formats that are flexible and interoperable with other tools and software designed to enhance accessibility?

Can we participate or get help? - Does my content include ways for users to comment, interact with, or contact me?

Historically, federal accessibility regulations for audio have been far less rigorous than those for audiovisual media (e.g. closed captioning requirements for television). Web-based media platforms, however, provide much better opportunities for enhancing the accessibility of audio content than were available in the days when radio reigned supreme. Two of the most common options are transcripts and closed captioning.

Audio hosting platforms often lack robust features for transcripts and captioning, so making audio content more accessible usually means embedding an audio player from your hosting site of choice on a different site and augmenting it with supporting text and metadata.

Audio Editing Software and Screen Readers

Audio production software varies in its support for screen readers. Two recording and editing platforms that do have dedicated support for screen readers are the PC version of Audacity and both the Mac and PC versions of Reaper, which makes use of a third-party plugin for greater screen reader support. VoiceOver support in Mac software such as GarageBand and Logic Pro varies with operating system and software version.

Transcripts and Closed Captioning

When it comes to planning for accessibility, earlier is always better. It is much more straightforward to design for accessibility in the first place than it is to go back and build it in later. With that in mind, make sure you have a plan for creating, hosting, and making available a transcript of any podcast episode or audio essay you create. Providing text along with audio ensures that what you have worked so hard to create can reach more types of audiences, and promotes equitable access to knowledge and information. 

A second, more dynamic option is closed captioning, which provides smaller text snippets in real time along with the audio track as it plays. 

Finally, there is a hybrid of the two called an interactive transcript which displays a full transcript alongside the audio content, but highlights the active snippet of text in real time as the audio plays. 

As you will note, all of these features place text at the forefront. This is because text can be more easily manipulated and transformed than audio, allowing for devices like screen readers to slow it down, speed it up, search it for key terms, or otherwise transform it with various types of accessibility software. 
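As a rough illustration of how an interactive transcript decides which snippet to highlight, here is a minimal Python sketch. The cue list and the `active_cue` function are hypothetical names invented for this example; a real player would load its cues from a caption file and call something similar as playback time changes.

```python
from bisect import bisect_right

# Hypothetical cues: (start_seconds, end_seconds, text). Real cues would
# come from a caption file or a transcription service's export.
CUES = [
    (0.0, 2.5, "Welcome to the show."),
    (2.5, 6.0, "Today we are talking about accessibility."),
    (7.0, 9.0, "Let's get started."),
]


def active_cue(cues, playback_time):
    """Return the index of the cue to highlight, or None during a gap."""
    starts = [start for start, _end, _text in cues]
    # Find the last cue that starts at or before the playback time.
    i = bisect_right(starts, playback_time) - 1
    if i >= 0 and playback_time < cues[i][1]:
        return i
    return None
```

With these sample cues, a playback time of 3.0 seconds selects the second snippet, while 6.5 seconds falls in a gap between snippets and selects nothing.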

Accessibility can also be improved by adding helpful metadata to your content that makes it more discoverable and easier to navigate through. Here is a guide to editing metadata for audio files.

WCAG Requirements

The Web Content Accessibility Guidelines (WCAG) were developed by the W3C, the leading independent standards body for the World Wide Web. They have since been codified in U.S. legal settlements and federal communications regulations.

Key requirements under WCAG include:

Transcripts AND closed captioning are required for all web-based audio and video content. This means there is a sea of non-compliant content out there, including lots of material created by educational institutions.

Playback must be 'keyboard functional.' This means any audio or video player that has mouse controls must have corresponding alternate keyboard control options for playback (stop, play, forward, rewind, pause).

'Equivalent Information.' This requirement is troublesome and hard to parse because of the inherent differences that exist across media types as well as types of sensory perception (e.g. sight vs. sound). That said, the crux of it lies in making sure that if there is auditory or visual information that would not be readily evident in a text transcript (e.g. the arrival of a new speaker or the inclusion of background music), then additional textual cues or metadata should be included to describe or represent these elements.
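One way to carry this kind of equivalent information is in a caption file. The Python sketch below writes cues in the WebVTT caption format, using a voice tag (`<v Name>`) to mark who is speaking and bracketed text for non-speech sounds. The function names and the sample cue data are assumptions made for illustration, and bracketing sound descriptions is a common convention rather than a requirement.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(cues):
    """Build WebVTT text from (start, end, speaker, text) tuples.

    A speaker of None marks a non-speech cue such as background music.
    """
    lines = ["WEBVTT", ""]
    for start, end, speaker, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        if speaker is None:
            lines.append(f"[{text}]")           # sound description
        else:
            lines.append(f"<v {speaker}>{text}")  # speaker label
        lines.append("")
    return "\n".join(lines)


# Hypothetical sample data for illustration only.
SAMPLE_CUES = [
    (0, 4, None, "Theme music fades in"),
    (4, 7.25, "Host", "Welcome back."),
]
```

Calling `to_webvtt(SAMPLE_CUES)` produces a short caption file in which the music cue and the change of speaker are both visible as text.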

For more detailed information on best practices for accessibility, check out U-M Library's Digital Accessibility guide.

Audio Transcription Resources

This portion of the LibGuide is intended as a resource to assist with audio transcriptions. It includes links, descriptions, and ratings of various third-party programs that use artificial intelligence (AI) to generate transcripts, and how you can use these transcripts either for the audio itself or in conjunction with the creation of video captions. Some of these services also offer human transcription, at a higher cost. 

There are a variety of uses for transcribing audio in this way:

  • Creating written transcripts of fieldwork interviews or other kinds of research audio
  • Creating transcripts of podcasts or other audio content for hearing-impaired audiences
  • Creating subtitles for a video (especially using the options within Canvas and YouTube)
  • Creating transcripts of class lecture audio (for students with an accommodation via Services for Students with Disabilities)
     

General Things to Keep in Mind

  • Requires Clear Audio - Voices should be recorded close to the microphone so they are heard clearly.
  • Accuracy - Because these are machine transcriptions, expect errors, especially with slang, punctuation, homophones (write/right), and proper nouns.
  • Proofreading - Using any of these platforms will require more editing than a human transcription. Balance out the amount of time this will take vs. transcribing it yourself (or paying someone to do so).
  • Accents - While many of these platforms support multiple languages, they are less reliable with accented variants within a given language.
  • Privacy - If the audio files are not meant to be shared (if you have an IRB and/or are working with sensitive interview data), you should not use these platforms. The majority of these platforms will send your audio to a server (somewhere) and the control you have to remove audio from those servers varies with each platform.
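If you want a rough number for how much proofreading a machine transcript will need, one common measure is word error rate: the number of word substitutions, insertions, and deletions needed to turn the machine transcript into a corrected reference, divided by the length of the reference. The Python sketch below is an illustration only; it is not how the ratings in this guide were produced, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.

    Words are split on whitespace and lowercased; punctuation stays
    attached to words, so strip it first if you want to ignore it.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Edit-distance table, kept one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

For example, comparing the reference "please write it down" with the machine output "please right it down" gives one wrong word out of four, or a word error rate of 0.25.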

Our ratings are based on tests run with multiple audio interviews from the Michigan Time podcast, which was created by Maggie Cease, former Design Lab Resident.

Please note that there are other platforms available that were built after we ran these tests. We hope to do a similar test with these platforms in the future and will update the table accordingly. Other platforms to check out include Otter, Descript, and Sonix.

Please use the table to navigate between the different sections (organized by the specific transcription platform), which will give you greater detail about each platform. If you have any questions, please feel free to email the Shapiro Design Lab at shapirodesignlab@umich.edu.

Audio Transcription Service Comparison Table
| Audio Transcription Service | Accuracy | Ease of Use | Cost | Response Time | Editing Time | Notes |
|---|---|---|---|---|---|---|
| Canvas | Okay | Easy | Free | Fast | Medium Slow | Free with Canvas account, intuitive interface, not as accurate as other services |
| Dictation.io | Bad | Intermediate | Free | Slow | Slow | Free, listens in real time (speaker output can be made into input), so speed of audio should be slow. Must be connected to the internet |
| Rev.ai | Best | Easy (browser), Intermediate/Hard (command line) | Low Cost | Fast | Fast | First five hours free, can be done in browser or on command line |
| Temi | Good | Easy | Moderate Cost | Fast | Fast | Free one-time usage, API documentation available but also available in user-friendly interface |
| Trint | Best | Easy | Costly | Fast | Fast | Easily editable, user-friendly interface, harder to delete files |
| YouTube | Okay | Intermediate | Free | Medium | Medium Slow | Free, takes audio from .mp4 or other video format file (great for captioning videos) |

Canvas (MiVideo)

Link to website

Note: Privacy policy regarding uploaded audio files could not be ascertained.

Steps

  1. Log onto Canvas.
  2. Click “Account”, then “Profile”.
  3. Click on “My Media”, then “Add new”, then “Media Upload” (works with audio or video).
  4. Upload file and click “Save”.
  5. Click “Go to my media” and select your file.
  6. Click “Actions” and then “Order captions”, then click the “Order Captions” button that appears. (This will only appear after the file has been uploaded and processed, which could vary given the speed of your internet connection and size of file).
  7. Once it loads, you will get a thank you message. You should see your caption request as “pending” or “in process.”
  8. Canvas will automatically develop captions for your file. Once it’s done, the status will say “complete” and you can click on the “Edit” button to review the transcription.
  9. After the transcript is complete, you can edit the transcription within their interface, export it as a file, or (if you've uploaded a video to transcribe) create captions/subtitles at specific points of the video. 

 

Best Use Cases

  • General spoken audio transcriptions
  • Podcast or interview transcriptions
  • Class recording transcriptions
  • Captioning for videos

Pros

  • Available to anyone with a Canvas account
  • Intuitive interface
  • Can be used to generate captions and subtitles for videos

Cons

  • Not as accurate as other analyzed services

To the Comparison Table


Dictation.io

Link to website

Note: You may need to slow down the playback speed of your audio, otherwise dictation.io may miss a few words. In the steps below, we use Audacity, a freely available audio editor, to do so.

Steps

  1. Open your desired file in Audacity.
  2. Double click your track in Audacity to select the entire track, then use Change Tempo (Effects > Change Tempo) to slow down the track by dragging it to -20%.
  3. Download SAR / Voicemeeter Banana or Soundflower. (One of these apps is needed to manage internal audio routing, so that the audio signal from Audacity is used as the default sound output AND input for your computer.)
  4. Open dictation.io in Google Chrome (note that Safari or other browsers might not be able to run dictation.io).
  5. Hit play in Audacity, and immediately hit Start in dictation.io.
  6. You won’t hear the audio, but if you have it set up correctly, the dictation app should start transcribing it as it plays.
  7. Stop the recording on dictation.io when your audio stops and download your transcription in the desired format.
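Because dictation.io listens in real time, slowing the track down also lengthens the session. The short Python sketch below estimates the new running time after a Change Tempo adjustment, assuming that a tempo change of -20% means the track plays at 80% of its original speed. The function name is made up for this example.

```python
def slowed_duration(original_seconds, tempo_change_percent):
    """Estimate a track's length after a tempo change.

    A change of -20 means the track plays at 80% of its original speed,
    so it takes 1 / 0.8 = 1.25 times as long.
    """
    factor = 1 + tempo_change_percent / 100
    if factor <= 0:
        raise ValueError("tempo change must be greater than -100 percent")
    return original_seconds / factor
```

For a 30-minute (1800-second) interview, `slowed_duration(1800, -20)` comes out to about 2250 seconds, or roughly 37.5 minutes of real-time dictation.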
 

Best Use Cases

  • General Spoken Audio Transcription
  • Class Recording Transcriptions

Pros

  • Over 100 languages
  • Accent recognition
  • Speech recognition to write emails and documents

Cons

  • Requires additional setup with Audacity and Soundflower
  • Requires internet connection, and does not perform locally
  • Have to keep pausing it so that it can process the audio
  • Does not recognize punctuation
  • Affiliated with Google, so you would have to be mindful of the Google Privacy Policy when it comes to any data you share with these servers (not particularly private).

To the Comparison Table


Rev.ai

Note: At the original time of publication, you could only use Rev.ai via the terminal/command line, and the instructions below reflect this workflow. Since then, they have added the ability to use this tool through the browser, meaning that you can upload audio directly to your account and then download either a JSON or TXT file of the final transcript. Use the link below to visit the site and follow their instructions for transcribing a piece of audio. The browser version also gives you the option of entering a custom vocabulary, which can increase accuracy and cut down editing time. Also, the platform can now provide transcriptions in five different languages: English, French, German, Portuguese, and Spanish.

Link to website

Steps (Command Line)

  1. Sign up for account at www.rev.ai.
  2. Generate API key via SETTINGS, API Key.
  3. Make a folder on your computer and store the MP3 audio file in it.
  4. Open terminal/shell and type cd. Then drag the folder that has the audio to be transcribed into the terminal window. Make sure there is a space between cd and the path to the file, otherwise it won’t work.
  5. In the following command, replace [API-KEY] with your API key from the Rev.ai website and replace "sample.mp3" with the filename of the audio you want to transcribe. Make sure to keep the @ symbol before the filename.
    1. curl -X POST "https://api.rev.ai/revspeech/v1beta/jobs" -H "Authorization: Bearer [API-KEY]"  -H "Content-Type: multipart/form-data" -F "media=@sample.mp3;type=audio/mp3" -F "options={\"metadata\":\"This is a sample submit jobs option for multipart\"}"
  6. Copy the above code and paste it into the terminal/shell. If it's formatted correctly, it should return a JSON Response like this:
    1. {"id":"00000000","created_on":"2018-12-03T22:27:10.807","name":"sample.mp3","metadata":"This is a sample submit jobs option for multipart","status":"in_progress"}
  7. Copy the value next to “id” in the response and use it to replace the sample job ID (uCEuXGDO2abv) in the following command. Also copy in your API key as you did in step 5.
    1. curl -X GET "https://api.rev.ai/revspeech/v1beta/jobs/uCEuXGDO2abv/transcript" -H "Authorization: Bearer [API-KEY]" -H "Accept: text/plain"
  8. After 5 to 10 minutes, paste that command into the command line/shell, and, if the transcription is finished, it will respond with a plain text version of the transcript. If it doesn't, it will respond with a message saying it's still in process.
  9. Copy the text from the command line/shell window to a new document and edit as needed.
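If copying the job ID by hand feels error-prone, the step from the JSON response to the transcript command can be scripted. The Python sketch below reads the "id" value from the response and builds the matching curl command as text; it does not contact Rev.ai itself. The function name is hypothetical, and the endpoint is copied from the commands above, so it may differ from what Rev.ai currently documents.

```python
import json
import shlex

# Endpoint copied from the curl commands in this guide; check Rev.ai's
# own documentation, as it may have changed since publication.
JOBS_URL = "https://api.rev.ai/revspeech/v1beta/jobs"


def transcript_command(submit_response, api_key):
    """Build the transcript curl command from the job-submission response."""
    job_id = json.loads(submit_response)["id"]
    parts = [
        "curl", "-X", "GET", f"{JOBS_URL}/{job_id}/transcript",
        "-H", f"Authorization: Bearer {api_key}",
        "-H", "Accept: text/plain",
    ]
    # shlex.join quotes each part so it can be pasted into a shell safely.
    return shlex.join(parts)
```

Paste the JSON response and your API key into `transcript_command(...)`, print the result, and run the printed command in the terminal once the job has had time to finish.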

Best Use Cases

  • General Spoken Audio Transcription
  • Transcription for a Podcast Episode
  • Class Recording Transcriptions
  • Video subtitle creation

Pros

  • Separates different speakers
  • Gives time at the beginning of a speaker’s sentence
  • Recognizes when there is audio that it could not understand (“<inaudible>”)
  • Includes grammar and punctuation
  • First five hours free (great if only using it a few times) - 3.5 cents/minute after that
  • Quick turnaround time
  • Code and command examples available on website
  • Good privacy settings (require access code and you can delete jobs)
  • High word fidelity
  • The way your data exists on the server is determined by you (you can choose to purge data with API calls or set up automated deletion policies, but they will not be implemented automatically)

Cons

  • After first five hours, costs $0.035/minute
  • Requires an API (installation of Python SDK or Node SDK)
  • Command line/coding knowledge encouraged
  • Mostly used by developers
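To budget for a project, the pricing listed above (first five hours free, then $0.035 per minute) can be turned into a quick estimate. The Python sketch below assumes those figures, which reflect the time of publication and may have changed; the names are invented for illustration.

```python
from decimal import Decimal

# Figures taken from this guide; confirm current pricing with Rev.ai.
FREE_MINUTES = 5 * 60
RATE_PER_MINUTE = Decimal("0.035")  # dollars


def estimated_cost(total_minutes):
    """Estimate the cost in dollars for a total number of audio minutes."""
    billable = max(Decimal(0), Decimal(str(total_minutes)) - FREE_MINUTES)
    return billable * RATE_PER_MINUTE
```

For example, 400 minutes of interviews would leave 100 billable minutes, for an estimated $3.50, while anything up to 300 minutes would cost nothing.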

To the Comparison Table


Temi

Link to Website

Steps

  1. Upload the audio file you want transcribed using the yellow “New Order” button on top right.
    1. Directly upload: .mp3, .mp4, .m4a, .aac, .wma, .avi, .wav, .mov.
    2. Paste URL links: YouTube, Vimeo, Dropbox, Facebook.
  2. Transcribe Audio File.

    1. Once file is uploaded, click the yellow “Checkout” button either at the top right of the page or the bottom right of the page.

    2. After checkout, Temi will begin to produce a transcription for you complete with timestamps and speaker identification.

    3. Transcription time will be dependent on how large your file is, but the transcription is typically ready in 5-10 minutes.

  3. Edit Transcription.

    1. Once Temi is done transcribing your file, it will appear under the “Dashboard” tab (can be accessed from the homepage → click on your username on the top right of the page, it will release a drop down menu where the first option is “Dashboard”) → Click on the “View Transcript” button for your file (this will be under “Status”).

    2. Make any necessary edits.

      1. Add speaker names by clicking on the “Add speaker” option on the left, which shows up again each time Temi identifies a change in speaker. You have the option to change all “Speaker 1” identifications to a specific name (similar to a “find and replace” feature).

      2. Replace any missed words by clicking on them and typing in the correct word.

  4. Export Transcription.

    1. Once you have reviewed the transcript and edited it to your satisfaction, click on the “Download” button on the top right of Temi’s interface and save your file.

    2. Temi can export a variety of file types (.docx, .pdf, .txt, .srt, .vtt) and gives you the option to include speaker names, timestamps, or export only highlighted sections.

Best Use Cases

  • General spoken audio transcriptions
  • Podcast or interview transcriptions
  • Class recording transcriptions

Pros

  • Intuitive interactive user interface
  • Accent and grammar recognition
  • Find and replace tool
    • Identified Speakers
    • Words
  • Many supported file types
  • Transcription takes 5-10 minutes to complete
  • First file free (regardless of length)
  • Files deleted from UI will be permanently deleted from server

Cons

  • Currently only supports English
  • Costly

To the Comparison Table


Trint

Link to Website

Steps

  1. Upload the audio file you want transcribed using the yellow “Upload” button on top right.
    1. Directly upload: .mp3, .mp4, .m4a, .aac, .wma, .avi, .wav, .mov.
    2. Import from: Box, Dropbox, Google Drive, Cloud Drive, OneDrive, Evernote, Gmail, record video, link/url.
  2. Transcribe Audio File.
    1. Once file is uploaded, Trint will automatically produce a transcription for you complete with timestamps and speaker identification.
    2. Transcription time will be dependent on how large your file is but typically takes less time than the length of your audio/video file (if your file is 45 minutes, the transcription takes less than 45 minutes).
  3. Edit Transcription.
    1. Once Trint is done transcribing your file, it will appear under the “All Trints” tab (should be the home page but if not, click menu on top left and it is the first option) → Click on your file!
    2. Make any necessary edits.
      1. Add speaker names by clicking on the “Add speaker” option on the left, which shows up again each time Trint identifies a change in speaker.
      2. Change timestamps (if a delayed start is desired).
      3. Replace any missed words by clicking on them and typing in the correct word.
      4. Add missed words to Trint’s vocab by selecting it and clicking the “Add to Vocab” button on the top menu.
  4. Export Transcription.
    1. Once you have reviewed the transcript and edited it to your satisfaction, click on the yellow “Export” button on the top left of Trint’s interface and save your file!
    2. Trint can export a variety of file types (.docx, .srt, .vtt, .edl, .html, .xml).

Best Use Cases

  • General spoken audio transcriptions
  • Podcast or interview transcriptions
  • Class recording transcriptions
  • Video subtitle creation

Pros

  • Intuitive interactive user interface
  • Accent and grammar recognition
  • English and 16 other languages
  • Vocab builder tool
  • Find and replace tool
  • Many supported file types

Cons

  • Difficulty with voice differentiation
  • Costly
  • Files deleted from the UI will only be permanently deleted from the server if your account is deleted or support is contacted

To the Comparison Table


YouTube

  • Note: If you have an audio file, you will have to convert it into one of the following formats: .MOV, .MPEG4, .MP4, .AVI, .WMV, .MPEGPS, .FLV, 3GPP, WebM, DNxHR, ProRes, CineForm, HEVC (h265)

Step 1

Converting your mp3 file to mp4 using Adobe Premiere (or MPEG Streamclip if you do not have access to Premiere). You can find Adobe Premiere by signing in through the ‘apps anywhere’ link or on the Library computers across campus.

  1. Import your media into Adobe Premiere and create a new ‘Legacy title’ under ‘File’ Tab.
  2. Edit your title to your desired name by double clicking on the text box available on the main screen.
  3. Click and drag your media from ‘Project:Transcript’ on the left to ‘Timeline’ at the bottom.
  4. Go to File tab and click on export media option.
  5. In the ‘Export Settings’ window, select ‘H.264’ in the ‘Format options’ and ‘Youtube 1080p HD’ in ‘Preset’.
  6. Select your desired file location and click on ‘Export’.
  7. Your file has been converted to an mp4 format.

Step 2

Uploading your file to YouTube.

  1. Open YouTube, sign in, and click on ‘YouTube studio beta’ under the ‘account’ tab in the top-right corner.
  2. Click on the ‘Upload video’ tab and enter the necessary info.
  3. For private content, make sure your sharing option is set to ‘Private’ and not ‘Public’ before you click on ‘Publish’.
  4. Once published, go to the YouTube studio home page, click on Editor and then click on ‘Transcriptions’ on the left.
  5. Click on your file and choose the language of transcription you want.
  6. Wait a few minutes for it to transcribe, then go back to your file and look for a ‘subtitles’ tab under the ‘Video manager’ option.
  7. Click on Actions > Download and choose your desired format (.vtt, .srt, .sbv) for the transcription file.
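If you need a plain-text transcript rather than timed captions, a downloaded .srt file can be flattened with a short script. The Python sketch below assumes the usual .srt layout (a cue number, a timing line, then one or more lines of text, with a blank line between cues); the function name is hypothetical.

```python
import re

# Matches an .srt timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def srt_to_text(srt):
    """Strip cue numbers and timings from .srt captions, keeping the text."""
    text_lines = []
    cleaned = srt.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", cleaned):
        lines = [line.strip() for line in block.splitlines()]
        # Drop the cue number and the timing line at the top of each cue.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        text_lines.extend(line for line in lines if line)
    return " ".join(text_lines)
```

To use it, read the downloaded file's contents into a string, pass that string to `srt_to_text`, and save the result as a text file for editing.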

Step 3 (Optional)

Editing subtitles

  1. You can edit the subtitles as desired on the YouTube page without having to download them.
  2. To do so, instead of downloading the text, click on edit tab on the right.

Best Use Cases

  • Creates Subtitles with timelines within minutes
  • General Spoken Audio Transcription
  • Class Recording Transcriptions (especially if with video)

Pros

  • Supports download as text or docx.
  • Easily editable
  • Over 100 languages
  • Shows timelines

Cons

  • Requires conversion to an mp4 file with video editing software before uploading (e.g. Adobe Premiere Pro, iMovie, Final Cut Pro, or most other video editing software)
  • No punctuation or grammar
  • Does not identify different speakers
  • Not great with accents
  • Affiliated with Google, so you would have to be mindful of the Google Privacy Policy when it comes to any data you share with these servers (not particularly private).

To the Comparison Table

If you would like to explore these tools more or need assistance, please feel free to email the Shapiro Design Lab at shapirodesignlab@umich.edu.