Amazon Transcribe is an AWS service that uses automatic speech recognition to convert speech to text. One of its features is vocabulary filters: you define a word list, and Transcribe filters those words out of the transcription results.

In this post I'll show how to create these filters with Terraform. Unfortunately, at the time of writing there are no Terraform resources readily available for Transcribe. So we'll use a null_resource with a local-exec provisioner, which runs a local bash script that creates the filters through the AWS CLI as part of a Terraform run.

Here's the bash script that creates (or updates) the vocabulary filters:

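```bash
#!/bin/bash
# Creates the vocabulary filter if it doesn't exist, or updates it if it does.
# -e: exit on error, -u: fail on unset variables,
# -o pipefail: fail if any command in a pipeline fails (e.g. the aws call below).
set -euo pipefail

# Sample values (passed in as environment variables)
# FILTER="no-hello-world"
# WORDLIST="hello world"   # space-separated, one entry per word
# REGION="us-east-1"
# LANG="en-US"

create_filter() {
  # WORDLIST is deliberately unquoted so that each space-separated word
  # becomes a separate argument to --words.
  # shellcheck disable=SC2086
  aws transcribe create-vocabulary-filter \
    --vocabulary-filter-name "${FILTER}" \
    --language-code "${LANG}" \
    --words ${WORDLIST} \
    --region "${REGION}"
}

update_filter() {
  # update-vocabulary-filter takes no language code; only the words change.
  # shellcheck disable=SC2086
  aws transcribe update-vocabulary-filter \
    --vocabulary-filter-name "${FILTER}" \
    --words ${WORDLIST} \
    --region "${REGION}"
}

# --name-contains narrows the list, but it matches substrings and can return
# false positives, so jq checks for the exact name. any(...) prints "true" or
# "false" even when no filters are returned.
DOES_FILTER_EXIST=$(
  aws transcribe list-vocabulary-filters \
    --name-contains "${FILTER}" \
    --region "${REGION}" |
    jq -r --arg name "${FILTER}" \
      'any(.VocabularyFilters[]?; .VocabularyFilterName == $name)'
)

if [ "${DOES_FILTER_EXIST}" = true ]; then
  update_filter
else
  create_filter
fi
```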
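If you'd rather not depend on jq for the existence check, one alternative (not what the script above does) is to ask for the filter by its exact name with get-vocabulary-filter and branch on the CLI's exit status. This is a minimal sketch; the filter_exists name is hypothetical, and it assumes that any failed lookup means the filter is missing. That also turns permission or network errors into "missing", but the create call that follows would then fail as well, and set -e stops the script.

```bash
# Hypothetical alternative to the jq-based check (a sketch, not the original
# approach). Assumes FILTER and REGION are set, as in the script above.
filter_exists() {
  # get-vocabulary-filter looks up the exact name, so there are no substring
  # false positives. Output is discarded; only the exit status matters.
  # Caveat: auth or network errors also return non-zero and read as "missing".
  aws transcribe get-vocabulary-filter \
    --vocabulary-filter-name "${FILTER}" \
    --region "${REGION}" >/dev/null 2>&1
}

# Usage inside the script, replacing the DOES_FILTER_EXIST block:
# if filter_exists; then update_filter; else create_filter; fi
```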

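And here's how we call the script from Terraform with a null_resource and a local-exec provisioner (the script needs to be executable, e.g. chmod +x create-vocabulary-filter):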

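```hcl
resource "null_resource" "hello-filter" {
  # A null_resource runs its provisioners only when it is created. Listing the
  # inputs as triggers replaces the resource, and re-runs the script, when they change.
  triggers = {
    filter   = "hello"
    wordlist = "hello hi hey ahoy"
    lang     = "en-US"
    region   = var.region # declare var.region in your variables.tf
  }

  provisioner "local-exec" {
    # path.module makes the script path independent of where terraform is run
    command = "${path.module}/create-vocabulary-filter"
    environment = {
      FILTER   = self.triggers.filter
      WORDLIST = self.triggers.wordlist
      LANG     = self.triggers.lang
      REGION   = self.triggers.region
    }
  }
}
```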
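The Terraform block passes WORDLIST as one space-separated string, and the script relies on the shell's word splitting (the unquoted ${WORDLIST}) to turn it into separate --words arguments. The sketch below uses a hypothetical print_args helper in place of the aws call to show what the CLI actually receives, including why a comma-separated value such as "hello, world" leaves the comma attached to the first word. The read -a variant at the end, which splits on commas and/or spaces into an array, is an assumption for illustration and not part of the original script.

```bash
#!/bin/bash
# Hypothetical helper standing in for "aws transcribe ... --words":
# prints each argument it receives on its own line.
print_args() {
  printf '%s\n' "$@"
}

# 1) Unquoted expansion, as in the script: splits on whitespace only.
WORDLIST="hello hi hey ahoy"
# shellcheck disable=SC2086
print_args ${WORDLIST}   # 4 arguments: hello, hi, hey, ahoy

WORDLIST="hello, world"
# shellcheck disable=SC2086
print_args ${WORDLIST}   # 2 arguments, and the first one is "hello," (comma kept)

# 2) Array alternative: split on commas and/or spaces into a bash array.
#    Expanding "${words[@]}" quoted also prevents glob expansion of words like *.
IFS=', ' read -r -a words <<< "hello, world"
print_args "${words[@]}" # 2 arguments: hello, world
```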

After terraform apply, our vocabulary filter is live in the Amazon Transcribe console.
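To check the same thing from the command line, a sketch like the following should work. get-vocabulary-filter returns the filter's name, language code, last-modified time and a DownloadUri for the word list, not the words themselves. The show_filter name is hypothetical, and the sketch assumes the AWS CLI is configured with credentials for the target account.

```bash
# Hypothetical helper: print the name, language code and last-modified time
# of a vocabulary filter as tab-separated text.
show_filter() {
  local name="$1" region="$2"
  aws transcribe get-vocabulary-filter \
    --vocabulary-filter-name "${name}" \
    --region "${region}" \
    --query '[VocabularyFilterName, LanguageCode, LastModifiedTime]' \
    --output text
}

# Usage: show_filter hello us-east-1
```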

I've put all of this in the GitHub repo tf-transcribe-filters.