Amazon Transcribe is an AWS service that uses automatic speech recognition to convert speech to text. One of its features is vocabulary filters, which let you define a word list and filter out the words you don’t want in the transcription results.
Today I’ll show how to create these filters in Terraform. Unfortunately, at the time of writing, there are no Terraform resources readily available for Transcribe. So we’re going to use a null_resource with a local-exec provisioner, which lets Terraform run a local bash script that creates the filters for you.
Here’s the bash script for creating the vocabulary filters:
#!/bin/bash
# This will create the filter if it's not there or update it if already present
set -e
# Sample Values
# FILTER="no-hello-world"
# WORDLIST="hello world"   (space-separated, one word per entry)
# REGION="us-east-1"
# LANG="en-US"
create_filter() {
  # WORDLIST is left unquoted on purpose so each word becomes its own argument.
  # shellcheck disable=SC2086
  aws transcribe create-vocabulary-filter \
    --vocabulary-filter-name "${FILTER}" \
    --language-code "${LANG}" \
    --words ${WORDLIST} \
    --region "${REGION}"
}
update_filter() {
  # WORDLIST is left unquoted on purpose so each word becomes its own argument.
  # shellcheck disable=SC2086
  aws transcribe update-vocabulary-filter \
    --vocabulary-filter-name "${FILTER}" \
    --words ${WORDLIST} \
    --region "${REGION}"
}
# list-vocabulary-filters has a --name-contains option, but it matches
# substrings and would generate false positives, so match the exact name with jq.
# Query the same region the filter is created in.
DOES_FILTER_EXIST=$(aws transcribe list-vocabulary-filters --region "${REGION}" |
  jq --arg name "${FILTER}" 'any(.VocabularyFilters[]; .VocabularyFilterName == $name)')
if [ "${DOES_FILTER_EXIST}" = "true" ]; then
  update_filter
else
  create_filter
fi
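The exact-name check can be tried offline, without calling AWS. The sketch below is only an illustration: SAMPLE is a hypothetical, trimmed response containing just the VocabularyFilterName field, filter_exists is a made-up helper name, and the jq expression uses the any(generator; condition) form, which may differ from the exact expression in the script.

```shell
#!/bin/bash
# Hypothetical, trimmed stand-in for the output of list-vocabulary-filters.
SAMPLE='{"VocabularyFilters":[{"VocabularyFilterName":"no-hello-world"},{"VocabularyFilterName":"hello"}]}'

# filter_exists is an illustrative helper: it prints "true" only when a filter
# with exactly the given name is present in SAMPLE, and "false" otherwise.
filter_exists() {
  jq --arg name "$1" \
    'any(.VocabularyFilters[]; .VocabularyFilterName == $name)' <<<"${SAMPLE}"
}

filter_exists "hello"     # exact name is present
filter_exists "no-hello"  # only a substring of "no-hello-world", so no match
```

A substring search for "no-hello" would have matched "no-hello-world", which is the false positive the script avoids by comparing whole names.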
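And here’s how we use the script with Terraform’s null_resource. Note that a local-exec provisioner runs only when the resource is created, so a later change to the word list is picked up only once the resource is recreated: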
resource "null_resource" "hello-filter" {
  provisioner "local-exec" {
    command = "./create-vocabulary-filter"
    environment = {
      FILTER   = "hello"
      WORDLIST = "hello hi hey ahoy"
      LANG     = "en-US"
      REGION   = var.region # define the region in your variables.tf
    }
  }
}
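WORDLIST is passed to the script as a single space-separated string, and the script relies on the shell splitting it into separate words when it expands ${WORDLIST} without quotes. A small sketch of that behavior, using a hypothetical helper count_args that is not part of the original script:

```shell
#!/bin/bash
# count_args is a hypothetical helper for illustration only:
# it prints how many arguments it received.
count_args() {
  echo "$#"
}

WORDLIST="hello hi hey ahoy"

# Unquoted: the shell splits the value on whitespace, so every word is passed
# as its own argument, which is the form --words expects.
# shellcheck disable=SC2086
count_args ${WORDLIST}     # prints 4

# Quoted: the whole list arrives as one single argument.
count_args "${WORDLIST}"   # prints 1
```

This is why the script disables shellcheck's SC2086 warning around the aws calls. It also means that an entry containing a space would be split into separate words with this approach.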
Once the configuration is applied, the vocabulary filter is live in the Amazon Transcribe console.
I put all of this in the GitHub repo tf-transcribe-filters.