I recently watched a few Google Cloud Platform videos from Google Cloud Next and really enjoyed the demo in one of them: Machine learning APIs.
The idea is simple: record your voice (here using your laptop's microphone), then send the audio file to Cloud Storage. The demo starts at 11:35.
With Google Speech, you not only get a transcript of your recording, you can also pass context words in your API call to help the service recognize uncommon terms.
Example:
"speechContext": { "phrases": ["GKE", "Kubernetes", "Containers"] }
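If you have several context words, you could generate that fragment instead of typing it by hand. Here is a minimal sketch: `speech_context` is a helper name I made up, and it assumes the phrases contain no double quotes or backslashes, since it does no JSON escaping.

```shell
# Hypothetical helper: turn a list of phrases into a speechContext JSON fragment.
# Assumption: phrases contain no double quotes or backslashes (no escaping is done).
speech_context() {
  json=""
  for phrase in "$@"; do
    # Separate entries with a comma, except before the first one
    json="${json:+$json, }\"$phrase\""
  done
  printf '"speechContext": { "phrases": [%s] }\n' "$json"
}

speech_context "GKE" "Kubernetes" "Containers"
```

Called with the three words from the example, it prints the same fragment as above.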
I wrote a script that does the same thing and decided to share it in case you want to try it at home.
Prerequisites are:
- A GCP project
- Run the following command on your laptop:
brew install sox --with-flac
- Download and install Google Cloud SDK
- Create a Cloud Storage bucket
- Create an API Key and give it access to Google Speech
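Before running anything, it can help to check that the command-line tools from the list are actually on your PATH. This is only a sketch: `require_cmd` is a hypothetical helper of mine, and the tool list is simply what the script calls (`rec` comes with SoX, `gcloud` and `gsutil` with the Cloud SDK).

```shell
# Hypothetical helper: report whether a command is available on the PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "Missing required command: $1" >&2
  return 1
}

# Check every tool the script relies on and remember if any is missing
missing=0
for cmd in rec gcloud gsutil curl; do
  require_cmd "$cmd" || missing=1
done

if [ "$missing" -eq 0 ]; then
  echo "All prerequisites found."
else
  echo "Install the missing tools before running the script." >&2
fi
```

It does not check the bucket or the API key, only the local tools.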
#!/usr/bin/env bash
# Configuration
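# Replace the placeholders with your own values (the quotes keep the shell
# from treating < and > as redirections)
GCP_USERNAME="<my-user-email>"
GCP_PROJECT_ID="<my-project-id>"
BUCKET_NAME="<my-bucket-name>"
API_KEY="<my-api-key>"
gcloud auth login "$GCP_USERNAME"
gcloud config set project "$GCP_PROJECT_ID"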
# Record with SoX (brew install sox --with-flac); press Ctrl+C to stop recording
rec --encoding signed-integer --bits 32 --channels 1 --rate 44100 recording.flac
# Upload to Cloud Storage (-a public-read makes the object publicly readable)
gsutil cp -a public-read recording.flac "gs://$BUCKET_NAME"
# Prepare our request parameters for Google Speech
cat <<< '
{
  "config": {
    "encoding": "FLAC",
    "sample_rate": 44100,
    "language_code": "en-US",
    "speechContext": {
      "phrases": ["<My context word>"]
    }
  },
  "audio": {
    "uri": "gs://'"$BUCKET_NAME"'/recording.flac"
  }
}' > request.json
# API call to Google Speech
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json \
"https://speech.googleapis.com/v1beta1/speech:syncrecognize?key=$API_KEY"
# Cleaning
rm -f recording.flac
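The curl call prints the raw JSON response. If you only want the recognized text, you could filter it. A rough sketch, assuming each alternative in the response carries a `transcript` field and that the text contains no escaped double quotes; `extract_transcripts` is a name I made up, the sample response is invented for illustration, and a real JSON tool such as jq would be more robust.

```shell
# Hypothetical helper: print every transcript found in a Speech API JSON response.
# Assumption: transcripts contain no escaped double quotes.
extract_transcripts() {
  grep -o '"transcript": *"[^"]*"' | sed -e 's/^"transcript": *"//' -e 's/"$//'
}

# Made-up sample response, used only to illustrate the filter
sample='{ "results": [ { "alternatives": [ { "transcript": "how old is the Brooklyn Bridge", "confidence": 0.98 } ] } ] }'
printf '%s\n' "$sample" | extract_transcripts
```

In the script, the same filter could be appended to the curl command with a pipe: `curl ... | extract_transcripts`.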
If you are interested in learning more about AI and machine learning, there is a great video by Andrew Ng that covers the state of AI today and what you can do to become the next AI company!