Google Cloud Data Engineer Professional certified!

One more GCP certification on the list! This one was by far the most interesting in a while, as it gave me a chance to review topics I don’t work with every day: machine learning and big data.

Let’s dive right in. Here is the preparation I followed:

My feedback on the exam:

  • Check the scope of this exam; be prepared for design questions on database models, optimization, and troubleshooting
  • Know BigQuery vs. Bigtable vs. Datastore vs. Cloud SQL
  • Know Dataflow and how to deal with batch and stream processing
  • Read as much as you can and play with machine learning!
  • How to share datasets, queries, and reports comes up often; don’t underestimate the security aspects (see the sketch after this list)
  • Understand the Hadoop ecosystem and learn about the typical big data lifecycle on GCP
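
On the dataset-sharing point, here is a minimal sketch of granting a user read access to a BigQuery dataset with the bq CLI; the dataset name and email address are placeholders:

# Dump the current dataset permissions to a file
bq show --format=prettyjson my_dataset > dataset.json

# Edit dataset.json and add an entry to the "access" array, e.g.:
# { "role": "READER", "userByEmail": "analyst@example.com" }

# Apply the updated permissions
bq update --source dataset.json my_dataset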

Good luck to everyone taking this exam!

Google Cloud Platform – Machine learning APIs

I have been watching a few Google Cloud Platform videos from Google Cloud Next recently and really enjoyed the demo in one of them: Machine Learning APIs (demo at 11′35″ in the video).

The idea is simple: record your voice (here using the microphone on your laptop), then send the audio file to Cloud Storage.

By using Google Speech, you not only get a transcript of your recording, but you can also add context words to your API call to help the API recognize domain-specific terms.
Example:

"speechContext": { "phrases": ["GKE", "Kubernetes", "Containers"] }

I worked on a script to do the exact same thing and decided to share it in case you want to try it at home.
Prerequisites are:

  • A GCP project
  • Run the following command on your laptop:
    brew install sox --with-flac
  • Download and install Google Cloud SDK
  • Create a Cloud Storage bucket
  • Create an API Key and give it access to Google Speech
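
For the bucket creation (plus enabling the Speech API), here is a rough sketch; the bucket name is a placeholder, and the API key itself has to be created from the Credentials page in the Cloud Console:

# Enable the Speech API on your project
gcloud services enable speech.googleapis.com

# Create the Cloud Storage bucket
gsutil mb gs://<my-bucket-name>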

#!/usr/bin/env bash

# Configuration
GCP_USERNAME=<my-user-email>
GCP_PROJECT_ID=<my-project-id>
BUCKET_NAME=<my-bucket-name>
API_KEY=<my-api-key>

gcloud auth login $GCP_USERNAME
gcloud config set project $GCP_PROJECT_ID

# Recording with SoX (brew install sox --with-flac)
# Note: FLAC samples must be 16-bit or 24-bit, which is also what Google Speech expects
rec --encoding signed-integer --bits 16 --channels 1 --rate 44100 recording.flac

# Upload to Cloud Storage (public-read so the file is reachable with just the API key)
gsutil cp -a public-read recording.flac gs://$BUCKET_NAME

# Prepare our request parameters for Google Speech
cat <<< '
{
    "config": {
        "encoding": "FLAC",
        "sample_rate": 44100,
        "language_code": "en-US",
        "speechContext": {
            "phrases": ["<My context word>"]
        }
    },
    "audio": {
        "uri": "gs://'$BUCKET_NAME'/recording.flac"
    }
}' > request.json

# API call to Google Speech
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json \
"https://speech.googleapis.com/v1beta1/speech:syncrecognize?key=$API_KEY"

# Cleaning
rm -f recording.flac request.json
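
A small tweak I found handy: if you have jq installed (brew install jq), you can pipe the API call through it to print only the transcript, assuming the documented results/alternatives response structure:

# Print only the recognized text from the API response
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json \
"https://speech.googleapis.com/v1beta1/speech:syncrecognize?key=$API_KEY" \
| jq -r '.results[].alternatives[].transcript'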

If you are interested in learning more about AI, there is a great video from Andrew Ng which covers the state of AI today and what you can do to be the next AI company!

Google Cloud Platform – Start/stop instance scheduler

I recently worked on a feature missing from GCP: a start/stop scheduler for my GCE instances based on labels. I was first excited about using Cloud Functions, but App Engine seemed the way to go for several reasons: it supports Python, and a task scheduling feature (cron) is already embedded.
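
For reference, App Engine cron jobs are declared in a cron.yaml file. Here is a minimal sketch; the /tasks/schedule URL is a hypothetical handler path, not necessarily the one used by my application:

cron:
- description: hourly start/stop check
  url: /tasks/schedule
  schedule: every 1 hours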

I had a few requirements:

  • Ability to schedule the start and stop of GCE instances every hour
  • Extra options to run only during working days or only during weekends; the default is every day
  • It must work across all projects inside an organisation, provided you give the right permissions to the default App Engine service account
  • Inexpensive to run (or free): who wants to pay for a feature that should be available by default in the cloud?
    • According to https://cloud.google.com/free/docs/always-free-usage-limits, you get 28 instance hours of App Engine standard environment free per day.
    • If you are already using App Engine for something else, the script is easy to merge with your application code.
    • If you don’t want to use App Engine, the Python code can be executed from any other machine with the right credentials, even your laptop if it is not critical.

To deploy the solution, please follow the instructions from the following repository: https://github.com/pchapotet/gcp-start-stop-scheduler

Once it is installed, simply add a few labels to your instances and enjoy the automation! You can run it only during working days (Monday to Friday) with the ‘d’ option and only during the weekend (Saturday and Sunday) with the ‘w’ option. Feel free to comment and raise GitHub issues if you see anything to improve.

With just 2 labels, it starts your instance at 8am and stops it at midnight on working days.
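
As a sketch, labels can be attached from the command line with gcloud; the label keys and values below are hypothetical placeholders, so check the repository README for the exact naming scheme:

# Attach scheduling labels to an instance (hypothetical keys and values)
gcloud compute instances add-labels my-instance --zone us-central1-a \
    --labels=start=8,stop=0,schedule=d
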
Google Cloud Architect Professional certified!

Taking the GCP Architect exam is quite a challenge, as there are very few study materials or practice questions available at the moment.

To prepare for the exam:

To sum up the exam without saying too much: it was 50 questions in 120 minutes. The timing is friendly; I had about 15-20 minutes left at the end. Half of the exam could be handled by elimination, removing the most far-fetched answers. I was surprised to see a split screen with the questions on the left and a list box on the right allowing you to switch between the 4 use cases available at the moment.

About 15 questions were related to the use cases. They seemed more complex, even confusing at times. I only needed 2 of the 4 use cases; the rest of the questions were more general and at what I would categorize as a medium level.

A few points I would suggest working on:

  • Prepare with the 4 use cases available: work on each for an hour as if they were your customer, and decide how you would deal with each point (i.e., which GCP service you would use instead of what they have)
  • Read about BigQuery, Bigtable, Cloud Storage, Pub/Sub, Dataflow, and Dataproc, and when to use each of them
  • Container Engine vs. Compute Engine vs. App Engine
  • Know cloud-related business terms: CapEx, OpEx, TCO, capacity planning
  • Best practices regarding IAM, audit logs, and how to secure them
  • Know which resources are global vs. regional vs. zonal (some major differences with AWS)
  • Know how the different databases are structured
  • Learn everything about instance groups, load balancers, and stress tests
  • CI/CD on GCP, and how to properly architect dev/QA/staging/prod environments
  • As expected, you will have to read Java and Python code
  • Cloud Deployment Manager is part of the exam and worth knowing in detail (see the sketch after this list)
  • Migration: how to deal with an existing data center, move data around, etc.
  • Network: VPN, firewalls, tags
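
On the Deployment Manager point, here is a minimal sketch that deploys a single VM; the deployment name, zone, and image are arbitrary placeholders:

# Minimal Deployment Manager configuration for one f1-micro VM
cat > vm.yaml <<'EOF'
resources:
- name: my-vm
  type: compute.v1.instance
  properties:
    zone: us-central1-a
    machineType: zones/us-central1-a/machineTypes/f1-micro
    disks:
    - deviceName: boot
      boot: true
      autoDelete: true
      initializeParams:
        sourceImage: projects/debian-cloud/global/images/family/debian-9
    networkInterfaces:
    - network: global/networks/default
EOF

# Create the deployment (delete it with "deployments delete" when done)
gcloud deployment-manager deployments create my-deployment --config vm.yaml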

Once again, good luck to everyone taking this exam!