One more GCP certification on the list! This one was by far the most interesting in a while, as it gave me a chance to review topics that I don’t work with every day: machine learning and big data.
Let’s dive right in. Here is the preparation I followed.
To begin with the online classes:
Read Google’s official documentation for the services in the scope of this exam:
- Cloud Storage, Transfer Appliance
- Cloud SQL, Cloud Bigtable, BigQuery, Datastore
- Pub/Sub, Dataflow, Dataprep, Data Studio, Datalab
- Stackdriver, KMS, Machine Learning and its APIs
Then move on to the Google Next videos:
- BigQuery data modeling: https://www.youtube.com/watch?v=Vj6ksosHdhw
- Bigtable: https://www.youtube.com/watch?v=KaRbKdMInuc
- Datastore: https://www.youtube.com/watch?v=uZDk0NZGqHs
- Dataflow: https://www.youtube.com/watch?v=RxHijHZd0oM
- Processing data at scale: https://www.youtube.com/watch?v=dcSeF51HP3U
You can review the case studies, but don’t spend too much time on them. There is now a practice exam for the Data Engineer certification, and its questions are quite similar to the ones in the real exam.
My feedback on the exam:
- Check the scope of this exam and be prepared for design questions on database models, optimization and troubleshooting
- Know BigQuery vs. Bigtable vs. Datastore vs. Cloud SQL
- Understand Dataflow and how to deal with batch and stream processing
- Read as much as you can and play with machine learning!
- Supervised learning vs. unsupervised learning vs. semi-supervised learning vs. reinforcement learning
- Review some ML algorithms for each (clustering, regression, classification, association analysis, etc.)
- How to share datasets, queries and reports comes up often, so don’t underestimate the security aspects
- Understand the Hadoop ecosystem and learn about the typical big data lifecycle on GCP
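If the supervised vs. unsupervised distinction from the list above still feels abstract, a toy example may help. The sketch below is my own illustration in plain Python, not exam material: the data, the labels and the function names (`fit_nearest_centroid`, `predict`, `two_means`) are all made up. The supervised part learns from labeled points (classification), while the unsupervised part groups the same points without ever seeing a label (clustering).

```python
# Toy 1-D data, invented for illustration: values near 1 and values near 10.
points = [1.0, 1.2, 0.8, 9.8, 10.1, 10.3]
# Labels are only used by the supervised part.
labels = ["small", "small", "small", "large", "large", "large"]


def mean(values):
    return sum(values) / len(values)


def fit_nearest_centroid(xs, ys):
    """Supervised: learn one centroid per known label."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: mean(members) for label, members in groups.items()}


def predict(centroids, x):
    """Classify x as the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(xs, iterations=10):
    """Unsupervised: split xs into two clusters without using any label."""
    centers = [min(xs), max(xs)]  # simple deterministic starting points
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        # Keep the previous center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


if __name__ == "__main__":
    centroids = fit_nearest_centroid(points, labels)
    print(predict(centroids, 2.0))   # classification of an unseen point
    print(two_means(points)[1])      # the two discovered groups
```

The clustering step finds the same two groups that the labels describe, but it has no idea what to call them. That is the kind of difference the exam questions tend to probe.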
Good luck to everyone taking this exam!