Google Cloud Professional ML Engineer Certification - My Study Track

I recently passed the exam for the newest Google Cloud certification on engineering machine learning solutions. Of the 12 GCP certification exams that I’ve now sat for (9 certifications and 3 renewals – they expire after 2 years!), this was the most challenging one yet.

Being able to understand and articulate how ML works at a basic level is still a difficult task for the average person. Under the covers, the scientific and statistical concepts can quickly become overwhelming. This creates a “steeper than average” learning curve. But for the exam, the only way out is through. So if you want this certification, prepare to get uncomfortable!

The study track outlined below took me about 3 months to complete. I started with a minimal time investment and gradually increased it over that timeframe. My curriculum evolved as my studying progressed.

PRIOR EXPERIENCE

Google Cloud

I’ve been working with GCP for about 2.5 years in a Professional Services setting, helping clients of all sizes and industries migrate, build, and run solutions across many aspects of the platform. My current position is Director of Project Management in the GCP Practice at EPAM. I also develop several personal projects on GCP, including this website.

Machine Learning

Over the course of my 15-year professional career, I’ve implemented many production IT solutions that leverage AI and ML. Most of this has been in a contact center setting, deploying voice and chat bots with Conversational AI technologies (Speech to Text, Text to Speech, and Natural Language). I also have experience with Dialogflow solutions. But I’m not an ML researcher, and I’m not a TensorFlow practitioner.

THE EXAM

Google publishes an overview, detailed outline, and sample questions on the official certification page. I went to an onsite-proctored facility, because I’ve found that the turnaround time for the formal review of the results by Google is faster, and there’s a testing center within 15 minutes of my home.

The exam itself is similar to other Google Cloud certifications. The only notable differences are that there weren’t customer case studies spanning multiple questions (each question presents a unique hypothetical situation), and that this exam had 60 questions, while the others I’ve taken had 50.

The exam tests ML solution expertise in the context of other cloud skills. Specifically: production grade cloud solution architecture, data engineering concepts, and MLOps best practices. Therefore, I’d strongly recommend a baseline of the Cloud Architect, Data Engineer, and Cloud DevOps Engineer certifications before thinking about this one. It’s certainly not a hard requirement, but would significantly reduce the cognitive load on exam day. Since I have these certifications, my study track includes minimal preparation in these areas.

KEY EXAM CONCEPTS

The exam covers a lot of ground, and the most direct hints to the scope of the questions are in the official exam guide outline. That said, the outline can be ambiguous. Based on my personal experience, I recommend viewing the exam as three themes to be mastered.

ML Problem Framing and the ML Journey

Put technology aside for now, and start with a 30,000-foot view of how to get from “I think I have a problem that ML might be able to solve” to a fully operationalized solution:

[Diagram: adaptation of Machine Learning Workflow]

The methodology to build and deploy an ML solution is different from that of a traditional IT project. The process of going from use case to trained model is iterative. There may be non-starters along the way that require a reboot. Examples: the use case might not be a fit for ML; the data required to train a model might not be captured, might be of poor quality, or might not exist at the volume necessary for ML; or the use case and data might check out, but the trained model performs poorly.

The major activities within the methodology are highlighted above:
– Problem Framing and Use Case Discovery: Defining what the business problem is, whether ML can (or should) solve it, what business value the ML solution would provide, and whether the necessary data exists (at a directional level).
– Data Analysis and Feasibility Study: Performing a rigorous exploration of the available raw data to validate that it’s sufficient for ML, identifying the correct model architecture for the problem, and verifying that the fundamental prerequisites exist to proceed with model experimentation.
– Feature Engineering and Model Development: Transforming the raw data into the feature inputs (in a repeatable way), then training and evaluating models until one with acceptable performance is found.
– Production Deployment and Operations: Planning for and executing the deployment of the ML model into a modern cloud production environment. This feels more like a traditional IT project, though there are still unique considerations for ML and cloud.
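
To make the "iterative, with non-starters" idea concrete, here is a minimal sketch of the journey as a series of gates. It is my own illustration, not an official Google workflow: the `Stage` class, the check functions, and every field and threshold in the `project` dictionary are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    name: str
    # Returns None when the stage passes, or a short reason for a non-starter.
    check: Callable[[dict], Optional[str]]


def run_journey(stages: List[Stage], project: dict) -> str:
    """Walk the gates in order and stop at the first non-starter."""
    for stage in stages:
        reason = stage.check(project)
        if reason is not None:
            return f"Reboot at '{stage.name}': {reason}"
    return "Proceed to Production Deployment and Operations"


# Hypothetical checks: the field names and thresholds are assumptions.
def frame_problem(p: dict) -> Optional[str]:
    return None if p["ml_is_a_fit"] else "use case is not a fit for ML"


def study_data(p: dict) -> Optional[str]:
    if p["labeled_rows"] < p["min_rows_needed"]:
        return "not enough data to train a model"
    return None


def develop_model(p: dict) -> Optional[str]:
    if p["eval_metric"] < p["target_metric"]:
        return "trained model performs poorly"
    return None


STAGES = [
    Stage("Problem Framing and Use Case Discovery", frame_problem),
    Stage("Data Analysis and Feasibility Study", study_data),
    Stage("Feature Engineering and Model Development", develop_model),
]

# A made-up project where the use case and data check out,
# but the model misses its target metric.
project = {
    "ml_is_a_fit": True,
    "labeled_rows": 50_000,
    "min_rows_needed": 10_000,
    "eval_metric": 0.71,
    "target_metric": 0.85,
}
print(run_journey(STAGES, project))
```

In practice a "reboot" rarely means starting from zero. It usually means looping back to an earlier stage with what was learned, which is why the first three activities feel more like research than like a traditional IT project.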

The Google Cloud AI and ML Product Portfolio

As of November 2020, the ecosystem of AI and ML products and services that are available in Google Cloud is below:

This diagram will be out of date in 3…2…1… AI Platform (Unified)

The application of these products to the ML journey is where the rubber meets the road, from a technical standpoint.

The most important thing to understand about the GCP portfolio is that there are tiers of products, each trading some flexibility for a higher level of abstraction, easier implementation, and feature guardrails:
– AI Platform enables full lifecycle support for custom ML models in a variety of frameworks (though the limitations need to be understood; TensorFlow is a first-class citizen compared to the rest).
– AutoML supports custom model development for a limited set of use cases in a codeless manner, leveraging transfer learning and neural architecture search under the covers.
– BigQuery ML supports custom model development for a variety of use cases involving structured data directly in BigQuery, through SQL. This isn’t as hands-off as AutoML, but is much less complex than AI Platform.
– Pre-Trained Models are consumed as-is through a RESTful API interface, with no customization and low implementation complexity.
– Solutions like Contact Center AI and Document AI solve industry-specific problems with the underlying ML products and services.
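
One way to internalize the tiers is as a "least effort first" decision. The sketch below is my own simplification, not an official Google decision tree: the function name, its parameters, and the ordering of the checks are all assumptions, and it leaves out the packaged Solutions tier.

```python
def suggest_tier(needs_custom_model: bool,
                 automl_supports_use_case: bool = False,
                 structured_data_in_bigquery: bool = False) -> str:
    """Suggest the most abstracted product tier that still fits the need.

    Hypothetical helper: checks run from least to most implementation effort.
    """
    if not needs_custom_model:
        # Consume a pre-trained model as-is through its REST API.
        return "Pre-Trained Models"
    if automl_supports_use_case:
        # Codeless custom model for a limited set of use cases.
        return "AutoML"
    if structured_data_in_bigquery:
        # Custom model trained with SQL, directly where the data lives.
        return "BigQuery ML"
    # Full control over the custom model lifecycle, at the highest complexity.
    return "AI Platform"


print(suggest_tier(needs_custom_model=False))
print(suggest_tier(needs_custom_model=True, structured_data_in_bigquery=True))
print(suggest_tier(needs_custom_model=True))
```

Real decisions weigh more than three booleans (cost, latency, explainability, team skills), but walking the tiers in this order is a useful habit when reading a scenario-style question.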

The ML solution to a business use case might not require custom model development. It might simply be an application that leverages pre-trained model API calls.
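
As a rough illustration of how little code a pre-trained model can require, here is a sketch that builds (but does not send) a sentiment analysis request for the Cloud Natural Language REST API. The API key is a placeholder, the helper name `build_sentiment_request` is my own, and actually sending the request assumes a project with the API enabled and valid credentials.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder, not a real credential
URL = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def build_sentiment_request(text: str, api_key: str = API_KEY) -> urllib.request.Request:
    """Build the POST request; nothing is sent over the network here."""
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    return urllib.request.Request(
        url=f"{URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_sentiment_request("The support agent was friendly and fast.")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually call the API (requires a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["documentSentiment"])
```

There is no training data, no feature engineering, and no model to host. The trade-off is that the model can’t be customized, which is exactly the distinction between this tier and the ones above it.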

Operationalizing an ML Solution on GCP (AKA MLOps)

Once a viable ML solution exists, it still needs to transition from a research (or POC, or MVM) stage to a productionized state. This can be a significant undertaking, particularly in the context of custom model development where there are many other moving parts:

[Diagram: adaptation of MLOps: Continuous delivery and automation pipelines in machine learning]

Based on Google’s extensive research and experience in this domain, it’s estimated that the code for the ML model itself is ~5% of the overall solution; the other ~95% that needs to be solved for are the aspects that enable composability, portability, and scalability. The answer to the ~95% is MLOps, and a simple analogy for MLOps is that it’s the application of DevOps principles in the context of an ML solution. Also, TFX and Kubeflow pipelines!

Make no mistake: these three themes are a cursory overview of the exam content, and drilling down into the weeds of each is the hard part. But knowing the lifecycle of a use case’s ML journey, having a command of the relevant GCP product landscape, and understanding the path to operationalize an ML solution – these are practical skills for the real world that will lend themselves to solving hypothetical problems in the exam.

STUDY TRACK

I apply the same approach in preparing for every Google Cloud certification:
– I start by assuming that I know nothing, regardless of prior experience. The exam is through Google’s perspective, not mine.
– I work towards an objective of acquiring expertise that I can apply in my day-to-day work, not just obtaining the certification.
– I acquire the expertise by educating myself in several buckets of resources (outlined below) that progress from theoretical knowledge to practical solutions. Each bucket serves a purpose, and collectively they form a well-rounded perspective.

Pretty boring stuff – there are no shortcuts here, just persistence! 🙂

Everyone learns differently, so what works well for me might not be the best approach for you. This is the final distillation of what I felt was the most relevant, and what I’d recommend to someone who’s starting out with their own exam prep.

Learn From Other People’s Mistakes

… or their successes. A great starting point for putting together a personalized study track is to read the insights that have been shared by others who have already taken the exam. To put it into ML terms, think of it as a form of ensemble learning. This article is my contribution to the ensemble!

My mini-ensemble consisted of these three articles:
1. Google Cloud Professional Machine Learning Engineer Certification Preparation Guide (Dmitri Lerko and Steven MacManus)
2. Tips to nail the Google Cloud Certified Machine Learning Engineer Professional exam (Konrad Clapa)
3. 20 Days to Google Cloud Professional Machine Learning Engineer Exam (Han Qi)

Links to others are aggregated in the Awesome GCP Certifications repository (Satish VJ).

Theoretical Crash Course

Unless you’ve worked for Google as an ML professional, you most likely need to round out your theoretical knowledge.

One pass through this material was sufficient for me:
– An Executive’s Guide to AI (McKinsey & Company)
– Machine Learning Crash Course (Google) – all 7 modules, from the Crash Course to GANs
– Text Classification (Google)
– Good Data Analysis (Google)
– AI Principles (Google)
– Responsible AI Practices (Google)
– CS231n: Convolutional Neural Networks for Visual Recognition Course Notes (Stanford)

The Rules of Machine Learning (Martin Zinkevich) and Machine Learning Glossary (Google) are two additional resources that I found myself revisiting for some ground truth on terminology and best practices.

Online Courses and Self-Paced Labs

It’s always a good idea to follow Google’s suggested learning path. This path is a combination of Coursera (videos, interactive labs, and quizzes) and Qwiklabs (purely hands-on labs). Personal feedback on this – the Big Data & Machine Learning Fundamentals course is very basic. I skipped it, and think anyone who would need it might be too inexperienced with GCP to sit for this exam.

Machine Learning for Business Professionals (Coursera) is an excellent course, one that I think is a more appropriate starting point than the Big Data & Machine Learning Fundamentals course. It rounds out what is mostly a technical curriculum with a business leadership perspective on ML. Hey Google, please make this change to your learning path!

MLOps (Machine Learning Operations) Fundamentals (Pluralsight) provides a bit more depth on the MLOps concepts. I’m surprised that there isn’t an equivalent course on Coursera, since this is authored by Google. Highly recommend it.

OSS Frameworks

There’s a robust OSS ecosystem in the data science and ML world that anchors on Python as a programming language. Knowing this language is table stakes.

The frameworks that are most relevant are:
– TensorFlow: Not only relevant, but required. All of it. The core features, TFX, TensorBoard, the Embedding Projector, and how to use and interpret the Playground.
– Keras: TensorFlow 2.x implements the Keras API, but this is still worth a read independently to understand the evolution of these two frameworks.
– Kubeflow: The core workflow, Kubeflow Pipelines, Katib, and Seldon Core.
– A 101-level awareness of other ML frameworks like PyTorch, XGBoost, and scikit-learn is necessary, even though the focus of this exam is on TensorFlow.
– A 101-level awareness of popular data analysis frameworks like NumPy, pandas, and seaborn is helpful.

Google Cloud Products

Possibly the most tedious part of exam prep, but the highest-value one on exam day and beyond: identifying the superset of Google Cloud products in the space, and reading the documentation. This is the only way to understand the product features and limitations at a level of granularity that matches the exam questions (and the real world). Previous hands-on experience with a given product might make this exercise unnecessary.

The Google Cloud AI and ML product ecosystem is sizable. Here’s an outline of what I reviewed:
– AI Platform core services (Notebooks and Deep Learning Images, Data Labeling, Training, Prediction and Continuous Evaluation, Vizier Optimizer, Pipelines) and infrastructure (GPUs and TPUs)
– AutoML supported use cases
– AI Building Blocks (the many pre-trained models that are available as REST APIs)
– BigQuery ML supported use cases
– Contact Center AI use cases with Dialogflow
– Everything else under the AI and Machine Learning products landing page
– Data & Analytics products that are leveraged to ingest and prepare the data in an ML solution (BigQuery, Dataproc, Dataflow, Pub/Sub, Data Fusion, Cloud Composer, Data Studio, Dataprep)
– Firebase Machine Learning and ML Kit are also worth a review, in the context of understanding the use cases and mechanics for ML on mobile and edge devices

Google Cloud Whitepapers & Solutions

There’s an infinite supply of Google-authored, solution-centric documentation publicly available on cloud.google.com. These articles and whitepapers are filled with best practices from the field across the implementation lifecycle of machine learning solutions. It can be difficult to find the useful ones though, since there’s so much content out there.

The unabridged list of what I reviewed:
– AI Adoption Framework (whitepaper)
– Is My Data Any Good? A Pre-ML Checklist (whitepaper)
– Exploratory Data Analysis for Feature Selection in Machine Learning (whitepaper)
– AI Explanations (whitepaper)
– Inclusive ML
– Data Lifecycle on Google Cloud Platform (solution)
– Analyzing and validating data at scale for machine learning with TensorFlow Data Validation (solution)
– Machine Learning with Structured Data: Part 1, Part 2, Part 3 (solution)
– Data Preprocessing for Machine Learning: Part 1, Part 2 (solution)
– MLOps: Continuous delivery and automation pipelines in machine learning (solution)
– Setting up an MLOps environment on Google Cloud (solution)
– Architecture for MLOps using TFX, Kubeflow Pipelines, and Cloud Build (solution)
– Considerations for Sensitive Data within Machine Learning Datasets (solution)
– Getting Started With Kubeflow Pipelines (blog)
– Building a Serverless Machine Learning Model (solution)
– Minimizing real-time prediction serving latency in machine learning (solution)
– Best practices for performance and cost optimization for machine learning (solution)

Google has authored research papers that will provide a greater intuition on some of the underlying concepts of their products. Consider these to be extra credit:
– Machine Learning: The High Interest Credit Card of Technical Debt
– A Rubric for ML Production Readiness and Technical Debt Reduction
– Neural Architecture Search with Reinforcement Learning
– Google Vizier: A Service for Black-Box Optimization

There are also countless high-quality, solution-centric articles on external sites like medium.com and towardsdatascience.com, which create an even larger search space. I gravitate to Google-authored content, because the exam will be through Google’s view of the world.

Hands-On Project Experience

And finally, there’s no substitute for hands-on experience, regardless of whether it’s obtained professionally (if you have exposure to these technologies and solutions) or through personal experimentation. Colab is a great platform to get started with for free.

FINAL THOUGHTS

The Professional Machine Learning Engineer exam is a rigorous test that requires a serious investment to pass with confidence. What ended up working for me was to mindmap new concepts back to my previous experience with ML and AI solutions. Having a real-world use case to anchor on strengthened the learning process. I still felt like I could have known a few things better on exam day, but overall took a massive leap forward in proficiency.

The future trend is to Stop Experimenting With Machine Learning And Start Actually Using It. This is apparent in the Google Cloud product landscape, as services like AutoML and AI building blocks reduce or even eliminate the effort to train a model, and AI Platform continues to simplify the experience where custom end-to-end ML development is still necessary. Google is doing their part to abstract away complexity.

This exam is very much aligned with the trend, placing an emphasis not just on the theory of ML, but on its ability to deliver outcomes in the context of business and technical constraints – all with an eye on a disciplined, repeatable approach and long-term resilience.

The application of ML and AI is still in its infancy, and I’d consider this certification essential for professionals who are interested in participating at the forefront of this industry in the Google Cloud ecosystem.

Thanks for reading and good luck!