Domain 4: Modernize Infrastructure and Applications with Google Cloud (~17%)
Domain 4 of the Google Cloud Digital Leader exam covers how organizations modernize infrastructure and applications using Google Cloud services. At approximately 17% of the exam, this domain accounts for roughly 9-10 questions. It spans six topic areas: migration strategies, compute options, serverless computing, containers, APIs, and hybrid/multi-cloud.
The exam tests your ability to select the right compute model for a given workload, explain the business rationale behind migration and modernization strategies, and understand when to use containers versus VMs versus serverless. This is not a deep-dive engineering domain -- it tests conceptual understanding and decision-making.
1. Cloud Modernization and Migration
The 6 Rs of Migration
Every migration question on the exam maps to one of these strategies. Memorize them and their trade-offs:
| Strategy | Also Known As | What Happens | When to Use | Effort Level |
|---|---|---|---|---|
| Retire | Decommission | Shut down the application entirely | Application is no longer needed or used | None |
| Retain | Keep on-premises | Do not migrate; keep running where it is | Compliance requirements, recent hardware investment, not worth migrating | None |
| Rehost | Lift and shift | Move to cloud VMs with minimal or no code changes | Legacy applications, tightly-coupled systems, need fastest path to cloud | Low |
| Replatform | Lift and optimize / Move and improve | Migrate with some optimization (e.g., swap to managed database) | Applications that benefit from cloud services without full rewrite | Medium |
| Refactor | Re-architect | Modify application architecture to leverage cloud-native features | Applications worth investing in for long-term cloud benefits | High |
| Repurchase | Drop and shop / Replace | Switch to a commercial SaaS product (e.g., replace custom CRM with Salesforce, move email to Google Workspace) | Applications where a SaaS alternative exists and is more cost-effective than maintaining custom code | Medium |
(Google Cloud Migration Guide)
Exam trap: The exam uses the nicknames and the formal strategy names interchangeably. "Lift and shift" is always Rehost. "Move and improve" is Replatform. If a question describes moving a VM image directly to Compute Engine with no code changes, that is Rehost -- even if they do not use the word.
Exam trap: Replatform is NOT Refactor. Replatform means making targeted improvements during migration (e.g., switching from a self-managed MySQL to Cloud SQL). Refactor means redesigning the application architecture itself (e.g., breaking a monolith into microservices).
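As a study aid, the "amount of change" heuristic behind these traps can be sketched as a lookup. The scenario keywords and function below are illustrative only -- they are not part of any Google Cloud API or tool:

```python
# Study aid: map the degree of change in a migration scenario to its "R".
# Keyword names are invented for illustration -- not Google Cloud terminology.
MIGRATION_STRATEGIES = {
    "no_longer_needed":      "Retire",
    "must_stay_on_premises": "Retain",
    "no_code_changes":       "Rehost",      # lift and shift
    "targeted_optimization": "Replatform",  # move and improve
    "architecture_redesign": "Refactor",
    "replace_with_saas":     "Repurchase",  # drop and shop
}

def pick_strategy(scenario: str) -> str:
    """Return the 6-Rs strategy matching a scenario keyword."""
    return MIGRATION_STRATEGIES[scenario]

print(pick_strategy("no_code_changes"))        # Rehost
print(pick_strategy("targeted_optimization"))  # Replatform
```

The key exam skill is exactly this mapping: identify how much the application changes, then name the strategy.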
Four Phases of Migration
Google Cloud defines a structured migration framework:
| Phase | Purpose | Key Activities |
|---|---|---|
| Assess | Understand what you have | Inventory applications, identify dependencies, calculate total cost of ownership (TCO) |
| Plan | Design the target environment | Select migration strategies per workload, design cloud infrastructure, establish networking |
| Deploy | Execute the migration | Migrate workloads, validate functionality, refine processes |
| Optimize | Maximize cloud value | Tune performance, reduce costs, adopt cloud-native features |
(Google Cloud Migration Guide)
Google Cloud Migration Tools
| Tool | Purpose |
|---|---|
| Migration Center | Unified platform for end-to-end migration planning and assessment |
| Migrate to Virtual Machines | Migrate physical servers and VMs to Compute Engine |
| Database Migration Service | Migrate databases to Cloud SQL, AlloyDB, or other managed services |
| Storage Transfer Service | Move data from other cloud providers or on-premises storage |
| Transfer Appliance | Physical hardware appliance for transferring hundreds of terabytes to 1 petabyte of data |
| BigQuery Migration Service | Migrate data warehouse workloads to BigQuery |
(Google Cloud Migration Guide)
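A quick back-of-the-envelope calculation shows why a physical appliance makes sense at petabyte scale. This is plain arithmetic (data volume divided by line rate), ignoring protocol overhead and link contention:

```python
# Rough network-transfer time: days = bytes / (bits-per-second / 8) / 86400.
# Ignores protocol overhead, so real transfers are somewhat slower.
def transfer_days(data_tb: float, bandwidth_gbps: float) -> float:
    data_bits = data_tb * 1e12 * 8            # terabytes -> bits
    seconds = data_bits / (bandwidth_gbps * 1e9)
    return seconds / 86400                    # seconds -> days

# Moving 1 PB (1000 TB) over a dedicated 1 Gbps link:
print(round(transfer_days(1000, 1.0)))  # ~93 days -- Transfer Appliance territory
```

At roughly three months for a petabyte over 1 Gbps, shipping a physical appliance is usually faster; a 10 TB dataset over 10 Gbps, by contrast, moves in a couple of hours over the network.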
2. Computing in the Cloud
Compute Engine (IaaS)
Compute Engine provides virtual machines running on Google's infrastructure. It is the Infrastructure-as-a-Service (IaaS) offering -- you get full control over the OS, networking, and installed software. (Compute Engine Docs)
Machine type families (know what each is optimized for):
| Family | Optimized For | Example Use Cases |
|---|---|---|
| General-purpose (E2, N2, N2D, N1, C3) | Balanced CPU and memory | Web servers, application servers, small databases, development |
| Compute-optimized (C2, C3, H3) | High per-core CPU performance | Batch processing, gaming, high-performance computing (HPC) |
| Memory-optimized (M2, M3) | High memory-to-CPU ratio | In-memory databases (SAP HANA, Redis), real-time analytics |
| Accelerator-optimized (A2, A3, G2) | GPU/TPU workloads | Machine learning training/inference, video transcoding, scientific simulation |
Cost optimization options:
| Option | Discount | Key Constraint |
|---|---|---|
| Sustained use discounts | Up to 30% automatically | No commitment -- applied automatically when a VM runs >25% of a month |
| Committed use discounts (CUDs) | Up to 55% (general); up to 70% (memory-optimized) | Requires 1-year or 3-year commitment for specific vCPU and memory amounts |
| Spot VMs (formerly Preemptible VMs) | 60-91% | Google can reclaim them at any time with 30 seconds' notice; no SLA |
Exam trap: Spot VMs are NOT suitable for workloads that cannot tolerate interruption. They are ideal for batch processing, CI/CD, fault-tolerant jobs, and data analysis. If a question describes a critical production database, Spot VMs are the wrong answer.
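The business case behind the Spot VM trade-off is simple arithmetic. The hourly rate and the 80% discount below are hypothetical, chosen only to illustrate the scale of the savings (actual Spot discounts vary within the 60-91% range):

```python
# Illustrative cost comparison only -- the $0.10/hour rate and the 80%
# Spot discount are assumed numbers, not Google Cloud pricing.
def monthly_cost(hourly_rate: float, hours: float = 730) -> float:
    """Cost for one VM running all month (~730 hours)."""
    return hourly_rate * hours

on_demand_rate = 0.10                    # hypothetical $/hour
spot_rate = on_demand_rate * (1 - 0.80)  # assumed 80% Spot discount

print(f"On-demand: ${monthly_cost(on_demand_rate):.2f}")  # $73.00
print(f"Spot:      ${monthly_cost(spot_rate):.2f}")       # $14.60
```

For fault-tolerant batch work, that gap is why Spot VMs are the default answer; for anything that cannot tolerate a 30-second reclamation notice, no discount justifies them.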
Sole-tenant nodes: Dedicated physical servers where only your VMs run. Used for compliance requirements, licensing constraints (bring-your-own-license), or workloads that require physical isolation from other tenants. More expensive than standard VMs.
Live migration: Google transparently moves running VMs to different physical hosts during maintenance events -- no reboot, no downtime. This is a key differentiator versus other cloud providers. Spot VMs do not support live migration; they are terminated instead.
Autoscaling and Load Balancing
Managed instance groups (MIGs) contain identical VM instances and support:
- Autoscaling: Automatically adds or removes VM instances based on CPU utilization, load balancing capacity, custom metrics, or schedules
- Autohealing: Replaces unhealthy instances based on health check results
- Rolling updates: Deploy new versions across the group with configurable surge and disruption limits
Cloud Load Balancing distributes traffic across instances, regions, or backends. Key types:
| Type | Layer | Scope | Use Case |
|---|---|---|---|
| HTTP(S) Load Balancing | Layer 7 | Global | Web applications, content-based routing |
| TCP/SSL Proxy | Layer 4 | Global | Non-HTTP TCP traffic requiring global distribution |
| Network Load Balancing | Layer 4 | Regional | High-performance, low-latency regional traffic |
| Internal Load Balancing | Layer 4/7 | Regional | Traffic between internal services (not internet-facing) |
Exam trap: HTTP(S) Load Balancing is global -- a single anycast IP routes users to the nearest healthy backend. Network Load Balancing is regional. If a question requires distributing web traffic across multiple regions, the answer is HTTP(S) Load Balancing.
3. Serverless Computing
Serverless means Google manages all infrastructure. You deploy code or containers; Google handles provisioning, scaling, patching, and availability. The exam tests three serverless products and when to choose each.
Cloud Run
Cloud Run is a fully managed platform for deploying containerized applications, functions, or source code. It automatically scales (including to zero), charges only for resources consumed during request processing, and requires no cluster management. (Cloud Run Docs)
Key characteristics:
- Accepts any language or binary packaged as a container image
- Supports source-based deployment for Go, Node.js, Python, Java, .NET, Ruby (auto-builds container)
- Provides HTTPS endpoints with automatic TLS certificates
- Supports WebSockets, HTTP/2, and gRPC end-to-end
- Scales to zero when idle (no cost); scales up automatically under load
- Two billing models: request-based (pay per request) and instance-based (pay per instance lifetime)
Three resource types:
| Type | Purpose | Scaling |
|---|---|---|
| Services | Handle HTTP requests at unique HTTPS endpoints | Auto-scales (including to zero) |
| Jobs | Run tasks to completion (batch processing) | Parallelizable across instances |
| Worker Pools | Pull-based workloads (Kafka, Pub/Sub consumers) | Manual scaling; no public endpoint |
App Engine (PaaS)
App Engine is Google's original Platform-as-a-Service for web and mobile backends. It comes in two environments:
| Feature | Standard Environment | Flexible Environment |
|---|---|---|
| Startup time | Seconds | Minutes |
| Scale to zero | Yes | No (minimum 1 instance) |
| Custom runtimes | No | Yes (via Dockerfile) |
| WebSocket support | No | Yes |
| Background processes | No | Yes |
| SSH debugging | No | Yes |
| Pricing basis | Instance hours | vCPU, memory, disk |
| Best for | Spiky traffic, low-cost apps | Steady traffic, custom dependencies |
Important: Google officially recommends Cloud Run over App Engine for new projects. App Engine questions on the exam typically test knowledge of its two environments and their trade-offs, not as the preferred choice for new workloads.
Cloud Functions (FaaS)
Cloud Functions (now called Cloud Run functions) is Google's Function-as-a-Service offering for small, event-driven code. You write a single function; Google executes it in response to events.
Key characteristics:
- Single-purpose functions triggered by events (HTTP requests, Pub/Sub messages, Cloud Storage changes, Firestore updates)
- Automatic scaling per invocation
- Pay only for execution time (billed per 100ms)
- Supported runtimes: Node.js, Python, Go, Java, .NET, Ruby, PHP
- 2nd gen (current) is built on Cloud Run infrastructure
Exam trap: Cloud Functions 2nd gen is built on Cloud Run under the hood. Google is converging these products. For the exam, Cloud Functions is the answer when the question describes a simple, single-purpose, event-triggered function. Cloud Run is the answer for containerized applications or services with multiple endpoints.
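To make "single-purpose, event-triggered" concrete, here is a minimal sketch of the kind of logic a Pub/Sub-triggered function contains. In a real deployment this would be registered with the Functions Framework; the plain function below uses only the standard library so it can run anywhere:

```python
import base64

def process_pubsub_event(event: dict) -> str:
    """Decode the base64-encoded payload of a Pub/Sub message.

    Sketch of a single-purpose, event-driven function. In a deployed
    Cloud Run function this would be registered via the Functions
    Framework; here it is a plain function for local illustration.
    """
    message = base64.b64decode(event.get("data", "")).decode("utf-8")
    return f"Processed message: {message}"

# Simulate the event envelope a Pub/Sub trigger would deliver:
fake_event = {"data": base64.b64encode(b"order-1234").decode("utf-8")}
print(process_pubsub_event(fake_event))  # Processed message: order-1234
```

Note the shape: one trigger, one job, no endpoints or routing. The moment a workload needs multiple routes or a long-lived server, the answer shifts to Cloud Run.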
Choosing the Right Serverless Product
| Question | Answer |
|---|---|
| Need to run a container with multiple endpoints? | Cloud Run |
| Need a simple function triggered by an event? | Cloud Functions |
| Need a managed web app platform with no containers? | App Engine Standard |
| Need custom runtime or background processing? | App Engine Flexible or Cloud Run |
| Need to scale to zero? | Cloud Run or App Engine Standard |
| Need WebSocket support? | Cloud Run or App Engine Flexible |
4. Containers in the Cloud
Containers vs. Virtual Machines
This comparison is heavily tested. Know it cold:
| Aspect | Virtual Machines | Containers |
|---|---|---|
| Abstraction level | Full hardware virtualization with guest OS | OS-level virtualization sharing host kernel |
| Size | Gigabytes (includes full OS) | Megabytes (application + dependencies only) |
| Startup time | Minutes | Seconds |
| Resource overhead | High (each VM runs its own OS) | Low (shared kernel, no guest OS) |
| Isolation | Strong (separate OS per VM) | Process-level (shared kernel) |
| Portability | Limited (tied to hypervisor) | High (runs anywhere with container runtime) |
| Density | Fewer per host | Many more per host |
| Use case | Legacy apps, full OS control, strong isolation | Microservices, modern apps, rapid deployment |
Exam trap: Containers are NOT always better than VMs. VMs provide stronger isolation (critical for multi-tenant security), support any OS (Windows, Linux), and are necessary for legacy applications that cannot be containerized. The exam tests whether you know when VMs are the right choice.
Microservices Architecture
Microservices decompose a monolithic application into small, independently deployable services, each responsible for a specific business function.
Benefits:
- Independent scaling: Scale only the services that need it, not the entire application
- Independent deployment: Update one service without redeploying everything
- Technology flexibility: Each service can use a different language, framework, or database
- Fault isolation: A failure in one service does not crash the entire application
- Team autonomy: Small teams own individual services end-to-end
Challenges:
- Network complexity and latency between services
- Distributed system debugging is harder
- Data consistency across services requires careful design
- Operational overhead (monitoring, logging, tracing across many services)
Relationship to containers: Containers are the natural deployment unit for microservices. Each microservice is packaged as a container image, deployed independently, and scaled individually. Kubernetes orchestrates the lifecycle of these containers.
Google Kubernetes Engine (GKE)
GKE is Google's managed Kubernetes service. Google manages the control plane (API server, scheduler, etcd); you manage the workloads. (GKE Docs)
Two operating modes:
| Feature | Autopilot (Recommended) | Standard |
|---|---|---|
| Node management | Google manages nodes | You manage node pools |
| Pricing | Pay per pod resource request | Pay per node (VM) regardless of utilization |
| Security hardening | Built-in, automatic | Manual configuration required |
| Configuration | Opinionated defaults | Full customization |
| Best for | Most workloads; production-ready with minimal ops | Workloads requiring specific node configurations |
Key concepts for the exam:
- Node pools: Groups of nodes (VMs) with identical configuration within a cluster
- Cluster autoscaler: Automatically adjusts the number of nodes based on pod scheduling demands
- Horizontal Pod Autoscaler (HPA): Scales the number of pod replicas based on CPU, memory, or custom metrics
- Spot Pods: Run workloads on Spot VMs within GKE for significant cost savings on fault-tolerant jobs
Exam trap: GKE Autopilot is NOT serverless in the traditional sense. You still work with Kubernetes concepts (pods, deployments, services). The "managed" part means Google handles node provisioning, scaling, and security. Cloud Run is the serverless container option where you do not interact with Kubernetes at all.
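The Horizontal Pod Autoscaler mentioned above uses a documented scaling rule from the Kubernetes project: desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue). A direct sketch of that formula:

```python
import math

# The Kubernetes HPA scaling rule (per the Kubernetes documentation):
#   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
# Real HPAs also apply tolerances and stabilization windows, omitted here.
def hpa_desired_replicas(current: int, current_metric: float,
                         target_metric: float) -> int:
    return math.ceil(current * current_metric / target_metric)

# 3 pods averaging 200m CPU each against a 100m target -> 6 replicas
print(hpa_desired_replicas(3, 200, 100))  # 6
```

The same target-tracking intuition applies to MIG autoscaling on Compute Engine: scale the group so that average utilization lands near the target.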
5. The Value of APIs
What APIs Are
An Application Programming Interface (API) is a standardized contract that defines how software components communicate. APIs expose specific capabilities of a service while hiding internal implementation details.
Business value of APIs:
| Value | Description |
|---|---|
| New revenue streams | Monetize APIs by charging developers or partners for access |
| Ecosystem creation | Enable third-party developers to build on your platform |
| Partner integration | Standardized integration reduces custom development costs |
| Innovation acceleration | Internal teams and external partners build new products faster |
| Data monetization | Expose data services securely to paying customers |
Apigee API Management
Apigee is Google Cloud's API management platform. It provides full lifecycle management for APIs -- design, secure, deploy, monitor, and monetize.
Core capabilities:
| Capability | What It Does |
|---|---|
| API Gateway | Proxies API requests, enforces policies (rate limiting, quotas, authentication) |
| Developer Portal | Self-service portal where developers discover, register for, and test APIs |
| Analytics | Traffic analysis, error tracking, latency monitoring, developer engagement metrics |
| Monetization | Billing and revenue sharing for API usage (pay-per-call, tiered pricing, freemium models) |
| Security | OAuth, API keys, JWT validation, threat protection (SQL injection, XSS) |
| Version management | Manage multiple API versions and deprecation lifecycles |
Exam trap: Apigee is the answer when the question mentions API monetization, developer portals, or API lifecycle management. Do not confuse it with Cloud Endpoints (simpler API gateway) or API Gateway (lightweight, serverless-focused).
6. Hybrid and Multi-Cloud
Hybrid Cloud vs. Multi-Cloud
| Strategy | Definition | Business Drivers |
|---|---|---|
| Hybrid cloud | Combination of on-premises (or private cloud) and public cloud | Phased migration, data residency/compliance requirements, existing on-premises investments, latency-sensitive edge workloads |
| Multi-cloud | Using services from two or more public cloud providers | Avoid vendor lock-in, leverage best-of-breed services, redundancy across providers, regulatory requirements |
Google Distributed Cloud (formerly Anthos)
Google Distributed Cloud is Google's platform for managing workloads consistently across on-premises data centers, edge locations, and multiple public clouds. It extends Google Cloud services and the GKE management model beyond Google's own infrastructure.
Key capabilities:
| Capability | Description |
|---|---|
| Consistent management | Same tools, policies, and APIs across all environments |
| GKE everywhere | Run GKE clusters on-premises, on AWS, on Azure, or at the edge |
| Config Management | Policy-as-code and GitOps-based configuration management across all clusters |
| Service Mesh | Traffic management, observability, and security for microservices across environments |
| Serverless on-premises | Run Cloud Run workloads on your own infrastructure |
When the exam says "Anthos": The exam may still reference "Anthos" by name. Anthos was rebranded to Google Distributed Cloud, but the functionality is the same. If a question asks about managing Kubernetes clusters across on-premises and multiple cloud providers from a single control plane, the answer is Anthos / Google Distributed Cloud.
Edge Computing
Edge computing processes data closer to where it is generated rather than sending everything to a centralized cloud data center. Google Distributed Cloud supports edge deployments for scenarios requiring:
- Ultra-low latency (manufacturing, retail, telecommunications)
- Data locality (data must stay in a specific physical location)
- Intermittent connectivity (remote or disconnected sites)
Quick-Reference: Compute Decision Tree
Use this to answer "which service should you use" questions:
```
Does the workload require full OS-level control?
  YES --> Compute Engine (VMs)
  NO  --> Is it a container-based workload?
    YES --> Do you need Kubernetes orchestration?
      YES --> GKE (Autopilot for most; Standard for custom needs)
      NO  --> Cloud Run (serverless containers)
    NO  --> Is it a simple event-driven function?
      YES --> Cloud Functions
      NO  --> Is it a web/mobile app?
        YES --> App Engine (Standard for spiky traffic; Flexible for custom runtimes)
        NO  --> Evaluate Compute Engine or Cloud Run based on requirements
```
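The same decision tree can be expressed as a small helper function. The function and flag names are invented for this study guide -- there is no such Google tool:

```python
# The compute decision tree as code. Illustrative names only;
# this is a study aid, not a Google Cloud API.
def choose_compute(needs_os_control: bool, is_container: bool,
                   needs_kubernetes: bool, is_event_function: bool,
                   is_web_app: bool) -> str:
    if needs_os_control:
        return "Compute Engine"
    if is_container:
        return "GKE" if needs_kubernetes else "Cloud Run"
    if is_event_function:
        return "Cloud Functions"
    if is_web_app:
        return "App Engine"
    return "Evaluate Compute Engine or Cloud Run"

# A containerized service with no need for Kubernetes orchestration:
print(choose_compute(False, True, False, False, False))  # Cloud Run
```

Walking exam scenarios through these branches in order -- OS control first, then containers, then event functions, then web apps -- matches how the questions are typically framed.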
Exam Tips for Domain 4
- Migration strategy questions almost always describe a scenario and ask which R applies. Focus on the amount of change: no change = Rehost, some optimization = Replatform, architecture redesign = Refactor.
- Compute choice questions test trade-offs between control and management burden. More control = more management = Compute Engine. Less management = less control = Cloud Run or Cloud Functions.
- Container vs. VM questions test whether you understand that containers share a kernel (lighter, faster, less isolated) while VMs each have their own OS (heavier, slower, stronger isolation).
- GKE Autopilot vs. Standard -- Autopilot is the recommended default. Standard is for edge cases requiring custom node configuration.
- Serverless questions test whether you pick Cloud Run (containers, multiple endpoints), Cloud Functions (single event-driven function), or App Engine (managed web platform).
- Anthos/Distributed Cloud is always the answer for hybrid or multi-cloud Kubernetes management.
- Apigee is always the answer for API monetization or full API lifecycle management.
- Spot VMs are always wrong for workloads that cannot tolerate interruption.