Domain 4: Modernize Infrastructure and Applications with Google Cloud (~17%)
Domain 4 of the Google Cloud Digital Leader exam covers how organizations modernize infrastructure and applications using Google Cloud services. At approximately 17% of the exam, this domain accounts for roughly 9-10 questions. It spans six topic areas: migration strategies, compute options, serverless computing, containers, APIs, and hybrid/multi-cloud.
The exam tests your ability to select the right compute model for a given workload, explain the business rationale behind migration and modernization strategies, and understand when to use containers versus VMs versus serverless. This is not a deep-dive engineering domain -- it tests conceptual understanding and decision-making.
1. Cloud Modernization and Migration
The 6 Rs of Migration
Every migration question on the exam maps to one of these strategies. Memorize them and their trade-offs:
| Strategy | Also Known As | What Happens | When to Use | Effort Level |
|---|---|---|---|---|
| Retire | Decommission | Shut down the application entirely | Application is no longer needed or used | None |
| Retain | Keep on-premises | Do not migrate; keep running where it is | Compliance requirements, recent hardware investment, not worth migrating | None |
| Rehost | Lift and shift | Move to cloud VMs with minimal or no code changes | Legacy applications, tightly-coupled systems, need fastest path to cloud | Low |
| Replatform | Lift and optimize / Move and improve | Migrate with some optimization (e.g., swap to managed database) | Applications that benefit from cloud services without full rewrite | Medium |
| Refactor | Re-architect | Modify application architecture to leverage cloud-native features | Applications worth investing in for long-term cloud benefits | High |
| Repurchase | Drop and shop / Replace | Switch to a commercial SaaS product (e.g., replace custom CRM with Salesforce, move email to Google Workspace) | Applications where a SaaS alternative exists and is more cost-effective than maintaining custom code | Medium |
(Google Cloud Migration Guide)
Exam trap: The exam uses the nicknames and the formal strategy names interchangeably. "Lift and shift" is always Rehost. "Move and improve" is Replatform. If a question describes moving a VM image directly to Compute Engine with no code changes, that is Rehost -- even if they do not use the word.
Exam trap: Replatform is NOT Refactor. Replatform means making targeted improvements during migration (e.g., switching from a self-managed MySQL to Cloud SQL). Refactor means redesigning the application architecture itself (e.g., breaking a monolith into microservices).
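As a study aid, the "amount of change" heuristic behind these traps can be sketched as a lookup. The scenario keywords and function below are illustrative only -- they are not part of any Google Cloud API or tool:

```python
# Study aid: map the degree of change in a migration scenario to its "R".
# Keyword names are invented for illustration -- not Google Cloud terminology.
MIGRATION_STRATEGIES = {
    "no_longer_needed":      "Retire",
    "must_stay_on_premises": "Retain",
    "no_code_changes":       "Rehost",      # lift and shift
    "targeted_optimization": "Replatform",  # move and improve
    "architecture_redesign": "Refactor",
    "replace_with_saas":     "Repurchase",  # drop and shop
}

def pick_strategy(scenario: str) -> str:
    """Return the 6-Rs strategy matching a scenario keyword."""
    return MIGRATION_STRATEGIES[scenario]

print(pick_strategy("no_code_changes"))        # Rehost
print(pick_strategy("targeted_optimization"))  # Replatform
```

The key exam skill is exactly this mapping: identify how much the application changes, then name the strategy.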
Four Phases of Migration
Google Cloud defines a structured migration framework:
| Phase | Purpose | Key Activities |
|---|---|---|
| Assess | Understand what you have | Inventory applications, identify dependencies, calculate total cost of ownership (TCO) |
| Plan | Design the target environment | Select migration strategies per workload, design cloud infrastructure, establish networking |
| Deploy | Execute the migration | Migrate workloads, validate functionality, refine processes |
| Optimize | Maximize cloud value | Tune performance, reduce costs, adopt cloud-native features |
(Google Cloud Migration Guide)
Google Cloud Migration Tools
| Tool | Purpose |
|---|---|
| Migration Center | Unified platform for end-to-end migration planning and assessment |
| Migrate to Virtual Machines | Migrate physical servers and VMs to Compute Engine |
| Database Migration Service | Migrate databases to Cloud SQL, AlloyDB, or other managed services |
| Storage Transfer Service | Move data from other cloud providers or on-premises storage |
| Transfer Appliance | Physical hardware appliance for transferring hundreds of terabytes to 1 petabyte of data |
| BigQuery Migration Service | Migrate data warehouse workloads to BigQuery |
(Google Cloud Migration Guide)
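A quick back-of-the-envelope calculation shows why a physical appliance makes sense at petabyte scale. This is plain arithmetic (data volume divided by line rate), ignoring protocol overhead and link contention:

```python
# Rough network-transfer time: days = bytes / (bits-per-second / 8) / 86400.
# Ignores protocol overhead, so real transfers are somewhat slower.
def transfer_days(data_tb: float, bandwidth_gbps: float) -> float:
    data_bits = data_tb * 1e12 * 8            # terabytes -> bits
    seconds = data_bits / (bandwidth_gbps * 1e9)
    return seconds / 86400                    # seconds -> days

# Moving 1 PB (1000 TB) over a dedicated 1 Gbps link:
print(round(transfer_days(1000, 1.0)))  # ~93 days -- Transfer Appliance territory
```

At roughly three months for a petabyte over 1 Gbps, shipping a physical appliance is usually faster; a 10 TB dataset over 10 Gbps, by contrast, moves in a couple of hours over the network.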
2. Computing in the Cloud
Compute Engine (IaaS)
Compute Engine provides virtual machines running on Google's infrastructure. It is the Infrastructure-as-a-Service (IaaS) offering -- you get full control over the OS, networking, and installed software. (Compute Engine Docs)
Machine type families (know what each is optimized for):
| Family | Optimized For | Example Use Cases |
|---|---|---|
| General-purpose (E2, N2, N2D, N1, C3) | Balanced CPU and memory | Web servers, application servers, small databases, development |
| Compute-optimized (C2, C3, H3) | High per-core CPU performance | Batch processing, gaming, high-performance computing (HPC) |
| Memory-optimized (M2, M3) | High memory-to-CPU ratio | In-memory databases (SAP HANA, Redis), real-time analytics |
| Accelerator-optimized (A2, A3, G2) | GPU/TPU workloads | Machine learning training/inference, video transcoding, scientific simulation |
Cost optimization options:
| Option | Discount | Key Constraint |
|---|---|---|
| Sustained use discounts | Up to 30% automatically | No commitment -- applied automatically when a VM runs >25% of a month |
| Committed use discounts (CUDs) | Up to 55% (general); up to 70% (memory-optimized) | Requires 1-year or 3-year commitment for specific vCPU and memory amounts |
| Spot VMs (formerly Preemptible VMs) | 60-91% | Google can reclaim them at any time with 30 seconds' notice; no SLA |
Exam trap: Spot VMs are NOT suitable for workloads that cannot tolerate interruption. They are ideal for batch processing, CI/CD, fault-tolerant jobs, and data analysis. If a question describes a critical production database, Spot VMs are the wrong answer.
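The business case behind the Spot VM trade-off is simple arithmetic. The hourly rate and the 80% discount below are hypothetical, chosen only to illustrate the scale of the savings (actual Spot discounts vary within the 60-91% range):

```python
# Illustrative cost comparison only -- the $0.10/hour rate and the 80%
# Spot discount are assumed numbers, not Google Cloud pricing.
def monthly_cost(hourly_rate: float, hours: float = 730) -> float:
    """Cost for one VM running all month (~730 hours)."""
    return hourly_rate * hours

on_demand_rate = 0.10                    # hypothetical $/hour
spot_rate = on_demand_rate * (1 - 0.80)  # assumed 80% Spot discount

print(f"On-demand: ${monthly_cost(on_demand_rate):.2f}")  # $73.00
print(f"Spot:      ${monthly_cost(spot_rate):.2f}")       # $14.60
```

For fault-tolerant batch work, that gap is why Spot VMs are the default answer; for anything that cannot tolerate a 30-second reclamation notice, no discount justifies them.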
Sole-tenant nodes: Dedicated physical servers where only your VMs run. Used for compliance requirements, licensing constraints (bring-your-own-license), or workloads that require physical isolation from other tenants. More expensive than standard VMs.
Live migration: Google transparently moves running VMs to different physical hosts during maintenance events -- no reboot, no downtime. This is a key differentiator versus other cloud providers. Spot VMs do not support live migration; they are terminated instead.
Autoscaling and Load Balancing
Managed instance groups (MIGs) contain identical VM instances and support:
- Autoscaling: Automatically adds or removes VM instances based on CPU utilization, load balancing capacity, custom metrics, or schedules
- Autohealing: Replaces unhealthy instances based on health check results
- Rolling updates: Deploy new versions across the group with configurable surge and disruption limits
Cloud Load Balancing distributes traffic across instances, regions, or backends. Key types:
| Type | Layer | Scope | Use Case |
|---|---|---|---|
| HTTP(S) Load Balancing | Layer 7 | Global | Web applications, content-based routing |
| TCP/SSL Proxy | Layer 4 | Global | Non-HTTP TCP traffic requiring global distribution |
| Network Load Balancing | Layer 4 | Regional | High-performance, low-latency regional traffic |
| Internal Load Balancing | Layer 4/7 | Regional | Traffic between internal services (not internet-facing) |
Exam trap: HTTP(S) Load Balancing is global -- a single anycast IP routes users to the nearest healthy backend. Network Load Balancing is regional. If a question requires distributing web traffic across multiple regions, the answer is HTTP(S) Load Balancing.
3. Serverless Computing
Serverless means Google manages all infrastructure. You deploy code or containers; Google handles provisioning, scaling, patching, and availability. The exam tests three serverless products and when to choose each.
Cloud Run
Cloud Run is a fully managed platform for deploying containerized applications, functions, or source code. It automatically scales (including to zero), charges only for resources consumed during request processing, and requires no cluster management. (Cloud Run Docs)
Key characteristics:
- Accepts any language or binary packaged as a container image
- Supports source-based deployment for Go, Node.js, Python, Java, .NET, Ruby (auto-builds container)
- Provides HTTPS endpoints with automatic TLS certificates
- Supports WebSockets, HTTP/2, and gRPC end-to-end
- Scales to zero when idle (no cost); scales up automatically under load
- Two billing models: request-based (pay per request) and instance-based (pay per instance lifetime)
Three resource types:
| Type | Purpose | Scaling |
|---|---|---|
| Services | Handle HTTP requests at unique HTTPS endpoints | Auto-scales (including to zero) |
| Jobs | Run tasks to completion (batch processing) | Parallelizable across instances |
| Worker Pools | Pull-based workloads (Kafka, Pub/Sub consumers) | Manual scaling; no public endpoint |
App Engine (PaaS)
App Engine is Google's original Platform-as-a-Service for web and mobile backends. It comes in two environments:
| Feature | Standard Environment | Flexible Environment |
|---|---|---|
| Startup time | Seconds | Minutes |
| Scale to zero | Yes | No (minimum 1 instance) |
| Custom runtimes | No | Yes (via Dockerfile) |
| WebSocket support | No | Yes |
| Background processes | No | Yes |
| SSH debugging | No | Yes |
| Pricing basis | Instance hours | vCPU, memory, disk |
| Best for | Spiky traffic, low-cost apps | Steady traffic, custom dependencies |
Important: Google officially recommends Cloud Run over App Engine for new projects. App Engine questions on the exam typically test knowledge of its two environments and their trade-offs, not as the preferred choice for new workloads.
Cloud Functions (FaaS)
Cloud Functions (now called Cloud Run functions) is Google's Function-as-a-Service offering for small, event-driven code. You write a single function; Google executes it in response to events.
Key characteristics:
- Single-purpose functions triggered by events (HTTP requests, Pub/Sub messages, Cloud Storage changes, Firestore updates)
- Automatic scaling per invocation
- Pay only for execution time (billed per 100ms)
- Supported runtimes: Node.js, Python, Go, Java, .NET, Ruby, PHP
- 2nd gen (current) is built on Cloud Run infrastructure
Exam trap: Cloud Functions 2nd gen is built on Cloud Run under the hood. Google is converging these products. For the exam, Cloud Functions is the answer when the question describes a simple, single-purpose, event-triggered function. Cloud Run is the answer for containerized applications or services with multiple endpoints.
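To make "single-purpose, event-triggered" concrete, here is a minimal sketch of the kind of logic a Pub/Sub-triggered function contains. In a real deployment this would be registered with the Functions Framework; the plain function below uses only the standard library so it can run anywhere:

```python
import base64

def process_pubsub_event(event: dict) -> str:
    """Decode the base64-encoded payload of a Pub/Sub message.

    Sketch of a single-purpose, event-driven function. In a deployed
    Cloud Run function this would be registered via the Functions
    Framework; here it is a plain function for local illustration.
    """
    message = base64.b64decode(event.get("data", "")).decode("utf-8")
    return f"Processed message: {message}"

# Simulate the event envelope a Pub/Sub trigger would deliver:
fake_event = {"data": base64.b64encode(b"order-1234").decode("utf-8")}
print(process_pubsub_event(fake_event))  # Processed message: order-1234
```

Note the shape: one trigger, one job, no endpoints or routing. The moment a workload needs multiple routes or a long-lived server, the answer shifts to Cloud Run.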
Choosing the Right Serverless Product
| Question | Answer |
|---|---|
| Need to run a container with multiple endpoints? | Cloud Run |
| Need a simple function triggered by an event? | Cloud Functions |
| Need a managed web app platform with no containers? | App Engine Standard |
| Need custom runtime or background processing? | App Engine Flexible or Cloud Run |
| Need to scale to zero? | Cloud Run or App Engine Standard |
| Need WebSocket support? | Cloud Run or App Engine Flexible |
4. Containers in the Cloud
Containers vs. Virtual Machines
This comparison is heavily tested. Know it cold:
| Aspect | Virtual Machines | Containers |
|---|---|---|
| Abstraction level | Full hardware virtualization with guest OS | OS-level virtualization sharing host kernel |
| Size | Gigabytes (includes full OS) | Megabytes (application + dependencies only) |
| Startup time | Minutes | Seconds |
| Resource overhead | High (each VM runs its own OS) | Low (shared kernel, no guest OS) |
| Isolation | Strong (separate OS per VM) | Process-level (shared kernel) |
| Portability | Limited (tied to hypervisor) | High (runs anywhere with container runtime) |
| Density | Fewer per host | Many more per host |
| Use case | Legacy apps, full OS control, strong isolation | Microservices, modern apps, rapid deployment |
Exam trap: Containers are NOT always better than VMs. VMs provide stronger isolation (critical for multi-tenant security), support any OS (Windows, Linux), and are necessary for legacy applications that cannot be containerized. The exam tests whether you know when VMs are the right choice.
Microservices Architecture
Microservices decompose a monolithic application into small, independently deployable services, each responsible for a specific business function.
Benefits:
- Independent scaling: Scale only the services that need it, not the entire application
- Independent deployment: Update one service without redeploying everything
- Technology flexibility: Each service can use a different language, framework, or database
- Fault isolation: A failure in one service does not crash the entire application
- Team autonomy: Small teams own individual services end-to-end
Challenges:
- Network complexity and latency between services
- Distributed system debugging is harder
- Data consistency across services requires careful design
- Operational overhead (monitoring, logging, tracing across many services)
Relationship to containers: Containers are the natural deployment unit for microservices. Each microservice is packaged as a container image, deployed independently, and scaled individually. Kubernetes orchestrates the lifecycle of these containers.
Google Kubernetes Engine (GKE)
GKE is Google's managed Kubernetes service. Google manages the control plane (API server, scheduler, etcd); you manage the workloads. (GKE Docs)
Two operating modes:
| Feature | Autopilot (Recommended) | Standard |
|---|---|---|
| Node management | Google manages nodes | You manage node pools |
| Pricing | Pay per pod resource request | Pay per node (VM) regardless of utilization |
| Security hardening | Built-in, automatic | Manual configuration required |
| Configuration | Opinionated defaults | Full customization |
| Best for | Most workloads; production-ready with minimal ops | Workloads requiring specific node configurations |
Key concepts for the exam:
- Node pools: Groups of nodes (VMs) with identical configuration within a cluster
- Cluster autoscaler: Automatically adjusts the number of nodes based on pod scheduling demands
- Horizontal Pod Autoscaler (HPA): Scales the number of pod replicas based on CPU, memory, or custom metrics
- Spot Pods: Run workloads on Spot VMs within GKE for significant cost savings on fault-tolerant jobs
Exam trap: GKE Autopilot is NOT serverless in the traditional sense. You still work with Kubernetes concepts (pods, deployments, services). The "managed" part means Google handles node provisioning, scaling, and security. Cloud Run is the serverless container option where you do not interact with Kubernetes at all.
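The Horizontal Pod Autoscaler mentioned above uses a documented scaling rule from the Kubernetes project: desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue). A direct sketch of that formula:

```python
import math

# The Kubernetes HPA scaling rule (per the Kubernetes documentation):
#   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
# Real HPAs also apply tolerances and stabilization windows, omitted here.
def hpa_desired_replicas(current: int, current_metric: float,
                         target_metric: float) -> int:
    return math.ceil(current * current_metric / target_metric)

# 3 pods averaging 200m CPU each against a 100m target -> 6 replicas
print(hpa_desired_replicas(3, 200, 100))  # 6
```

The same target-tracking intuition applies to MIG autoscaling on Compute Engine: scale the group so that average utilization lands near the target.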
5. The Value of APIs
What APIs Are
An Application Programming Interface (API) is a standardized contract that defines how software components communicate. APIs expose specific capabilities of a service while hiding internal implementation details.
Business value of APIs:
| Value | Description |
|---|---|
| New revenue streams | Monetize APIs by charging developers or partners for access |
| Ecosystem creation | Enable third-party developers to build on your platform |
| Partner integration | Standardized integration reduces custom development costs |
| Innovation acceleration | Internal teams and external partners build new products faster |
| Data monetization | Expose data services securely to paying customers |
Apigee API Management
Apigee is Google Cloud's API management platform. It provides full lifecycle management for APIs -- design, secure, deploy, monitor, and monetize.
Core capabilities:
| Capability | What It Does |
|---|---|
| API Gateway | Proxies API requests, enforces policies (rate limiting, quotas, authentication) |
| Developer Portal | Self-service portal where developers discover, register for, and test APIs |
| Analytics | Traffic analysis, error tracking, latency monitoring, developer engagement metrics |
| Monetization | Billing and revenue sharing for API usage (pay-per-call, tiered pricing, freemium models) |
| Security | OAuth, API keys, JWT validation, threat protection (SQL injection, XSS) |
| Version management | Manage multiple API versions and deprecation lifecycles |
Exam trap: Apigee is the answer when the question mentions API monetization, developer portals, or API lifecycle management. Do not confuse it with Cloud Endpoints (simpler API gateway) or API Gateway (lightweight, serverless-focused).
6. Hybrid and Multi-Cloud
Hybrid Cloud vs. Multi-Cloud
| Strategy | Definition | Business Drivers |
|---|---|---|
| Hybrid cloud | Combination of on-premises (or private cloud) and public cloud | Phased migration, data residency/compliance requirements, existing on-premises investments, latency-sensitive edge workloads |
| Multi-cloud | Using services from two or more public cloud providers | Avoid vendor lock-in, leverage best-of-breed services, redundancy across providers, regulatory requirements |
Google Distributed Cloud (formerly Anthos)
Google Distributed Cloud is Google's platform for managing workloads consistently across on-premises data centers, edge locations, and multiple public clouds. It extends Google Cloud services and the GKE management model beyond Google's own infrastructure.
Key capabilities:
| Capability | Description |
|---|---|
| Consistent management | Same tools, policies, and APIs across all environments |
| GKE everywhere | Run GKE clusters on-premises, on AWS, on Azure, or at the edge |
| Config Management | Policy-as-code and GitOps-based configuration management across all clusters |
| Service Mesh | Traffic management, observability, and security for microservices across environments |
| Serverless on-premises | Run Cloud Run workloads on your own infrastructure |
When the exam says "Anthos": The exam may still reference "Anthos" by name. Anthos was rebranded to Google Distributed Cloud, but the functionality is the same. If a question asks about managing Kubernetes clusters across on-premises and multiple cloud providers from a single control plane, the answer is Anthos / Google Distributed Cloud.
Edge Computing
Edge computing processes data closer to where it is generated rather than sending everything to a centralized cloud data center. Google Distributed Cloud supports edge deployments for scenarios requiring:
- Ultra-low latency (manufacturing, retail, telecommunications)
- Data locality (data must stay in a specific physical location)
- Intermittent connectivity (remote or disconnected sites)
Quick-Reference: Compute Decision Tree
Use this to answer "which service should you use" questions:
```
Does the workload require full OS-level control?
  YES --> Compute Engine (VMs)
  NO  --> Is it a container-based workload?
    YES --> Do you need Kubernetes orchestration?
      YES --> GKE (Autopilot for most; Standard for custom needs)
      NO  --> Cloud Run (serverless containers)
    NO  --> Is it a simple event-driven function?
      YES --> Cloud Functions
      NO  --> Is it a web/mobile app?
        YES --> App Engine (Standard for spiky traffic; Flexible for custom runtimes)
        NO  --> Evaluate Compute Engine or Cloud Run based on requirements
```
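The same decision tree can be expressed as a small helper function. The function and flag names are invented for this study guide -- there is no such Google tool:

```python
# The compute decision tree as code. Illustrative names only;
# this is a study aid, not a Google Cloud API.
def choose_compute(needs_os_control: bool, is_container: bool,
                   needs_kubernetes: bool, is_event_function: bool,
                   is_web_app: bool) -> str:
    if needs_os_control:
        return "Compute Engine"
    if is_container:
        return "GKE" if needs_kubernetes else "Cloud Run"
    if is_event_function:
        return "Cloud Functions"
    if is_web_app:
        return "App Engine"
    return "Evaluate Compute Engine or Cloud Run"

# A containerized service with no need for Kubernetes orchestration:
print(choose_compute(False, True, False, False, False))  # Cloud Run
```

Walking exam scenarios through these branches in order -- OS control first, then containers, then event functions, then web apps -- matches how the questions are typically framed.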
Exam Tips for Domain 4
- Migration strategy questions almost always describe a scenario and ask which R applies. Focus on the amount of change: no change = Rehost, some optimization = Replatform, architecture redesign = Refactor.
- Compute choice questions test trade-offs between control and management burden. More control = more management = Compute Engine. Less management = less control = Cloud Run or Cloud Functions.
- Container vs. VM questions test whether you understand that containers share a kernel (lighter, faster, less isolated) while VMs each have their own OS (heavier, slower, stronger isolation).
- GKE Autopilot vs. Standard -- Autopilot is the recommended default. Standard is for edge cases requiring custom node configuration.
- Serverless questions test whether you pick Cloud Run (containers, multiple endpoints), Cloud Functions (single event-driven function), or App Engine (managed web platform).
- Anthos/Distributed Cloud is always the answer for hybrid or multi-cloud Kubernetes management.
- Apigee is always the answer for API monetization or full API lifecycle management.
- Spot VMs are always wrong for workloads that cannot tolerate interruption.