Domain 4: Modernize Infrastructure and Applications with Google Cloud (~17%)

Domain 4 of the Google Cloud Digital Leader exam covers how organizations modernize infrastructure and applications using Google Cloud services. At approximately 17% of the exam, this domain accounts for roughly 9-10 questions. It spans six topic areas: migration strategies, compute options, serverless computing, containers, APIs, and hybrid/multi-cloud.

The exam tests your ability to select the right compute model for a given workload, explain the business rationale behind migration and modernization strategies, and understand when to use containers versus VMs versus serverless. This is not a deep-dive engineering domain -- it tests conceptual understanding and decision-making.

1. Cloud Modernization and Migration

The 6 Rs of Migration

Every migration question on the exam maps to one of these strategies. Memorize them and their trade-offs:

Strategy Also Known As What Happens When to Use Effort Level
Retire Decommission Shut down the application entirely Application is no longer needed or used None
Retain Keep on-premises Do not migrate; keep running where it is Compliance requirements, recent hardware investment, not worth migrating None
Rehost Lift and shift Move to cloud VMs with minimal or no code changes Legacy applications, tightly coupled systems, need fastest path to cloud Low
Replatform Lift and optimize / Move and improve Migrate with some optimization (e.g., swap to managed database) Applications that benefit from cloud services without full rewrite Medium
Refactor Re-architect Modify application architecture to leverage cloud-native features Applications worth investing in for long-term cloud benefits High
Repurchase Drop and shop / Replace Switch to a commercial SaaS product (e.g., replace custom CRM with Salesforce, move email to Google Workspace) Applications where a SaaS alternative exists and is more cost-effective than maintaining custom code Medium

(Google Cloud Migration Guide)

Exam trap: The exam uses these informal names interchangeably with the formal strategy names. "Lift and shift" is always Rehost. "Move and improve" is Replatform. If a question describes moving a VM image directly to Compute Engine with no code changes, that is Rehost -- even if the question never uses the word.

Exam trap: Replatform is NOT Refactor. Replatform means making targeted improvements during migration (e.g., switching from a self-managed MySQL to Cloud SQL). Refactor means redesigning the application architecture itself (e.g., breaking a monolith into microservices).
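The "amount of change" heuristic from the traps above can be sketched as a lookup. This is an illustrative study aid, not an official tool; the function names and keys are invented for this example:

```python
# Illustrative sketch: the 6 Rs keyed by the degree of change a migration
# scenario describes. Names and keys are invented for this study aid.
MIGRATION_STRATEGIES = {
    "retire": "Shut down the application entirely",
    "retain": "Keep running on-premises",
    "rehost": "Lift and shift: move VMs with no code changes",
    "replatform": "Move and improve: targeted optimization (e.g., managed DB)",
    "refactor": "Redesign the architecture for cloud-native features",
    "repurchase": "Drop and shop: replace with a SaaS product",
}

def strategy_for_change(amount_of_change: str) -> str:
    """Pick an R from the degree of change the scenario describes."""
    return {
        "none": "rehost",        # no code changes = lift and shift
        "some": "replatform",    # targeted optimization during migration
        "redesign": "refactor",  # architecture changes
    }[amount_of_change]

print(strategy_for_change("none"))  # rehost
```

If a scenario mentions swapping a self-managed database for a managed one but nothing else, "some" change applies and the answer is Replatform.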

Four Phases of Migration

Google Cloud defines a structured migration framework:

Phase Purpose Key Activities
Assess Understand what you have Inventory applications, identify dependencies, calculate total cost of ownership (TCO)
Plan Design the target environment Select migration strategies per workload, design cloud infrastructure, establish networking
Deploy Execute the migration Migrate workloads, validate functionality, refine processes
Optimize Maximize cloud value Tune performance, reduce costs, adopt cloud-native features

(Google Cloud Migration Guide)

Google Cloud Migration Tools

Tool Purpose
Migration Center Unified platform for end-to-end migration planning and assessment
Migrate to Virtual Machines Migrate physical servers and VMs to Compute Engine
Database Migration Service Migrate databases to Cloud SQL, AlloyDB, or other managed services
Storage Transfer Service Move data from other cloud providers or on-premises storage
Transfer Appliance Physical hardware appliance for transferring hundreds of terabytes to 1 petabyte of data
BigQuery Migration Service Migrate data warehouse workloads to BigQuery

(Google Cloud Migration Guide)
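The case for Transfer Appliance over network transfer comes down to arithmetic. A back-of-envelope sketch, with the link speed and utilization figures as illustrative assumptions rather than measured values:

```python
# Back-of-envelope: why very large datasets ship on a Transfer Appliance
# instead of moving over the network. Link speed and the 80% utilization
# factor are illustrative assumptions.
def transfer_days(size_bytes: float, link_bps: float, efficiency: float = 0.8) -> float:
    """Days to move size_bytes over a link at link_bps, at the given utilization."""
    seconds = (size_bytes * 8) / (link_bps * efficiency)
    return seconds / 86_400

one_pb = 1e15  # 1 petabyte in bytes (decimal)
print(round(transfer_days(one_pb, 1e9), 1))  # ~115.7 days over a 1 Gbps link
```

At roughly four months to push a petabyte over a 1 Gbps link, shipping a physical appliance is the faster path.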

2. Computing in the Cloud

Compute Engine (IaaS)

Compute Engine provides virtual machines running on Google's infrastructure. It is the Infrastructure-as-a-Service (IaaS) offering -- you get full control over the OS, networking, and installed software. (Compute Engine Docs)

Machine type families (know what each is optimized for):

Family Optimized For Example Use Cases
General-purpose (E2, N2, N2D, N1, C3) Balanced CPU and memory Web servers, application servers, small databases, development
Compute-optimized (C2, C3, H3) High per-core CPU performance Batch processing, gaming, high-performance computing (HPC)
Memory-optimized (M2, M3) High memory-to-CPU ratio In-memory databases (SAP HANA, Redis), real-time analytics
Accelerator-optimized (A2, A3, G2) GPU/TPU workloads Machine learning training/inference, video transcoding, scientific simulation

Cost optimization options:

Option Discount Key Constraint
Sustained use discounts Up to 30% automatically No commitment -- applied automatically when a VM runs >25% of a month
Committed use discounts (CUDs) Up to 55% (general); up to 70% (memory-optimized) Requires 1-year or 3-year commitment for specific vCPU and memory amounts
Spot VMs (formerly Preemptible VMs) 60-91% Google can reclaim them at any time with 30 seconds' notice; no SLA

Exam trap: Spot VMs are NOT suitable for workloads that cannot tolerate interruption. They are ideal for batch processing, CI/CD, fault-tolerant jobs, and data analysis. If a question describes a critical production database, Spot VMs are the wrong answer.
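The Spot discount translates directly into monthly savings for fault-tolerant work. A quick sketch -- the hourly rate and the 70% discount below are hypothetical figures, not Google's rate card:

```python
# Illustrative cost comparison (prices are hypothetical, not Google's rate
# card): a ~70% Spot discount applied to a fault-tolerant batch workload.
def monthly_cost(hourly_rate: float, hours: float = 730) -> float:
    """Cost of running one VM for a month (~730 hours)."""
    return hourly_rate * hours

on_demand_rate = 0.10                    # hypothetical $/hour for a VM
spot_rate = on_demand_rate * (1 - 0.70)  # assume a 70% Spot discount

print(round(monthly_cost(on_demand_rate), 2))  # 73.0
print(round(monthly_cost(spot_rate), 2))       # 21.9
```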

Sole-tenant nodes: Dedicated physical servers where only your VMs run. Used for compliance requirements, licensing constraints (bring-your-own-license), or workloads that require physical isolation from other tenants. More expensive than standard VMs.

Live migration: Google transparently moves running VMs to different physical hosts during maintenance events -- no reboot, no downtime. This is a key differentiator versus other cloud providers. Spot VMs do not support live migration; they are terminated instead.

Autoscaling and Load Balancing

Managed instance groups (MIGs) contain identical VM instances and support:

  • Autoscaling: Automatically adds or removes VM instances based on CPU utilization, load balancing capacity, custom metrics, or schedules
  • Autohealing: Replaces unhealthy instances based on health check results
  • Rolling updates: Deploy new versions across the group with configurable surge and disruption limits
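Target-utilization autoscaling can be sketched with a simplified proportional model: size the group so the observed load, spread across the new size, lands at the target. This is a conceptual illustration, not the autoscaler's actual algorithm (the real one also applies stabilization windows and cooldowns):

```python
import math

# Simplified model of target-utilization autoscaling (conceptual only):
# choose a group size that brings observed load back to the target.
def recommended_size(current_size: int, observed_util: float, target_util: float) -> int:
    return max(1, math.ceil(current_size * observed_util / target_util))

print(recommended_size(4, observed_util=0.90, target_util=0.60))  # 6
print(recommended_size(4, observed_util=0.30, target_util=0.60))  # 2
```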

Cloud Load Balancing distributes traffic across instances, regions, or backends. Key types:

Type Layer Scope Use Case
HTTP(S) Load Balancing Layer 7 Global Web applications, content-based routing
TCP/SSL Proxy Layer 4 Global Non-HTTP TCP traffic requiring global distribution
Network Load Balancing Layer 4 Regional High-performance, low-latency regional traffic
Internal Load Balancing Layer 4/7 Regional Traffic between internal services (not internet-facing)

Exam trap: HTTP(S) Load Balancing is global -- a single anycast IP routes users to the nearest healthy backend. Network Load Balancing is regional. If a question requires distributing web traffic across multiple regions, the answer is HTTP(S) Load Balancing.

3. Serverless Computing

Serverless means Google manages all infrastructure. You deploy code or containers; Google handles provisioning, scaling, patching, and availability. The exam tests three serverless products and when to choose each.

Cloud Run

Cloud Run is a fully managed platform for deploying containerized applications, functions, or source code. It automatically scales (including to zero), charges only for resources consumed during request processing, and requires no cluster management. (Cloud Run Docs)

Key characteristics:

  • Accepts any language or binary packaged as a container image
  • Supports source-based deployment for Go, Node.js, Python, Java, .NET, Ruby (auto-builds container)
  • Provides HTTPS endpoints with automatic TLS certificates
  • Supports WebSockets, HTTP/2, and gRPC end-to-end
  • Scales to zero when idle (no cost); scales up automatically under load
  • Two billing models: request-based (pay per request) and instance-based (pay per instance lifetime)
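The core of the Cloud Run container contract is simple: listen for HTTP on the port given by the PORT environment variable (Cloud Run injects it; 8080 is the conventional default). A minimal stdlib sketch -- the handler text is invented, and the serve line is commented out so the sketch does not block:

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Minimal sketch of the Cloud Run container contract: serve HTTP on the port
# named by the PORT environment variable (Cloud Run injects it at runtime).
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Hello from a Cloud Run-style service\n")

port = int(os.environ.get("PORT", "8080"))
# HTTPServer(("", port), Handler).serve_forever()  # uncomment to actually serve
```

Package this with a container image (or deploy the source directly for a supported language) and Cloud Run handles TLS, scaling, and the endpoint.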

Three resource types:

Type Purpose Scaling
Services Handle HTTP requests at unique HTTPS endpoints Auto-scales (including to zero)
Jobs Run tasks to completion (batch processing) Parallelizable across instances
Worker Pools Pull-based workloads (Kafka, Pub/Sub consumers) Manual scaling; no public endpoint

App Engine (PaaS)

App Engine is Google's original Platform-as-a-Service for web and mobile backends. It comes in two environments:

Feature Standard Environment Flexible Environment
Startup time Seconds Minutes
Scale to zero Yes No (minimum 1 instance)
Custom runtimes No Yes (via Dockerfile)
WebSocket support No Yes
Background processes No Yes
SSH debugging No Yes
Pricing basis Instance hours vCPU, memory, disk
Best for Spiky traffic, low-cost apps Steady traffic, custom dependencies

(App Engine Environments)

Important: Google officially recommends Cloud Run over App Engine for new projects. App Engine questions on the exam typically test knowledge of its two environments and their trade-offs rather than positioning it as the preferred choice for new workloads.

Cloud Functions (FaaS)

Cloud Functions (now called Cloud Run functions) is Google's Function-as-a-Service offering for small, event-driven code. You write a single function; Google executes it in response to events.

Key characteristics:

  • Single-purpose functions triggered by events (HTTP requests, Pub/Sub messages, Cloud Storage changes, Firestore updates)
  • Automatic scaling per invocation
  • Pay only for execution time (billed per 100ms)
  • Supported runtimes: Node.js, Python, Go, Java, .NET, Ruby, PHP
  • 2nd gen (current) is built on Cloud Run infrastructure

Exam trap: Cloud Functions 2nd gen is built on Cloud Run under the hood. Google is converging these products. For the exam, Cloud Functions is the answer when the question describes a simple, single-purpose, event-triggered function. Cloud Run is the answer for containerized applications or services with multiple endpoints.
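A "simple, single-purpose, event-triggered function" looks like the sketch below. The event shape mirrors a Pub/Sub-style payload (base64-encoded data field); the framework wiring and the order_id field are illustrative assumptions:

```python
import base64
import json

# Sketch of a single-purpose, event-driven function in the Cloud Functions
# style. The event shape mirrors a Pub/Sub-style payload; framework wiring
# is omitted and the message fields are invented for illustration.
def handle_pubsub_event(event: dict) -> str:
    """Decode a base64-encoded message and act on it."""
    payload = base64.b64decode(event["data"]).decode("utf-8")
    message = json.loads(payload)
    return f"processed order {message['order_id']}"

# Synthetic event for local testing:
fake_event = {"data": base64.b64encode(json.dumps({"order_id": 42}).encode()).decode()}
print(handle_pubsub_event(fake_event))  # processed order 42
```

If the workload grows beyond one function -- multiple endpoints, a long-lived process, a custom binary -- that is the signal to reach for Cloud Run instead.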

Choosing the Right Serverless Product

Question Answer
Need to run a container with multiple endpoints? Cloud Run
Need a simple function triggered by an event? Cloud Functions
Need a managed web app platform with no containers? App Engine Standard
Need custom runtime or background processing? App Engine Flexible or Cloud Run
Need to scale to zero? Cloud Run, Cloud Functions, or App Engine Standard
Need WebSocket support? Cloud Run or App Engine Flexible

4. Containers in the Cloud

Containers vs. Virtual Machines

This comparison is heavily tested. Know it cold:

Aspect Virtual Machines Containers
Abstraction level Full hardware virtualization with guest OS OS-level virtualization sharing host kernel
Size Gigabytes (includes full OS) Megabytes (application + dependencies only)
Startup time Minutes Seconds
Resource overhead High (each VM runs its own OS) Low (shared kernel, no guest OS)
Isolation Strong (separate OS per VM) Process-level (shared kernel)
Portability Limited (tied to hypervisor) High (runs anywhere with container runtime)
Density Fewer per host Many more per host
Use case Legacy apps, full OS control, strong isolation Microservices, modern apps, rapid deployment

Exam trap: Containers are NOT always better than VMs. VMs provide stronger isolation (critical for multi-tenant security), support any OS (Windows, Linux), and are necessary for legacy applications that cannot be containerized. The exam tests whether you know when VMs are the right choice.

Microservices Architecture

Microservices decompose a monolithic application into small, independently deployable services, each responsible for a specific business function.

Benefits:

  • Independent scaling: Scale only the services that need it, not the entire application
  • Independent deployment: Update one service without redeploying everything
  • Technology flexibility: Each service can use a different language, framework, or database
  • Fault isolation: A failure in one service does not crash the entire application
  • Team autonomy: Small teams own individual services end-to-end

Challenges:

  • Network complexity and latency between services
  • Distributed system debugging is harder
  • Data consistency across services requires careful design
  • Operational overhead (monitoring, logging, tracing across many services)

Relationship to containers: Containers are the natural deployment unit for microservices. Each microservice is packaged as a container image, deployed independently, and scaled individually. Kubernetes orchestrates the lifecycle of these containers.

Google Kubernetes Engine (GKE)

GKE is Google's managed Kubernetes service. Google manages the control plane (API server, scheduler, etcd); you manage the workloads. (GKE Docs)

Two operating modes:

Feature Autopilot (Recommended) Standard
Node management Google manages nodes You manage node pools
Pricing Pay per pod resource request Pay per node (VM) regardless of utilization
Security hardening Built-in, automatic Manual configuration required
Configuration Opinionated defaults Full customization
Best for Most workloads; production-ready with minimal ops Workloads requiring specific node configurations

Key concepts for the exam:

  • Node pools: Groups of nodes (VMs) with identical configuration within a cluster
  • Cluster autoscaler: Automatically adjusts the number of nodes based on pod scheduling demands
  • Horizontal Pod Autoscaler (HPA): Scales the number of pod replicas based on CPU, memory, or custom metrics
  • Spot Pods: Run workloads on Spot VMs within GKE for significant cost savings on fault-tolerant jobs

Exam trap: GKE Autopilot is NOT serverless in the traditional sense. You still work with Kubernetes concepts (pods, deployments, services). The "managed" part means Google handles node provisioning, scaling, and security. Cloud Run is the serverless container option where you do not interact with Kubernetes at all.
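The Horizontal Pod Autoscaler mentioned above follows a documented proportional rule: desiredReplicas = ceil(currentReplicas x currentMetric / desiredMetric). A sketch of that formula (the real controller adds tolerances and stabilization on top):

```python
import math

# The Horizontal Pod Autoscaler's documented scaling rule:
# desiredReplicas = ceil(currentReplicas * currentMetric / desiredMetric)
# (the real controller also applies tolerances and stabilization windows).
def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         desired_metric: float) -> int:
    return math.ceil(current_replicas * current_metric / desired_metric)

# 3 pods averaging 80% CPU against a 50% target -> scale out to 5
print(hpa_desired_replicas(3, current_metric=80, desired_metric=50))  # 5
```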

5. The Value of APIs

What APIs Are

An Application Programming Interface (API) is a standardized contract that defines how software components communicate. APIs expose specific capabilities of a service while hiding internal implementation details.

Business value of APIs:

Value Description
New revenue streams Monetize APIs by charging developers or partners for access
Ecosystem creation Enable third-party developers to build on your platform
Partner integration Standardized integration reduces custom development costs
Innovation acceleration Internal teams and external partners build new products faster
Data monetization Expose data services securely to paying customers

Apigee API Management

Apigee is Google Cloud's API management platform. It provides full lifecycle management for APIs -- design, secure, deploy, monitor, and monetize.

Core capabilities:

Capability What It Does
API Gateway Proxies API requests, enforces policies (rate limiting, quotas, authentication)
Developer Portal Self-service portal where developers discover, register for, and test APIs
Analytics Traffic analysis, error tracking, latency monitoring, developer engagement metrics
Monetization Billing and revenue sharing for API usage (pay-per-call, tiered pricing, freemium models)
Security OAuth, API keys, JWT validation, threat protection (SQL injection, XSS)
Version management Manage multiple API versions and deprecation lifecycles

Exam trap: Apigee is the answer when the question mentions API monetization, developer portals, or API lifecycle management. Do not confuse it with Cloud Endpoints (simpler API gateway) or API Gateway (lightweight, serverless-focused).
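To make "rate limiting at the gateway" concrete, here is a token-bucket limiter -- the classic mechanism behind the kind of policy an API gateway enforces at the proxy layer. This is a concept sketch only, not Apigee's implementation or configuration format:

```python
import time

# Illustrative token-bucket rate limiter: the concept behind gateway-level
# rate-limit policies. Not Apigee's implementation or config format.
class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec       # refill rate (tokens/second)
        self.capacity = burst          # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Admit a request if a token is available; refill based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=10, burst=5)
results = [bucket.allow() for _ in range(7)]
print(results.count(True))  # 5: the burst admits 5 back-to-back requests
```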

6. Hybrid and Multi-Cloud

Hybrid Cloud vs. Multi-Cloud

Strategy Definition Business Drivers
Hybrid cloud Combination of on-premises (or private cloud) and public cloud Phased migration, data residency/compliance requirements, existing on-premises investments, latency-sensitive edge workloads
Multi-cloud Using services from two or more public cloud providers Avoid vendor lock-in, leverage best-of-breed services, redundancy across providers, regulatory requirements

Google Distributed Cloud (formerly Anthos)

Google Distributed Cloud is Google's platform for managing workloads consistently across on-premises data centers, edge locations, and multiple public clouds. It extends Google Cloud services and the GKE management model beyond Google's own infrastructure.

Key capabilities:

Capability Description
Consistent management Same tools, policies, and APIs across all environments
GKE everywhere Run GKE clusters on-premises, on AWS, on Azure, or at the edge
Config Management Policy-as-code and GitOps-based configuration management across all clusters
Service Mesh Traffic management, observability, and security for microservices across environments
Serverless on-premises Run Cloud Run workloads on your own infrastructure

When the exam says "Anthos": The exam may still reference "Anthos" by name. Anthos was rebranded to Google Distributed Cloud, but the functionality is the same. If a question asks about managing Kubernetes clusters across on-premises and multiple cloud providers from a single control plane, the answer is Anthos / Google Distributed Cloud.

Edge Computing

Edge computing processes data closer to where it is generated rather than sending everything to a centralized cloud data center. Google Distributed Cloud supports edge deployments for scenarios requiring:

  • Ultra-low latency (manufacturing, retail, telecommunications)
  • Data locality (data must stay in a specific physical location)
  • Intermittent connectivity (remote or disconnected sites)

Quick-Reference: Compute Decision Tree

Use this to answer "which service should you use" questions:

Does the workload require full OS-level control?
  YES --> Compute Engine (VMs)
  NO --> Is it a container-based workload?
    YES --> Do you need Kubernetes orchestration?
      YES --> GKE (Autopilot for most; Standard for custom needs)
      NO --> Cloud Run (serverless containers)
    NO --> Is it a simple event-driven function?
      YES --> Cloud Functions
      NO --> Is it a web/mobile app?
        YES --> App Engine (Standard for spiky traffic; Flexible for custom runtimes)
        NO --> Evaluate Compute Engine or Cloud Run based on requirements
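The decision tree above can also be read as a function. The yes/no flags below are a simplification for study purposes; real service selection weighs more factors:

```python
# The compute decision tree as a sketch function. The boolean flags are a
# study-guide simplification; real selection weighs more factors than this.
def choose_compute(needs_os_control=False, containerized=False,
                   needs_kubernetes=False, event_driven_function=False,
                   web_app=False) -> str:
    if needs_os_control:
        return "Compute Engine"
    if containerized:
        return "GKE" if needs_kubernetes else "Cloud Run"
    if event_driven_function:
        return "Cloud Functions"
    if web_app:
        return "App Engine"
    return "Evaluate Compute Engine or Cloud Run"

print(choose_compute(containerized=True))                         # Cloud Run
print(choose_compute(containerized=True, needs_kubernetes=True))  # GKE
```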

Exam Tips for Domain 4

  1. Migration strategy questions almost always describe a scenario and ask which R applies. Focus on the amount of change: no change = Rehost, some optimization = Replatform, architecture redesign = Refactor.
  2. Compute choice questions test trade-offs between control and management burden. More control = more management = Compute Engine. Less management = less control = Cloud Run or Cloud Functions.
  3. Container vs. VM questions test whether you understand that containers share a kernel (lighter, faster, less isolated) while VMs each have their own OS (heavier, slower, stronger isolation).
  4. GKE Autopilot vs. Standard -- Autopilot is the recommended default. Standard is for edge cases requiring custom node configuration.
  5. Serverless questions test whether you pick Cloud Run (containers, multiple endpoints), Cloud Functions (single event-driven function), or App Engine (managed web platform).
  6. Anthos/Distributed Cloud is always the answer for hybrid or multi-cloud Kubernetes management.
  7. Apigee is always the answer for API monetization or full API lifecycle management.
  8. Spot VMs are always wrong for workloads that cannot tolerate interruption.
