
Domain 2: Planning and Configuring a Cloud Solution (~17.5%)

Domain 2 covers the planning phase -- selecting the right compute, storage, data, and networking resources before you deploy anything. This domain accounts for roughly 17.5% of the exam, which translates to approximately 9-10 questions. Unlike Domain 3 (deploying), the questions here focus on choosing and sizing resources rather than running CLI commands. Expect scenario-based questions that describe a workload and ask you to select the best-fit service or configuration.


2.1 Planning and Estimating Using the Pricing Calculator

The Google Cloud Pricing Calculator lets you estimate monthly costs before provisioning resources. The exam tests your understanding of what inputs the calculator requires and how discount models affect pricing.

Key Concepts

Inputs the calculator expects:

Resource | Key Inputs
Compute Engine | Machine type, number of instances, hours/month, region, OS license, persistent disk size/type, sustained use or committed use discounts
GKE | Cluster management fee, node pool machine types, node count, Autopilot vs. Standard
Cloud Storage | Storage class, data volume (GB), retrieval volume, network egress
Cloud SQL | Instance type, vCPUs, memory, storage (SSD/HDD), HA configuration, region
BigQuery | On-demand vs. flat-rate (editions), data stored, data scanned per query
Networking | Egress volume, load balancer type, forwarding rules, Premium vs. Standard tier

Discount models to know:

Discount Type | How It Works | Applies To
Sustained use discounts (SUDs) | Automatic discounts for instances running more than 25% of the month; up to 30% off for N1, up to 20% for N2/N2D/C2 | N1, N2, N2D, C2, M1/M2, sole-tenant nodes (NOT E2, Tau, A2/A3)
Committed use discounts (CUDs) | 1-year or 3-year commitment to a fixed amount of vCPUs/memory; up to 57% off (3-year) | Most machine types; spend-based CUDs for some services
Spot VM pricing | 60-91% off standard pricing; can be preempted at any time | Fault-tolerant batch workloads
Free tier | Always-free monthly allowances (e.g., 1 e2-micro in us-west1, us-central1, or us-east1) | Select services/regions

Exam trap: Sustained use discounts are applied automatically -- you do not need to opt in. But they do NOT apply to E2, Tau (T2D/T2A), or accelerator-optimized (A2/A3) machine types. CUDs require an explicit commitment purchase.

Exam trap: Egress costs are a common oversight. Ingress to Google Cloud is free, but egress (data leaving Google Cloud or going between regions) is charged. The Pricing Calculator requires you to estimate egress volume.
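The calculator's arithmetic can be approximated by hand, which also makes the egress trap concrete. The sketch below uses hypothetical per-unit prices (real prices vary by region and change over time) and the automatic sustained-use discount threshold described above:

```python
# Rough monthly cost estimate mirroring the Pricing Calculator's inputs.
# All prices are HYPOTHETICAL placeholders, not real Google Cloud rates.
VM_HOURLY = 0.10          # assumed on-demand price per instance-hour
SUD_DISCOUNT = 0.20       # assumed sustained-use discount for near-full-month usage
EGRESS_PER_GB = 0.12      # assumed internet egress price per GB

def estimate_monthly(instances: int, hours: float, egress_gb: float) -> dict:
    compute = instances * hours * VM_HOURLY
    # SUDs apply automatically once usage passes 25% of the month (~730 h).
    if hours > 0.25 * 730:
        compute *= (1 - SUD_DISCOUNT)
    egress = egress_gb * EGRESS_PER_GB  # ingress is free; egress is not
    return {"compute": round(compute, 2),
            "egress": round(egress, 2),
            "total": round(compute + egress, 2)}

print(estimate_monthly(instances=2, hours=730, egress_gb=500))
```

Note how 500 GB of egress adds a meaningful fraction of the compute bill -- this is exactly the line item that gets forgotten.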


2.2 Planning and Configuring Compute Resources

Compute Engine Machine Types

Compute Engine offers machine type families organized by workload profile. You must know the families, their series, and when to use each.

Machine Type Families:

Family | Series | vCPU Range | Memory/vCPU | Use Case
General-purpose | E2, N2, N2D, N1, C3, C3D, N4, Tau T2D/T2A | 1-360 | 0.5-8 GB | Web servers, app servers, microservices, small-medium databases, dev/test
Compute-optimized | C2, C2D, H3, H4D | Up to 192 | 2-8 GB | HPC, gaming, single-threaded apps, batch processing, media transcoding
Memory-optimized | M1, M2, M3, X4 | Up to 416 | 14-24+ GB | SAP HANA, large in-memory databases, real-time analytics
Accelerator-optimized | A2, A3, G2 | Varies | Varies | ML training/inference, GPU rendering, video transcoding
Storage-optimized | Z3 | Up to 192 | 8 GB | High-throughput local SSD workloads, in-memory databases with fast storage

Predefined machine type naming convention:

[series]-[type]-[vCPUs]
Example: n2-standard-4  →  N2 series, standard memory ratio, 4 vCPUs
         e2-highmem-8   →  E2 series, high memory ratio, 8 vCPUs
         c2-standard-60 →  C2 series, standard ratio, 60 vCPUs
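Because the convention is mechanical, a small parser is a handy way to check your reading of a machine type name. This is a sketch for study purposes, not an official API:

```python
def parse_machine_type(name: str) -> dict:
    """Split a predefined machine type name like 'n2-standard-4'
    into its series, memory-ratio type, and vCPU count."""
    series, mtype, vcpus = name.split("-")
    return {"series": series.upper(), "type": mtype, "vcpus": int(vcpus)}

print(parse_machine_type("n2-standard-4"))
# -> {'series': 'N2', 'type': 'standard', 'vcpus': 4}
```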

Memory-to-vCPU ratios in type names:

Type Suffix | Memory per vCPU | Typical Use
highcpu | ~0.9-2 GB | CPU-bound workloads
standard | ~3.75-4 GB | Balanced workloads
highmem | ~6.5-8 GB | Memory-heavy apps, medium databases
megamem | ~14 GB | Large in-memory workloads
ultramem | ~24+ GB | SAP HANA, extreme memory needs

Shared-core machine types (E2: e2-micro, e2-small, e2-medium; legacy N1: f1-micro, g1-small) provide burstable fractional vCPUs at the lowest cost. The e2-micro in us-west1, us-central1, or us-east1 is part of the always-free tier (it replaced the older f1-micro allowance).

Custom Machine Types

For workloads that do not fit predefined ratios, custom machine types let you specify exact vCPU and memory combinations. Available on N1, N2, N2D, and E2 series.

  • vCPU count must be even (except 1 vCPU is allowed).
  • Memory must be a multiple of 256 MB.
  • Extended memory allows exceeding the default memory-per-vCPU limit (at a premium).
  • Custom types cost approximately 5% more than equivalent predefined types.
# Custom machine type naming:
n2-custom-8-32768   →  N2 series, 8 vCPUs, 32 GB (32768 MB) memory
e2-custom-4-8192    →  E2 series, 4 vCPUs, 8 GB memory
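The constraints in the bullet list above can be checked mechanically. The validator below is a study sketch covering only the two basic rules; per-series minimum/maximum limits are not modeled:

```python
def valid_custom(vcpus: int, memory_mb: int) -> bool:
    """Check the two basic custom-machine-type constraints:
    vCPU count must be 1 or an even number, and memory must be
    a multiple of 256 MB."""
    vcpu_ok = vcpus == 1 or vcpus % 2 == 0
    mem_ok = memory_mb % 256 == 0
    return vcpu_ok and mem_ok

print(valid_custom(8, 32768))   # n2-custom-8-32768 -> True
print(valid_custom(3, 4096))    # odd vCPU count above 1 -> False
print(valid_custom(4, 5000))    # 5000 MB is not a multiple of 256 -> False
```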

Exam trap: Custom machine types are only available in certain series. You cannot create custom C2, M2, or A2 types. If the exam describes a workload needing precise CPU/memory ratios, the answer is a custom N2 or E2.

Spot VMs vs. Preemptible VMs

Spot VMs replaced legacy preemptible VMs as the recommended interruptible instance option.

Feature | Spot VMs | Preemptible VMs (Legacy)
Maximum runtime | No limit | 24 hours
Preemption notice | 30 seconds | 30 seconds
Discount | 60-91% off on-demand | 60-91% off on-demand
Availability | Subject to capacity | Subject to capacity
Live migration | No | No
Auto-restart | No | No
gcloud flag | --provisioning-model=SPOT | --preemptible (deprecated)

When to use Spot VMs: Batch processing, CI/CD builds, data analytics, fault-tolerant distributed workloads, rendering pipelines. Never use them for stateful, user-facing, or always-on services.

Exam trap: The --preemptible flag still works but is deprecated. The exam uses --provisioning-model=SPOT. Spot VMs have no 24-hour maximum -- that was a preemptible-only limitation.

Sole-Tenant Nodes

Sole-tenant nodes provide dedicated physical servers for your VMs. No other customer's VMs share the hardware.

When to use sole-tenant nodes:

  • Licensing compliance: Bring-your-own-license (BYOL) software that requires per-core or per-socket licensing (e.g., Windows Server, Oracle Database).
  • Regulatory/compliance: Workloads that require physical isolation from other tenants.
  • Performance isolation: Workloads sensitive to noisy-neighbor effects.

Sole-tenant nodes use node templates (defining machine type family and node type) and node groups (collection of nodes). VMs are placed on sole-tenant nodes via node affinity labels.

Exam trap: Sole-tenant nodes provide physical isolation, not network isolation. Network isolation is handled by VPCs and firewalls. The exam may try to confuse the two.


Google Kubernetes Engine (GKE)

GKE offers two modes of operation:

Feature | GKE Standard | GKE Autopilot
Node management | You manage nodes (machine types, scaling, upgrades) | Google manages nodes entirely
Billing | Pay for node VMs (whether utilized or not) | Pay per pod resource requests (vCPU, memory, ephemeral storage)
Node pools | You create and configure | Managed automatically
Security posture | You harden nodes | Hardened by default (no SSH, no privileged pods by default)
GPU/TPU support | Full support | Supported (with constraints)
Cluster autoscaler | You configure | Built-in, always on
SLA | 99.5% (zonal), 99.95% (regional) | 99.95% (control plane), 99.9% (Autopilot pods)
Best for | Full control, custom configurations, specialized hardware | Reduced operational overhead, predictable costs

Cluster types:

Type | Description | Exam Context
Zonal cluster | Single control plane in one zone; nodes in same zone | Lowest cost; not HA
Multi-zonal cluster | Single control plane in one zone; nodes across multiple zones | Node HA but single control plane SPOF
Regional cluster | Control plane replicas across 3 zones; nodes across zones | Production recommended; highest availability
Private cluster | Nodes have internal IPs only; optional private endpoint for control plane | Security-focused environments

Node pools are groups of nodes within a cluster that share the same configuration (machine type, image, labels). Use multiple node pools to mix machine types (e.g., CPU pool + GPU pool) or run different OS images.

Exam trap: Autopilot clusters are always regional -- you cannot create a zonal Autopilot cluster. If the question says "minimize operational overhead" or "pay only for running pods," the answer is Autopilot.

Exam trap: In Standard mode, you pay for nodes even if pods are not utilizing them. In Autopilot, you pay for pod resource requests. For cost optimization on bursty workloads, Autopilot can be cheaper.
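The Standard-vs-Autopilot billing difference is easy to see with a back-of-envelope comparison. All prices below are HYPOTHETICAL placeholders; real Autopilot pricing is per requested vCPU, per GB of memory, and per GB of ephemeral storage:

```python
# Hypothetical prices for illustration only -- not real Google Cloud rates.
NODE_HOURLY = 0.20        # assumed cost of one Standard-mode node VM
POD_VCPU_HOURLY = 0.05    # assumed Autopilot price per requested vCPU
POD_GB_HOURLY = 0.005     # assumed Autopilot price per requested GB RAM

def standard_cost(nodes: int, hours: float) -> float:
    # Standard mode: you pay for nodes whether pods use them or not.
    return round(nodes * NODE_HOURLY * hours, 2)

def autopilot_cost(vcpus_requested: float, gb_requested: float, hours: float) -> float:
    # Autopilot: you pay only for what the pods request.
    return round((vcpus_requested * POD_VCPU_HOURLY
                  + gb_requested * POD_GB_HOURLY) * hours, 2)

# A bursty workload averaging 2 vCPU / 8 GB of pod requests, versus
# a 3-node Standard cluster kept up around the clock (~730 h/month):
print(standard_cost(nodes=3, hours=730))
print(autopilot_cost(vcpus_requested=2, gb_requested=8, hours=730))
```

With these placeholder rates the idle node capacity dominates, which is the cost-optimization signal the exam is probing for.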


Cloud Run

Cloud Run is a fully managed serverless platform that runs stateless containers. It abstracts away all infrastructure.

Key configuration parameters:

Parameter | Description | Default
Container image | Must listen on the port defined by the $PORT env variable | Required
CPU | 1, 2, 4, or 8 vCPUs (fractional values below 1 are also supported) | 1
Memory | 128 MiB to 32 GiB | 512 MiB
Concurrency | Max simultaneous requests per container instance (1-1000) | 80
Min instances | Minimum warm instances (avoid cold starts) | 0
Max instances | Upper limit on auto-scaling | 100
Timeout | Max request duration (up to 60 minutes) | 5 minutes
CPU allocation | Always allocated vs. only during request processing | Request-based
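Concurrency interacts with traffic and latency to determine how many instances Cloud Run spins up. A rough sizing sketch using Little's law (in-flight requests = request rate x latency), with illustrative numbers:

```python
import math

def instances_needed(rps: float, avg_latency_s: float, concurrency: int) -> int:
    """Estimate the instance count a steady load requires: in-flight
    requests (rps * latency, per Little's law) divided by the
    per-instance concurrency setting, rounded up."""
    in_flight = rps * avg_latency_s
    return math.ceil(in_flight / concurrency)

# 400 req/s at 200 ms average latency with the default concurrency of 80:
print(instances_needed(rps=400, avg_latency_s=0.2, concurrency=80))  # -> 1
# The same load with concurrency=1 (e.g., CPU-heavy requests):
print(instances_needed(rps=400, avg_latency_s=0.2, concurrency=1))   # -> 80
```

This is why lowering concurrency for CPU-bound containers multiplies the instance count (and cost).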

Cloud Run vs. Cloud Functions decision matrix:

Criteria | Cloud Run | Cloud Functions
Unit of deployment | Container image | Source code (function)
Runtime | Any language/framework in a container | Supported runtimes only (Node.js, Python, Go, Java, .NET, Ruby, PHP)
Concurrency | Multiple requests per instance | 1 request per instance (1st gen) / configurable (2nd gen)
Execution time | Up to 60 minutes | Up to 60 minutes (2nd gen) / 9 minutes (1st gen)
Best for | Microservices, APIs, web apps, custom runtimes | Event-driven functions, lightweight transformations, glue code

Exam trap: If a question mentions "any programming language" or "custom binary/runtime," the answer is Cloud Run, not Cloud Functions. Cloud Functions requires a supported runtime.


Cloud Functions

Cloud Functions is Google Cloud's functions-as-a-service (FaaS) offering. The exam tests the differences between 1st gen and 2nd gen.

Feature | 1st Gen | 2nd Gen
Runtime engine | Custom Google infrastructure | Built on Cloud Run
Concurrency | 1 request per instance | Up to 1,000 concurrent requests per instance
Max timeout | 9 minutes (HTTP and event) | 60 minutes (HTTP), 9 minutes (event)
Min instances | Not supported | Supported (reduce cold starts)
Traffic splitting | Not supported | Supported (canary deployments)
Max instance size | 8 GB memory, 2 vCPUs | 16 GiB memory, 4 vCPUs
Event triggers | HTTP, Pub/Sub, Cloud Storage, Firestore, etc. | HTTP, Eventarc (CloudEvents format)
VPC connector | Supported | Supported

Common trigger types for the exam:

  • HTTP trigger: Invoked via HTTPS URL.
  • Cloud Storage trigger: Fires on object finalize, delete, archive, metadata update.
  • Pub/Sub trigger: Fires when a message is published to a topic.
  • Firestore trigger: Fires on document create, update, delete, write.
  • Eventarc trigger (2nd gen): Unifies event delivery from 90+ Google Cloud sources using CloudEvents standard.

Exam trap: 2nd gen Cloud Functions are built on Cloud Run infrastructure. If a question describes needing concurrency > 1 per instance, longer timeouts, or traffic splitting, the answer is 2nd gen (or Cloud Run directly).


2.3 Planning and Configuring Data Storage Options

Choosing the Right Database Service

Service | Type | Data Model | Scaling | Transactions | Best For
Cloud SQL | Managed relational | MySQL, PostgreSQL, SQL Server | Vertical (read replicas for horizontal reads) | Full ACID | Traditional OLTP apps, CMS, ERP, < 64 TB
Cloud Spanner | Managed relational | Relational with horizontal scaling | Horizontal (automatic sharding) | Full ACID + global strong consistency | Global OLTP at scale, financial systems, gaming leaderboards
BigQuery | Serverless data warehouse | Columnar (SQL interface) | Fully managed, serverless | N/A (analytics, not OLTP) | OLAP, analytics, ad-hoc queries, data warehousing, petabyte scale
Firestore | Managed NoSQL document | Document (collections/documents) | Automatic | ACID (multi-document transactions) | Mobile/web apps, real-time sync, user profiles, game state
Cloud Bigtable | Managed NoSQL wide-column | Key-value / wide-column | Horizontal (add nodes) | Single-row only | IoT time-series, ad-tech, financial tick data, > 1 TB, high throughput
AlloyDB | Managed PostgreSQL-compatible | Relational (PostgreSQL) | Vertical + read pools | Full ACID | PostgreSQL workloads needing faster transactional and analytical performance, AI/ML integration

Decision flowchart for the exam:

  1. Need SQL and strong consistency at global scale? --> Cloud Spanner
  2. Need SQL for a traditional relational workload < 64 TB? --> Cloud SQL
  3. Need analytics / data warehouse / petabyte-scale queries? --> BigQuery
  4. Need NoSQL document store with mobile/web SDK and real-time sync? --> Firestore
  5. Need NoSQL for massive throughput on flat key-value data (> 1 TB)? --> Cloud Bigtable
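The flowchart above translates directly into code. The function below is a study sketch of the exam heuristics, in the same priority order as the numbered steps:

```python
def pick_database(needs_sql=False, global_scale=False, analytics=False,
                  document_model=False, huge_keyvalue=False) -> str:
    """Mirror of the decision flowchart above (exam heuristics only)."""
    if needs_sql and global_scale:
        return "Cloud Spanner"
    if needs_sql:
        return "Cloud SQL"
    if analytics:
        return "BigQuery"
    if document_model:
        return "Firestore"
    if huge_keyvalue:
        return "Cloud Bigtable"
    return "re-read the scenario"

print(pick_database(needs_sql=True, global_scale=True))  # -> Cloud Spanner
print(pick_database(analytics=True))                     # -> BigQuery
```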

Exam trap: Cloud SQL scales vertically (bigger instance) and uses read replicas for read scaling. It does NOT scale horizontally for writes. If a question says "horizontally scalable relational database," the answer is Cloud Spanner, not Cloud SQL.

Exam trap: BigQuery is for analytics (OLAP), not transactional workloads (OLTP). If the scenario describes reporting, dashboards, or ad-hoc queries on large datasets, it is BigQuery. If it describes user-facing transactions, it is not.

Exam trap: Bigtable requires a minimum of 1 node (unlike Firestore, which is serverless with no provisioning). Bigtable is not cost-effective for datasets under ~1 TB.

Cloud Storage Classes

Cloud Storage offers four storage classes, all with the same API, latency, and throughput. They differ in cost, minimum storage duration, and retrieval fees.

Storage Class | Min Storage Duration | At-Rest Cost (relative) | Retrieval Cost | Use Case
Standard | None | Highest | None | Frequently accessed data, hot data, short-lived objects
Nearline | 30 days | ~50% of Standard | Per-GB retrieval fee | Data accessed less than once per month; backups
Coldline | 90 days | ~25% of Standard | Higher per-GB retrieval fee | Data accessed less than once per quarter; disaster recovery
Archive | 365 days | Lowest (~10% of Standard) | Highest per-GB retrieval fee | Data accessed less than once per year; regulatory archives

Critical storage concepts for the exam:

  • Minimum storage duration charge: If you delete or modify an object before the minimum storage duration, you are still charged for the full minimum period. Deleting a Coldline object after 10 days incurs 90 days of storage charges.
  • Bucket-level class vs. object-level class: A bucket has a default storage class, but individual objects can have a different class.
  • Autoclass: Automatically transitions objects between classes based on access patterns. Eliminates the need to manually set lifecycle rules for cost optimization.
  • Lifecycle management: Rules that automatically transition storage class or delete objects based on age, creation date, number of versions, or storage class.
  • Object versioning: Retains previous versions of objects when overwritten or deleted. Used for data protection and compliance. Each version is charged independently.
  • Bucket locations: Regional, dual-region, or multi-region. Affects availability, latency, and cost.
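The minimum storage duration charge described above reduces to one line of arithmetic -- a study sketch of the billing rule, not a billing API:

```python
def billed_days(actual_days: int, min_duration_days: int) -> int:
    """Cloud Storage charges for at least the class's minimum storage
    duration, even if the object is deleted or rewritten earlier."""
    return max(actual_days, min_duration_days)

MIN_DURATION = {"Standard": 0, "Nearline": 30, "Coldline": 90, "Archive": 365}

# Deleting a Coldline object after 10 days still bills 90 days:
print(billed_days(10, MIN_DURATION["Coldline"]))   # -> 90
# A Nearline object kept 45 days bills its actual 45 days:
print(billed_days(45, MIN_DURATION["Nearline"]))   # -> 45
```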

Exam trap: All storage classes have the same low-latency access (milliseconds). Archive storage is NOT slow to retrieve -- it just has the highest per-GB retrieval cost. Do not confuse Cloud Storage Archive with tape-like cold storage.

Exam trap: Minimum storage duration charges apply even if you change the object's storage class. Moving an object from Coldline to Standard after 30 days incurs the remaining 60 days of Coldline charges.

Persistent Disks

Persistent disks are durable block storage for Compute Engine and GKE.

Disk Type | IOPS (read/write) | Throughput | Use Case
pd-standard (Standard) | Low | Low | Boot disks, bulk data, cost-sensitive workloads
pd-balanced | Moderate | Moderate | General workloads (best price-performance)
pd-ssd (SSD) | High | High | Databases, latency-sensitive apps
pd-extreme | Highest (configurable) | Highest | Enterprise databases (SAP HANA, Oracle)
Hyperdisk Balanced | High (configurable) | High | Databases requiring dynamic IOPS provisioning

Zonal vs. Regional persistent disks:

Feature | Zonal PD | Regional PD
Availability | Single zone | Synchronous replication across 2 zones in the same region
Use case | Standard workloads | HA for stateful workloads, failover scenarios
Performance | Full performance | Slight write latency increase due to replication
Cost | Standard pricing | ~2x zonal pricing (two copies)

Exam trap: Regional persistent disks replicate across exactly 2 zones (not 3). They are designed for HA, not for performance improvement. Write latency is higher because writes must commit to both zones.

Exam trap: Local SSDs provide the highest IOPS but are ephemeral -- data is lost when the VM stops or is preempted. They are not persistent disks. Do not choose local SSDs for durable storage.


2.4 Planning and Configuring Network Resources

Load Balancing

Google Cloud offers multiple load balancer types. The exam frequently tests which load balancer to use for a given scenario.

Load Balancer Decision Matrix:

Load Balancer | Scope | Traffic Type | Proxy/Passthrough | Backend Types
Global external Application LB | Global | HTTP(S) | Proxy (Layer 7) | MIGs, NEGs, Cloud Run, Cloud Functions, Cloud Storage buckets
Regional external Application LB | Regional | HTTP(S) | Proxy (Layer 7) | MIGs, NEGs
Global external proxy Network LB | Global | TCP/SSL | Proxy (Layer 4) | MIGs, NEGs
Regional external passthrough Network LB | Regional | TCP/UDP | Passthrough (Layer 4) | MIGs, NEGs
Internal Application LB | Regional | HTTP(S) | Proxy (Layer 7) | MIGs, NEGs
Internal passthrough Network LB | Regional | TCP/UDP | Passthrough (Layer 4) | MIGs, NEGs
Cross-region internal Application LB | Global | HTTP(S) | Proxy (Layer 7) | MIGs, NEGs
Internal proxy Network LB | Regional | TCP | Proxy (Layer 4) | MIGs, NEGs

Quick selection guide:

  1. External HTTP(S) traffic, global reach, CDN? --> Global external Application LB
  2. External TCP/SSL, global reach? --> Global external proxy Network LB
  3. External UDP or non-HTTP TCP, single region? --> Regional external passthrough Network LB
  4. Internal HTTP(S) microservices? --> Internal Application LB
  5. Internal TCP/UDP (e.g., database traffic)? --> Internal passthrough Network LB
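The selection guide above can also be expressed as a function -- a sketch of the exam heuristics only, not a complete map of every load balancer variant:

```python
def pick_load_balancer(external: bool, http: bool,
                       global_reach: bool = False, udp: bool = False) -> str:
    """Mirror of the quick selection guide above (exam heuristics only)."""
    if external:
        if http:
            return ("Global external Application LB" if global_reach
                    else "Regional external Application LB")
        if udp or not global_reach:
            return "Regional external passthrough Network LB"
        return "Global external proxy Network LB"
    # Internal traffic:
    return "Internal Application LB" if http else "Internal passthrough Network LB"

print(pick_load_balancer(external=True, http=True, global_reach=True))
# -> Global external Application LB
print(pick_load_balancer(external=False, http=False))
# -> Internal passthrough Network LB
```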

Key terminology:

  • Proxy load balancer: Terminates the client connection and opens a new connection to the backend. Can inspect and modify headers (Layer 7) or just forward (Layer 4).
  • Passthrough load balancer: Forwards packets directly to backends without terminating the connection. Backend sees the original client IP.
  • Backend service: Defines how the load balancer distributes traffic (health checks, session affinity, timeout).
  • URL map: Routes requests to different backend services based on host and path rules (Application LBs only).
  • Forwarding rule: Maps an external IP + port to the load balancer.

Exam trap: The global external Application Load Balancer can serve backends in multiple regions and route traffic to the nearest healthy backend. It is the only load balancer type that integrates with Cloud CDN. If the question mentions "CDN" or "content caching," this is the answer.

Exam trap: The internal passthrough Network LB preserves the original client IP because it is a passthrough (Layer 4) load balancer. The internal Application LB is a proxy and does NOT preserve the original client IP by default (it is in X-Forwarded-For).

Resource Locations: Regions, Zones, and Multi-Region

Understanding Google Cloud's geography is essential for planning.

Concept | Definition | Example
Region | Independent geographic area with 3+ zones | us-central1, europe-west1, asia-east1
Zone | Isolated deployment area within a region | us-central1-a, us-central1-b
Multi-region | Large geographic area containing 2+ regions | US, EU, ASIA

Planning guidelines:

  • Latency: Place resources close to users. Use gcloud compute regions list to see available regions.
  • High availability: Distribute across zones (within a region) for zone-level failure tolerance. Distribute across regions for regional failure tolerance.
  • Data residency: Some regulations require data to stay within specific geographic boundaries. Choose regions accordingly.
  • Service availability: Not all services or machine types are available in every region. Verify before committing.
  • Cost: Pricing varies by region. us-central1 and us-east1 are typically the cheapest US regions. Inter-region traffic incurs egress charges.

Exam trap: A zone failure takes down all resources in that zone. A regional managed instance group (or regional GKE cluster) survives zone failures because instances are spread across zones. A zonal MIG or zonal GKE cluster does not survive zone failures.

Network Service Tiers: Premium vs. Standard

Google Cloud offers two Network Service Tiers that affect how traffic is routed between users and Google Cloud resources.

Feature | Premium Tier | Standard Tier
Routing | Traffic enters/exits Google's network at the edge closest to the user (cold-potato routing) | Traffic enters/exits at the edge closest to the Google Cloud region (hot-potato routing)
Performance | Lower latency, higher throughput, more reliable | Higher latency, less consistent
Global load balancing | Supported (single anycast IP) | Not supported (regional IPs only)
SLA | Covered by Google's global backbone SLA | No network performance SLA
Cost | Higher | Lower
Default | Yes (project default) | Must be explicitly selected

When to use each:

  • Premium Tier: Production workloads, global users, latency-sensitive applications, any workload requiring global load balancing.
  • Standard Tier: Cost-sensitive workloads with regional users, batch processing, workloads tolerant of higher latency, dev/test.

Exam trap: Global external Application Load Balancer requires Premium Tier. If you switch to Standard Tier, you can only use regional load balancers. Any question mentioning "global load balancer" implies Premium Tier.

Exam trap: Premium Tier is the default tier. You do not need to do anything to use it. Standard Tier must be explicitly configured per resource.


Exam Strategy for Domain 2

  1. Know the decision trees: Most Domain 2 questions are "which service should you use?" Pick the service based on the workload description, not just the technology name.
  2. Cost optimization signals: If the question mentions cost savings, look for Spot VMs, Autopilot GKE, committed use discounts, appropriate storage classes, and Standard network tier.
  3. Scale signals: "Horizontally scalable relational" = Spanner. "Petabyte analytics" = BigQuery. "Global HTTP" = Global Application LB. "Millions of writes per second" = Bigtable.
  4. Duration signals: Minimum storage durations (30/90/365 days) are heavily tested for Cloud Storage classes.
  5. Managed vs. control signals: "Minimize operational overhead" = Autopilot, Cloud Run, serverless. "Full control over nodes" = GKE Standard. "Custom runtime" = Cloud Run.
