Domain 2: Planning and Configuring a Cloud Solution (~17.5%)
Domain 2 covers the planning phase -- selecting the right compute, storage, data, and networking resources before you deploy anything. This domain accounts for roughly 17.5% of the exam, which translates to approximately 9-10 questions. Unlike Domain 3 (deploying), the questions here focus on choosing and sizing resources rather than running CLI commands. Expect scenario-based questions that describe a workload and ask you to select the best-fit service or configuration.
2.1 Planning and Estimating Using the Pricing Calculator
The Google Cloud Pricing Calculator lets you estimate monthly costs before provisioning resources. The exam tests your understanding of what inputs the calculator requires and how discount models affect pricing.
Key Concepts
Inputs the calculator expects:
| Resource | Key Inputs |
|---|---|
| Compute Engine | Machine type, number of instances, hours/month, region, OS license, persistent disk size/type, sustained use or committed use discounts |
| GKE | Cluster management fee, node pool machine types, node count, Autopilot vs. Standard |
| Cloud Storage | Storage class, data volume (GB), retrieval volume, network egress |
| Cloud SQL | Instance type, vCPUs, memory, storage (SSD/HDD), HA configuration, region |
| BigQuery | On-demand vs. flat-rate (editions), data stored, data scanned per query |
| Networking | Egress volume, load balancer type, forwarding rules, Premium vs. Standard tier |
Discount models to know:
| Discount Type | How It Works | Applies To |
|---|---|---|
| Sustained use discounts (SUDs) | Automatic discounts for running instances > 25% of the month; up to 30% off for N1, up to 20% for N2/N2D/C2 | N1, N2, N2D, C2, M1/M2, sole-tenant nodes (NOT E2, Tau, A2/A3) |
| Committed use discounts (CUDs) | 1-year or 3-year commitment for a fixed amount of vCPUs/memory; up to 57% off (3-year) | Most machine types; spend-based CUDs for some services |
| Spot VM pricing | Up to 60-91% off standard pricing; can be preempted at any time | Fault-tolerant batch workloads |
| Free tier | Always-free monthly allowances (e.g., 1 e2-micro per month in us-central1) | Select services/regions |
Exam trap: Sustained use discounts are applied automatically -- you do not need to opt in. But they do NOT apply to E2, Tau (T2D/T2A), or accelerator-optimized (A2/A3) machine types. CUDs require an explicit commitment purchase.
Exam trap: Egress costs are a common oversight. Ingress to Google Cloud is free, but egress (data leaving Google Cloud or going between regions) is charged. The Pricing Calculator requires you to estimate egress volume.
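To make the discount and egress interactions concrete, here is a minimal cost sketch. All rates are hypothetical placeholders, not real Google Cloud prices; only the structure (compute hours, automatic sustained use discount, separately billed egress) mirrors the calculator's inputs.

```python
# Rough monthly cost sketch mirroring the Pricing Calculator's inputs.
# All rates below are hypothetical placeholders, not real Google Cloud prices.

HOURS_PER_MONTH = 730  # the calculator's standard full-month assumption

def estimate_monthly_cost(vm_hourly_rate, instance_count, hours,
                          sud_percent, egress_gb, egress_rate_per_gb):
    """Compute cost: VM time (minus sustained use discount) plus egress.

    Ingress is free, so only egress volume is billed.
    """
    compute = vm_hourly_rate * instance_count * hours
    compute *= (1 - sud_percent)             # SUDs apply automatically
    egress = egress_gb * egress_rate_per_gb  # the commonly overlooked input
    return round(compute + egress, 2)

# Example: 3 N1 VMs at a placeholder $0.05/hr, full month (30% SUD),
# plus 500 GB egress at a placeholder $0.12/GB.
print(estimate_monthly_cost(0.05, 3, HOURS_PER_MONTH, 0.30, 500, 0.12))
```

Note that the egress term is independent of the discount: SUDs and CUDs reduce compute charges only, never network egress.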
2.2 Planning and Configuring Compute Resources
Compute Engine Machine Types
Compute Engine offers machine type families organized by workload profile. You must know the families, their series, and when to use each.
Machine Type Families:
| Family | Series | vCPU Range | Memory/vCPU | Use Case |
|---|---|---|---|---|
| General-purpose | E2, N2, N2D, N1, C3, C3D, N4, Tau T2D/T2A | 1-360 | 0.5-8 GB | Web servers, app servers, microservices, small-medium databases, dev/test |
| Compute-optimized | C2, C2D, H3, H4D | Up to 192 | 2-8 GB | HPC, gaming, single-threaded apps, batch processing, media transcoding |
| Memory-optimized | M1, M2, M3, X4 | Up to 416 | 14-24+ GB | SAP HANA, large in-memory databases, real-time analytics |
| Accelerator-optimized | A2, A3, G2 | Varies | Varies | ML training/inference, GPU rendering, video transcoding |
| Storage-optimized | Z3 | Up to 192 | 8 GB | High-throughput local SSD workloads, in-memory databases with fast storage |
Predefined machine type naming convention:

```
[series]-[type]-[vCPUs]

n2-standard-4  → N2 series, standard memory ratio, 4 vCPUs
e2-highmem-8   → E2 series, high memory ratio, 8 vCPUs
c2-standard-60 → C2 series, standard ratio, 60 vCPUs
```
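The convention can be parsed mechanically. A minimal sketch, covering predefined three-part names only (custom and shared-core names follow different patterns):

```python
# Parse a predefined machine type name following the [series]-[type]-[vCPUs]
# convention. Sketch only: custom types (n2-custom-8-32768) and shared-core
# types (e2-micro) use different formats and are not handled here.

def parse_machine_type(name: str) -> dict:
    series, mem_type, vcpus = name.split("-")
    return {"series": series.upper(), "type": mem_type, "vcpus": int(vcpus)}

print(parse_machine_type("n2-standard-4"))
# {'series': 'N2', 'type': 'standard', 'vcpus': 4}
```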
Memory-to-vCPU ratios in type names:
| Type Suffix | Memory per vCPU | Typical Use |
|---|---|---|
| highcpu | ~0.9-2 GB | CPU-bound workloads |
| standard | ~3.75-4 GB | Balanced workloads |
| highmem | ~6.5-8 GB | Memory-heavy apps, medium databases |
| megamem | ~14 GB | Large in-memory workloads |
| ultramem | ~24+ GB | SAP HANA, extreme memory needs |
Shared-core machine types (E2: e2-micro, e2-small, e2-medium; legacy N1: f1-micro, g1-small) provide burstable fractional vCPUs at the lowest cost. The e2-micro in us-central1 is part of the always-free tier (it replaced the f1-micro in the free-tier lineup).
Custom Machine Types
For workloads that do not fit predefined ratios, custom machine types let you specify exact vCPU and memory combinations. Available on N1, N2, N2D, and E2 series.
- vCPU count must be even (except 1 vCPU is allowed).
- Memory must be a multiple of 256 MB.
- Extended memory allows exceeding the default memory-per-vCPU limit (at a premium).
- Custom types cost approximately 5% more than equivalent predefined types.
```
# Custom machine type naming:
n2-custom-8-32768 → N2 series, 8 vCPUs, 32 GB (32768 MB) memory
e2-custom-4-8192  → E2 series, 4 vCPUs, 8 GB memory
```
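The custom-type rules above are straightforward to encode. A hypothetical helper that enforces the even-vCPU and 256 MB-multiple rules before building the name (real per-series memory ratio limits are not checked here):

```python
# Build a custom machine type name, validating the two rules stated above:
# vCPU count must be even (or exactly 1), memory a multiple of 256 MB.
# Sketch only: per-series min/max memory-per-vCPU limits are not enforced.

def custom_machine_type(series: str, vcpus: int, memory_mb: int) -> str:
    if vcpus != 1 and vcpus % 2 != 0:
        raise ValueError("vCPU count must be even (or exactly 1)")
    if memory_mb % 256 != 0:
        raise ValueError("memory must be a multiple of 256 MB")
    return f"{series}-custom-{vcpus}-{memory_mb}"

print(custom_machine_type("n2", 8, 32768))  # n2-custom-8-32768
```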
Exam trap: Custom machine types are only available in certain series. You cannot create custom C2, M2, or A2 types. If the exam describes a workload needing precise CPU/memory ratios, the answer is a custom N2 or E2.
Spot VMs vs. Preemptible VMs
Spot VMs replaced legacy preemptible VMs as the recommended interruptible instance option.
| Feature | Spot VMs | Preemptible VMs (Legacy) |
|---|---|---|
| Maximum runtime | No limit | 24 hours |
| Preemption notice | 30 seconds | 30 seconds |
| Discount | 60-91% off on-demand | 60-91% off on-demand |
| Availability | Subject to capacity | Subject to capacity |
| Live migration | No | No |
| Auto-restart | No | No |
| gcloud flag | `--provisioning-model=SPOT` | `--preemptible` (deprecated) |
When to use Spot VMs: Batch processing, CI/CD builds, data analytics, fault-tolerant distributed workloads, rendering pipelines. Never use them for stateful, user-facing, or always-on services.
Exam trap: The `--preemptible` flag still works but is deprecated. The exam uses `--provisioning-model=SPOT`. Spot VMs have no 24-hour maximum -- that was a preemptible-only limitation.
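To see why Spot VMs suit fault-tolerant batch work despite preemptions, a back-of-envelope comparison with hypothetical rates and a hypothetical rework overhead for re-running preempted work:

```python
# Compare on-demand vs. Spot cost for a fault-tolerant batch job, including
# extra hours re-run after preemptions. All numbers are hypothetical.

def batch_job_cost(hours, hourly_rate, discount=0.0, rework_factor=0.0):
    """Effective cost: base hours plus re-run overhead, minus any discount."""
    effective_hours = hours * (1 + rework_factor)
    return round(effective_hours * hourly_rate * (1 - discount), 2)

# 100 compute-hours at a placeholder $1.00/hr. Spot at an 80% discount,
# assuming preemptions force ~10% of the work to be redone.
on_demand = batch_job_cost(100, 1.00)                                 # 100.0
spot = batch_job_cost(100, 1.00, discount=0.80, rework_factor=0.10)   # 22.0
print(on_demand, spot)
```

Even with generous rework assumptions, the discount dominates -- which is exactly why interruption-tolerant pipelines are the canonical Spot workload, and stateful or user-facing services are not.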
Sole-Tenant Nodes
Sole-tenant nodes provide dedicated physical servers for your VMs. No other customer's VMs share the hardware.
When to use sole-tenant nodes:
- Licensing compliance: Bring-your-own-license (BYOL) software that requires per-core or per-socket licensing (e.g., Windows Server, Oracle Database).
- Regulatory/compliance: Workloads that require physical isolation from other tenants.
- Performance isolation: Workloads sensitive to noisy-neighbor effects.
Sole-tenant nodes use node templates (defining machine type family and node type) and node groups (collection of nodes). VMs are placed on sole-tenant nodes via node affinity labels.
Exam trap: Sole-tenant nodes provide physical isolation, not network isolation. Network isolation is handled by VPCs and firewalls. The exam may try to confuse the two.
Google Kubernetes Engine (GKE)
GKE offers two modes of operation:
| Feature | GKE Standard | GKE Autopilot |
|---|---|---|
| Node management | You manage nodes (machine types, scaling, upgrades) | Google manages nodes entirely |
| Billing | Pay for node VMs (whether utilized or not) | Pay per pod resource requests (vCPU, memory, ephemeral storage) |
| Node pools | You create and configure | Managed automatically |
| Security posture | You harden nodes | Hardened by default (no SSH, no privileged pods by default) |
| GPU/TPU support | Full support | Supported (with constraints) |
| Cluster autoscaler | You configure | Built-in, always on |
| SLA | 99.5% (zonal), 99.95% (regional) | 99.9% (regional control plane) |
| Best for | Full control, custom configurations, specialized hardware | Reduced operational overhead, predictable costs |
Cluster types:
| Type | Description | Exam Context |
|---|---|---|
| Zonal cluster | Single control plane in one zone; nodes in same zone | Lowest cost; not HA |
| Multi-zonal cluster | Single control plane in one zone; nodes across multiple zones | Node HA but single control plane SPOF |
| Regional cluster | Control plane replicas across 3 zones; nodes across zones | Production recommended; highest availability |
| Private cluster | Nodes have internal IPs only; optional private endpoint for control plane | Security-focused environments |
Node pools are groups of nodes within a cluster that share the same configuration (machine type, image, labels). Use multiple node pools to mix machine types (e.g., CPU pool + GPU pool) or run different OS images.
Exam trap: Autopilot clusters are always regional -- you cannot create a zonal Autopilot cluster. If the question says "minimize operational overhead" or "pay only for running pods," the answer is Autopilot.
Exam trap: In Standard mode, you pay for nodes even if pods are not utilizing them. In Autopilot, you pay for pod resource requests. For cost optimization on bursty workloads, Autopilot can be cheaper.
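The billing difference can be sketched numerically. All hourly rates below are hypothetical placeholders; the point is the two models, not the prices.

```python
# Contrast the two GKE billing models from the table above: Standard bills
# for provisioned nodes, Autopilot bills for pod resource requests.
# All rates are hypothetical placeholders, not real GKE pricing.

def standard_monthly_cost(node_count, node_hourly_rate, hours=730):
    # Standard mode: pay for every node, utilized or not.
    return node_count * node_hourly_rate * hours

def autopilot_monthly_cost(pod_vcpu_requests, vcpu_hourly_rate,
                           pod_mem_gb_requests, mem_gb_hourly_rate, hours=730):
    # Autopilot mode: pay only for what the pods request.
    return (pod_vcpu_requests * vcpu_hourly_rate +
            pod_mem_gb_requests * mem_gb_hourly_rate) * hours

# Bursty workload: 3 nodes provisioned, but pods request only 2 vCPU / 8 GB.
print(round(standard_monthly_cost(3, 0.10), 2))             # nodes billed fully
print(round(autopilot_monthly_cost(2, 0.04, 8, 0.005), 2))  # requests only
```

When average pod requests sit well below provisioned node capacity, the Autopilot model comes out cheaper -- the scenario the second exam trap describes.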
Cloud Run
Cloud Run is a fully managed serverless platform that runs stateless containers. It abstracts away all infrastructure.
Key configuration parameters:
| Parameter | Description | Default |
|---|---|---|
| Container image | Must listen on the port defined by the `$PORT` env variable | Required |
| CPU | 1, 2, 4, or 8 vCPUs | 1 |
| Memory | 128 MiB to 32 GiB | 512 MiB |
| Concurrency | Max simultaneous requests per container instance (1-1000) | 80 |
| Min instances | Minimum warm instances (avoid cold starts) | 0 |
| Max instances | Upper limit on auto-scaling | 100 |
| Timeout | Max request duration (up to 60 minutes) | 5 minutes |
| CPU allocation | Always allocated vs. only during request processing | Request-based |
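A quick capacity check follows from the table's parameters: maximum simultaneous requests are bounded by max instances times per-instance concurrency.

```python
# Back-of-envelope Cloud Run serving capacity from the parameters above.

def max_concurrent_requests(max_instances: int, concurrency: int) -> int:
    """Upper bound on simultaneous requests the service can handle."""
    return max_instances * concurrency

# Defaults from the table: 100 max instances * 80 concurrent requests each.
print(max_concurrent_requests(100, 80))  # 8000
```

If a scenario needs more headroom, raise either knob; raising concurrency is cheaper but assumes the container handles parallel requests safely.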
Cloud Run vs. Cloud Functions decision matrix:
| Criteria | Cloud Run | Cloud Functions |
|---|---|---|
| Unit of deployment | Container image | Source code (function) |
| Runtime | Any language/framework in a container | Supported runtimes only (Node.js, Python, Go, Java, .NET, Ruby, PHP) |
| Concurrency | Multiple requests per instance | 1 request per instance (1st gen) / configurable (2nd gen) |
| Execution time | Up to 60 minutes | Up to 60 minutes (2nd gen) / 9 minutes (1st gen) |
| Best for | Microservices, APIs, web apps, custom runtimes | Event-driven functions, lightweight transformations, glue code |
Exam trap: If a question mentions "any programming language" or "custom binary/runtime," the answer is Cloud Run, not Cloud Functions. Cloud Functions requires a supported runtime.
Cloud Functions
Cloud Functions is Google Cloud's functions-as-a-service (FaaS) offering. The exam tests the differences between 1st gen and 2nd gen.
| Feature | 1st Gen | 2nd Gen |
|---|---|---|
| Runtime engine | Custom Google infrastructure | Built on Cloud Run |
| Concurrency | 1 request per instance | Up to 1,000 concurrent requests per instance |
| Max timeout | 9 minutes (HTTP), 9 minutes (event) | 60 minutes (HTTP), 9 minutes (event) |
| Min instances | Not supported | Supported (reduce cold starts) |
| Traffic splitting | Not supported | Supported (canary deployments) |
| Max instance size | 8 GB memory, 2 vCPUs | 16 GiB memory, 4 vCPUs |
| Event triggers | HTTP, Pub/Sub, Cloud Storage, Firestore, etc. | HTTP, Eventarc (CloudEvents format) |
| VPC connector | Supported | Supported |
Common trigger types for the exam:
- HTTP trigger: Invoked via HTTPS URL.
- Cloud Storage trigger: Fires on object finalize, delete, archive, metadata update.
- Pub/Sub trigger: Fires when a message is published to a topic.
- Firestore trigger: Fires on document create, update, delete, write.
- Eventarc trigger (2nd gen): Unifies event delivery from 90+ Google Cloud sources using CloudEvents standard.
Exam trap: 2nd gen Cloud Functions are built on Cloud Run infrastructure. If a question describes needing concurrency > 1 per instance, longer timeouts, or traffic splitting, the answer is 2nd gen (or Cloud Run directly).
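The 1st gen vs. 2nd gen decision reduces to a few feature checks from the table. A sketch for drilling the distinctions:

```python
# Encode the 1st gen vs. 2nd gen feature differences from the table above.
# Study aid only: any True answer means 2nd gen (or Cloud Run directly).

def needs_second_gen(concurrency_per_instance=1, http_timeout_minutes=9,
                     traffic_splitting=False, min_instances=0) -> bool:
    return (concurrency_per_instance > 1       # 1st gen: 1 request/instance
            or http_timeout_minutes > 9        # 1st gen HTTP cap: 9 minutes
            or traffic_splitting               # 1st gen: not supported
            or min_instances > 0)              # 1st gen: not supported

print(needs_second_gen(http_timeout_minutes=30))  # True
```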
2.3 Planning and Configuring Data Storage Options
Choosing the Right Database Service
| Service | Type | Data Model | Scaling | Transactions | Best For |
|---|---|---|---|---|---|
| Cloud SQL | Managed relational | MySQL, PostgreSQL, SQL Server | Vertical (read replicas for horizontal reads) | Full ACID | Traditional OLTP apps, CMS, ERP, < 64 TB |
| Cloud Spanner | Managed relational | Relational with horizontal scaling | Horizontal (automatic sharding) | Full ACID + global strong consistency | Global OLTP at scale, financial systems, gaming leaderboards |
| BigQuery | Serverless data warehouse | Columnar (SQL interface) | Fully managed, serverless | N/A (analytics, not OLTP) | OLAP, analytics, ad-hoc queries, data warehousing, petabyte-scale |
| Firestore | Managed NoSQL document | Document (collections/documents) | Automatic | ACID (within document groups) | Mobile/web apps, real-time sync, user profiles, game state |
| Cloud Bigtable | Managed NoSQL wide-column | Key-value / wide-column | Horizontal (add nodes) | Single-row only | IoT time-series, ad-tech, financial tick data, > 1 TB, high throughput |
| AlloyDB | Managed PostgreSQL-compatible | Relational (PostgreSQL) | Vertical + read pools | Full ACID | PostgreSQL workloads needing 4x faster analytics, AI/ML integration, hybrid transactional/analytical |
Decision flowchart for the exam:
- Need SQL and strong consistency at global scale? --> Cloud Spanner
- Need SQL for a traditional relational workload < 64 TB? --> Cloud SQL
- Need analytics / data warehouse / petabyte-scale queries? --> BigQuery
- Need NoSQL document store with mobile/web SDK and real-time sync? --> Firestore
- Need NoSQL for massive throughput on flat key-value data (> 1 TB)? --> Cloud Bigtable
Exam trap: Cloud SQL scales vertically (bigger instance) and uses read replicas for read scaling. It does NOT scale horizontally for writes. If a question says "horizontally scalable relational database," the answer is Cloud Spanner, not Cloud SQL.
Exam trap: BigQuery is for analytics (OLAP), not transactional workloads (OLTP). If the scenario describes reporting, dashboards, or ad-hoc queries on large datasets, it is BigQuery. If it describes user-facing transactions, it is not.
Exam trap: Bigtable requires a minimum of 1 node (unlike Firestore, which is serverless with no provisioning). Bigtable is not cost-effective for datasets under ~1 TB.
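The decision flowchart above can be written as a small selection function. This is a study aid, not a sizing tool; real choices weigh more dimensions (latency, query patterns, existing skills) than these flags.

```python
# Encode the database decision flowchart above. Thresholds mirror the text:
# Cloud SQL up to ~64 TB, Bigtable cost-effective above ~1 TB.

def choose_database(relational: bool, global_scale: bool = False,
                    analytics: bool = False, realtime_sync: bool = False,
                    data_tb: float = 0.1) -> str:
    if analytics:
        return "BigQuery"          # OLAP, never user-facing OLTP
    if relational:
        # Horizontal relational scaling or > 64 TB means Spanner
        return "Cloud Spanner" if global_scale or data_tb >= 64 else "Cloud SQL"
    if realtime_sync:
        return "Firestore"         # mobile/web SDKs, real-time sync
    return "Cloud Bigtable" if data_tb >= 1 else "Firestore"

print(choose_database(relational=True, data_tb=0.5))   # Cloud SQL
print(choose_database(relational=False, data_tb=10))   # Cloud Bigtable
```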
Cloud Storage Classes
Cloud Storage offers four storage classes, all with the same API, latency, and throughput. They differ in cost, minimum storage duration, and retrieval fees.
| Storage Class | Min Storage Duration | At-Rest Cost (relative) | Retrieval Cost | Use Case |
|---|---|---|---|---|
| Standard | None | Highest | None | Frequently accessed data, hot data, short-lived objects |
| Nearline | 30 days | ~50% of Standard | Per-GB retrieval fee | Data accessed less than once per month; backups |
| Coldline | 90 days | ~25% of Standard | Higher per-GB retrieval fee | Data accessed less than once per quarter; disaster recovery |
| Archive | 365 days | Lowest (~10% of Standard) | Highest per-GB retrieval fee | Data accessed less than once per year; regulatory archives |
Critical storage concepts for the exam:
- Minimum storage duration charge: If you delete or modify an object before the minimum storage duration, you are still charged for the full minimum period. Deleting a Coldline object after 10 days incurs 90 days of storage charges.
- Bucket-level class vs. object-level class: A bucket has a default storage class, but individual objects can have a different class.
- Autoclass: Automatically transitions objects between classes based on access patterns. Eliminates the need to manually set lifecycle rules for cost optimization.
- Lifecycle management: Rules that automatically transition storage class or delete objects based on age, creation date, number of versions, or storage class.
- Object versioning: Retains previous versions of objects when overwritten or deleted. Used for data protection and compliance. Each version is charged independently.
- Bucket locations: Regional, dual-region, or multi-region. Affects availability, latency, and cost.
Exam trap: All storage classes have the same low-latency access (milliseconds). Archive storage is NOT slow to retrieve -- it just has the highest per-GB retrieval cost. Do not confuse Cloud Storage Archive with tape-like cold storage.
Exam trap: Minimum storage duration charges apply even if you change the object's storage class. Moving an object from Coldline to Standard after 30 days incurs the remaining 60 days of Coldline charges.
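The minimum-duration math from both traps can be sketched as:

```python
# Early-deletion math from the exam traps above: deleting (or changing the
# class of) an object before its minimum duration still bills the remainder.

MIN_DAYS = {"standard": 0, "nearline": 30, "coldline": 90, "archive": 365}

def billable_days(storage_class: str, days_stored: int) -> int:
    """Days actually billed: at least the class's minimum storage duration."""
    return max(days_stored, MIN_DAYS[storage_class])

print(billable_days("coldline", 10))   # 90 -- billed for the full minimum
print(billable_days("nearline", 45))   # 45 -- past the minimum, billed as-is
```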
Persistent Disks
Persistent disks are durable block storage for Compute Engine and GKE.
| Disk Type | IOPS (read/write) | Throughput | Use Case |
|---|---|---|---|
| pd-standard (Standard) | Low | Low | Boot disks, bulk data, cost-sensitive |
| pd-balanced | Moderate | Moderate | General workloads (best price-performance) |
| pd-ssd (SSD) | High | High | Databases, latency-sensitive apps |
| pd-extreme | Highest (configurable) | Highest | Enterprise databases (SAP HANA, Oracle) |
| Hyperdisk Balanced | High (configurable) | High | Databases requiring dynamic IOPS provisioning |
Zonal vs. Regional persistent disks:
| Feature | Zonal PD | Regional PD |
|---|---|---|
| Availability | Single zone | Synchronous replication across 2 zones in same region |
| Use case | Standard workloads | HA for stateful workloads, failover scenarios |
| Performance | Full performance | Slight write latency increase due to replication |
| Cost | Standard pricing | ~2x zonal pricing (two copies) |
Exam trap: Regional persistent disks replicate across exactly 2 zones (not 3). They are designed for HA, not for performance improvement. Write latency is higher because writes must commit to both zones.
Exam trap: Local SSDs provide the highest IOPS but are ephemeral -- data is lost when the VM stops or is preempted. They are not persistent disks. Do not choose local SSDs for durable storage.
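The disk guidance above reduces to a durability check followed by an IOPS tier. A sketch (Hyperdisk omitted for brevity):

```python
# Encode the persistent disk selection guidance from the table and traps
# above: durability first, then IOPS tier. Study aid only.

def choose_disk(durable: bool, iops: str) -> str:
    if not durable:
        return "local-ssd"  # highest IOPS, but ephemeral: data lost on stop
    return {"low": "pd-standard",       # boot disks, bulk, cost-sensitive
            "moderate": "pd-balanced",  # best price-performance default
            "high": "pd-ssd",           # databases, latency-sensitive
            "highest": "pd-extreme"}[iops]  # SAP HANA, Oracle

print(choose_disk(durable=True, iops="moderate"))  # pd-balanced
```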
2.4 Planning and Configuring Network Resources
Load Balancing
Google Cloud offers multiple load balancer types. The exam frequently tests which load balancer to use for a given scenario.
Load Balancer Decision Matrix:
| Load Balancer | Scope | Traffic Type | Proxy/Passthrough | Backend Types |
|---|---|---|---|---|
| Global external Application LB | Global | HTTP(S) | Proxy (Layer 7) | MIGs, NEGs, Cloud Run, Cloud Functions, GCS buckets |
| Regional external Application LB | Regional | HTTP(S) | Proxy (Layer 7) | MIGs, NEGs |
| Global external proxy Network LB | Global | TCP/SSL | Proxy (Layer 4) | MIGs, NEGs |
| Regional external passthrough Network LB | Regional | TCP/UDP | Passthrough (Layer 4) | MIGs, NEGs |
| Internal Application LB | Regional | HTTP(S) | Proxy (Layer 7) | MIGs, NEGs |
| Internal passthrough Network LB | Regional | TCP/UDP | Passthrough (Layer 4) | MIGs, NEGs |
| Cross-region internal Application LB | Global | HTTP(S) | Proxy (Layer 7) | MIGs, NEGs |
| Internal proxy Network LB | Regional | TCP | Proxy (Layer 4) | MIGs, NEGs |
Quick selection guide:
- External HTTP(S) traffic, global reach, CDN? --> Global external Application LB
- External TCP/SSL, global reach? --> Global external proxy Network LB
- External UDP or non-HTTP TCP, single region? --> Regional external passthrough Network LB
- Internal HTTP(S) microservices? --> Internal Application LB
- Internal TCP/UDP (e.g., database traffic)? --> Internal passthrough Network LB
Key terminology:
- Proxy load balancer: Terminates the client connection and opens a new connection to the backend. Can inspect and modify headers (Layer 7) or just forward (Layer 4).
- Passthrough load balancer: Forwards packets directly to backends without terminating the connection. Backend sees the original client IP.
- Backend service: Defines how the load balancer distributes traffic (health checks, session affinity, timeout).
- URL map: Routes requests to different backend services based on host and path rules (Application LBs only).
- Forwarding rule: Maps an external IP + port to the load balancer.
Exam trap: The global external Application Load Balancer can serve backends in multiple regions and route traffic to the nearest healthy backend. It is the only load balancer type that integrates with Cloud CDN. If the question mentions "CDN" or "content caching," this is the answer.
Exam trap: The internal passthrough Network LB preserves the original client IP because it is a passthrough (Layer 4) load balancer. The internal Application LB is a proxy and does NOT preserve the original client IP by default (the client IP is carried in the `X-Forwarded-For` header).
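The quick selection guide can be encoded as a function for drilling. A sketch that considers only scope, protocol, and internal/external; backend types and network tier also matter in practice.

```python
# Encode the load balancer quick selection guide above. Study aid only:
# real selection also weighs backend types, CDN needs, and network tier.

def choose_load_balancer(external: bool, protocol: str,
                         global_scope: bool) -> str:
    if protocol == "http":                       # Layer 7 -> Application LB
        if not external:
            return ("Cross-region internal Application LB" if global_scope
                    else "Internal Application LB")
        return ("Global external Application LB" if global_scope
                else "Regional external Application LB")
    # TCP/UDP -> Network LB (passthrough preserves the client IP)
    if external:
        return ("Global external proxy Network LB" if global_scope
                else "Regional external passthrough Network LB")
    return "Internal passthrough Network LB"

print(choose_load_balancer(True, "http", True))   # Global external Application LB
print(choose_load_balancer(False, "tcp", False))  # Internal passthrough Network LB
```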
Resource Locations: Regions, Zones, and Multi-Region
Understanding Google Cloud's geography is essential for planning.
| Concept | Definition | Example |
|---|---|---|
| Region | Independent geographic area with 3+ zones | us-central1, europe-west1, asia-east1 |
| Zone | Isolated deployment area within a region | us-central1-a, us-central1-b |
| Multi-region | Large geographic area containing 2+ regions | US, EU, ASIA |
Planning guidelines:
- Latency: Place resources close to users. Use `gcloud compute regions list` to see available regions.
- High availability: Distribute across zones (within a region) for zone-level failure tolerance. Distribute across regions for regional failure tolerance.
- Data residency: Some regulations require data to stay within specific geographic boundaries. Choose regions accordingly.
- Service availability: Not all services or machine types are available in every region. Verify before committing.
- Cost: Pricing varies by region. `us-central1` and `us-east1` are typically the cheapest US regions. Inter-region traffic incurs egress charges.
Exam trap: A zone failure takes down all resources in that zone. A regional managed instance group (or regional GKE cluster) survives zone failures because instances are spread across zones. A zonal MIG or zonal GKE cluster does not survive zone failures.
Network Service Tiers: Premium vs. Standard
Google Cloud offers two Network Service Tiers that affect how traffic is routed between users and Google Cloud resources.
| Feature | Premium Tier | Standard Tier |
|---|---|---|
| Routing | Traffic enters/exits Google's network at the edge closest to the user (cold-potato routing) | Traffic enters/exits at the edge closest to the Google Cloud region (hot-potato routing) |
| Performance | Lower latency, higher throughput, more reliable | Higher latency, less consistent |
| Global load balancing | Supported (single anycast IP) | Not supported (regional IPs only) |
| SLA | Google's global backbone SLA | No network performance SLA |
| Cost | Higher | Lower |
| Default | Yes (project default) | Must explicitly select |
When to use each:
- Premium Tier: Production workloads, global users, latency-sensitive applications, any workload requiring global load balancing.
- Standard Tier: Cost-sensitive workloads with regional users, batch processing, workloads tolerant of higher latency, dev/test.
Exam trap: Global external Application Load Balancer requires Premium Tier. If you switch to Standard Tier, you can only use regional load balancers. Any question mentioning "global load balancer" implies Premium Tier.
Exam trap: Premium Tier is the default tier. You do not need to do anything to use it. Standard Tier must be explicitly configured per resource.
Exam Strategy for Domain 2
- Know the decision trees: Most Domain 2 questions are "which service should you use?" Pick the service based on the workload description, not just the technology name.
- Cost optimization signals: If the question mentions cost savings, look for Spot VMs, Autopilot GKE, committed use discounts, appropriate storage classes, and Standard network tier.
- Scale signals: "Horizontally scalable relational" = Spanner. "Petabyte analytics" = BigQuery. "Global HTTP" = Global Application LB. "Millions of writes per second" = Bigtable.
- Duration signals: Minimum storage durations (30/90/365 days) are heavily tested for Cloud Storage classes.
- Managed vs. control signals: "Minimize operational overhead" = Autopilot, Cloud Run, serverless. "Full control over nodes" = GKE Standard. "Custom runtime" = Cloud Run.
References
- Machine families resource and comparison guide -- Compute Engine machine type families, series, and specifications.
- Spot VMs overview -- Spot VM pricing, behavior, and constraints.
- Sole-tenant nodes overview -- Dedicated physical hardware for compliance and licensing.
- GKE Autopilot overview -- Autopilot mode vs. Standard mode comparison.
- Cloud Run overview -- Serverless container platform configuration.
- Cloud Functions concepts -- 1st gen vs. 2nd gen functions.
- Cloud SQL overview -- Managed relational database service.
- Cloud Spanner overview -- Globally distributed relational database.
- BigQuery introduction -- Serverless data warehouse.
- Firestore overview -- NoSQL document database.
- Cloud Bigtable overview -- Wide-column NoSQL for high-throughput workloads.
- Cloud Storage classes -- Storage class comparison and lifecycle management.
- Persistent disk overview -- Block storage types and regional replication.
- Load balancing overview -- All load balancer types and selection guidance.
- Network Service Tiers -- Premium vs. Standard tier routing and capabilities.
- Google Cloud Pricing Calculator -- Cost estimation tool.
- ACE certification exam guide -- Official exam objectives.