Domain 1: Designing and Planning a Cloud Solution Architecture (~25%)
Domain 1 is the largest domain on the Professional Cloud Architect exam, accounting for roughly 25% of the questions (approximately 13-15 out of 50-60). This domain tests your ability to translate business requirements into technical cloud architectures, select the right GCP services for a given scenario, design migration plans, and plan for ongoing modernization. Unlike the ACE exam, which focuses on implementation, the PCA exam demands architectural decision-making -- you must justify why a particular service or pattern is the right choice, not just how to configure it.
1.1 Meeting Business Requirements and Strategy
The PCA exam presents scenario-based questions where a business describes its goals, and you must choose the architecture that best aligns with those goals. The exam tests your ability to balance cost, performance, reliability, compliance, and time-to-market.
Business Use Cases and Product Strategy
Exam scenarios typically fall into one of these patterns:
| Business Scenario | Architectural Direction |
|---|---|
| Startup with unpredictable traffic | Serverless (Cloud Run, Cloud Functions), autoscaling MIGs |
| Enterprise with strict compliance (HIPAA, PCI-DSS) | Dedicated resources, VPC Service Controls, Cloud HSM, regionalized data |
| Global consumer-facing application | Multi-region deployment, global load balancing, Cloud CDN |
| Cost-constrained batch processing | Spot VMs, preemptible VMs, Dataflow, Cloud Batch |
| Real-time analytics pipeline | Pub/Sub + Dataflow + BigQuery streaming |
| Legacy application migration | Rehost to Compute Engine or VMware Engine, then modernize |
Exam trap: The exam frequently presents scenarios where the "most technically elegant" solution is not the right answer. A startup that needs to launch in 2 weeks should rehost to Compute Engine, not spend months re-architecting for GKE. Always match the solution to the business constraint -- budget, timeline, team expertise, and compliance requirements.
Cost Optimization Strategies
Cost optimization is a dominant theme across the entire PCA exam. You must know the full discount hierarchy and when each applies.
Discount Types Comparison
| Discount Type | Savings | Commitment | Applies To | Key Detail |
|---|---|---|---|---|
| Sustained Use Discounts (SUDs) | Up to 30% | None (automatic) | N1 and sole-tenant nodes only | Applied automatically for usage above 25% of a billing month |
| Resource-based CUDs | Up to 55% (general) / 70% (memory-optimized) | 1 or 3 years | vCPUs, memory, GPUs, local SSD, sole-tenant nodes | Scoped to a specific region and project |
| Compute Flexible CUDs | Up to 46% (general) / 63% (memory-optimized) | 1 or 3 years | Compute Engine, GKE, Cloud Run | Applies across projects and regions within a billing account |
| Spot VMs | Up to 91% | None | Fault-tolerant workloads only | Can be preempted at any time; no SLA; no maximum runtime (unlike preemptible VMs) |
| Preemptible VMs (legacy) | Up to 91% | None | Fault-tolerant workloads only | 24-hour maximum runtime; Google recommends Spot VMs instead |
Discount priority order: CUDs take precedence over SUDs. A resource covered by a CUD does not also receive SUDs. Spot/preemptible VMs are not eligible for SUDs or CUDs.
Exam trap: SUDs apply automatically -- you do not need to purchase them. They are only available for N1 and sole-tenant nodes. N2, N2D, E2, C2, C3, T2D, and Tau machine families are NOT eligible for SUDs. If the exam asks about automatic discounts for an E2 or N2 instance, the answer is "none" -- use CUDs instead. E2 instances are eligible for Flexible CUDs.
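The 30% figure follows from the published N1 tier structure: each successive quarter of the month is billed at an incremental rate of 100%, 80%, 60%, and 40% of list price. A minimal sketch of that arithmetic (tier rates as documented for N1; the function itself is illustrative):

```python
def sud_effective_rate(usage_fraction: float) -> float:
    """Effective billing rate (fraction of list price) for an N1 VM that
    runs for `usage_fraction` of the billing month, under sustained use
    discount tiers: 100% / 80% / 60% / 40% for each successive quarter."""
    if usage_fraction <= 0:
        return 1.0
    tiers = [1.00, 0.80, 0.60, 0.40]  # incremental rate per 25% block
    billed = 0.0
    remaining = usage_fraction
    for rate in tiers:
        block = min(remaining, 0.25)
        billed += block * rate
        remaining -= block
        if remaining <= 0:
            break
    return billed / usage_fraction  # average rate over actual usage

# Full month: (0.25*1.0 + 0.25*0.8 + 0.25*0.6 + 0.25*0.4) = 0.70 -> 30% off.
print(f"full month: {sud_effective_rate(1.0):.2f}")
print(f"half month: {sud_effective_rate(0.5):.2f}")
```

Running a VM for only half the month yields a 10% discount, which is why SUDs reward steady, always-on usage.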
Additional Cost Optimization Techniques
| Technique | Description |
|---|---|
| Custom machine types | Size vCPU and memory independently to avoid paying for unused resources. Available for N1, N2, N2D, E2 series. Carries a ~5% premium over predefined types but saves money when predefined types waste resources. |
| Autoscaling | MIGs scale in/out based on CPU, HTTP load, or custom metrics. Scale-in reduces cost during low traffic. |
| Serverless pricing | Cloud Functions, Cloud Run, and App Engine (standard) charge per invocation or per request-second with scale-to-zero -- no baseline cost when idle. |
| Cloud Storage classes | Use Nearline (30-day minimum), Coldline (90-day), or Archive (365-day) for infrequently accessed data. Object Lifecycle Management automates transitions. |
| Right-sizing recommendations | Cloud Monitoring + Recommender API identify oversized VMs and suggest smaller machine types. |
| BigQuery slots | On-demand pricing (per TB scanned) vs. capacity-based slot reservations (BigQuery editions, the successor to flat-rate pricing) for predictable analytical workloads. |
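The Object Lifecycle Management transitions in the table can be written as a small JSON policy. A sketch that assembles one in Python (the rule schema follows the Cloud Storage lifecycle configuration format; the age thresholds are illustrative):

```python
import json

def lifecycle_policy() -> dict:
    """Lifecycle config: transition objects to colder classes as they age,
    then delete them, respecting each class's minimum storage duration."""
    def rule(action: dict, age_days: int) -> dict:
        return {"action": action, "condition": {"age": age_days}}

    return {
        "rule": [
            rule({"type": "SetStorageClass", "storageClass": "NEARLINE"}, 30),
            rule({"type": "SetStorageClass", "storageClass": "COLDLINE"}, 90),
            rule({"type": "SetStorageClass", "storageClass": "ARCHIVE"}, 365),
            rule({"type": "Delete"}, 3650),  # purge after ~10 years
        ]
    }

# This JSON is the kind of file you would pass to:
#   gcloud storage buckets update gs://BUCKET --lifecycle-file=policy.json
print(json.dumps(lifecycle_policy(), indent=2))
```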
TCO Analysis and ROI
The exam tests whether you understand Total Cost of Ownership beyond just compute pricing:
- Direct costs: Compute, storage, networking (egress), licensing
- Indirect costs: Operational overhead, staffing, training, downtime
- Migration costs: One-time costs for assessment, refactoring, data transfer, parallel running
- Ongoing optimization: Cost of continuous monitoring, right-sizing, and re-architecting
The Google Cloud Pricing Calculator is the primary tool for estimating costs. Migration Center provides TCO comparisons between on-premises and cloud.
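A back-of-the-envelope TCO comparison simply sums those cost categories over a planning horizon. A minimal sketch with made-up figures:

```python
def three_year_tco(direct_annual: float, indirect_annual: float,
                   migration_one_time: float = 0.0, years: int = 3) -> float:
    """Simple TCO model mirroring the categories above: recurring direct
    and indirect costs over the horizon, plus one-time migration cost."""
    return (direct_annual + indirect_annual) * years + migration_one_time

# Illustrative (made-up) figures: on-prem vs. cloud over 3 years.
on_prem = three_year_tco(direct_annual=400_000, indirect_annual=250_000)
cloud = three_year_tco(direct_annual=300_000, indirect_annual=100_000,
                       migration_one_time=150_000)
print(f"on-prem: ${on_prem:,.0f}, cloud: ${cloud:,.0f}")
# Cloud can win despite the one-time migration cost because indirect
# (ops/staffing) spend drops -- the point of TCO beyond compute pricing.
```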
Compliance and Observability
For compliance-heavy scenarios:
- VPC Service Controls: Create security perimeters around GCP resources to prevent data exfiltration
- Organization Policy Service: Enforce constraints (e.g., restrict resource locations to specific regions)
- Cloud Audit Logs: Admin Activity logs (always on), Data Access logs (configurable), System Event logs
- Access Transparency: Logs of Google staff accessing your data (for compliance audits)
- Assured Workloads: Preconfigured compliance environments for FedRAMP, HIPAA, CJIS, etc.
1.2 Defining Technical Requirements
High Availability Design Patterns
HA is a core PCA topic. You must understand the availability implications of different deployment topologies.
| Pattern | Availability Target | Description | Example |
|---|---|---|---|
| Single zone | ~99.9% | One zone, one region | Dev/test environments |
| Multi-zone (within a region) | ~99.99% | Resources spread across 3 zones in one region | Regional MIG behind a regional load balancer |
| Multi-region | ~99.999% | Resources in 2+ regions with global load balancing | Global HTTP(S) LB with backend services in us-central1 and europe-west1 |
| Hybrid/multi-cloud | Varies | Workloads across on-premises and GCP or multiple clouds | Anthos, GKE Enterprise |
Exam trap: Multi-zone is NOT the same as multi-region. A regional MIG spans multiple zones within a single region, giving you zone-level resilience. For region-level failures (extremely rare but testable), you need multi-region deployment with global load balancing.
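The availability targets above follow from composing independent failure probabilities: a redundant deployment is down only when every replica is down at once. A sketch (the independence assumption is an approximation; real zone failures can be correlated):

```python
def parallel_availability(per_replica: float, replicas: int) -> float:
    """Availability of N redundant, independently failing deployments:
    the system is down only if every replica is down simultaneously."""
    return 1.0 - (1.0 - per_replica) ** replicas

# One zone at ~99.9% -> three zones in a region, or two regions:
print(f"1 zone:    {0.999:.6f}")
print(f"3 zones:   {parallel_availability(0.999, 3):.9f}")
print(f"2 regions: {parallel_availability(0.9999, 2):.9f}")
# Each added replica buys more nines -- but only against the failure
# domain the replicas actually span (zones do not cover region loss).
```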
Load Balancer Selection Guide
The PCA exam heavily tests load balancer selection. You must match the scenario to the correct load balancer type.
| Load Balancer | Layer | Scope | Traffic Type | Key Feature |
|---|---|---|---|---|
| Global External Application LB | 7 | Global | HTTP/HTTPS | Anycast IP, Cloud CDN, Cloud Armor, URL maps, traffic splitting |
| Regional External Application LB | 7 | Regional | HTTP/HTTPS | Single region; Envoy-based; advanced traffic management |
| Regional Internal Application LB | 7 | Regional | HTTP/HTTPS | Internal clients only; microservices traffic routing |
| Cross-Region Internal Application LB | 7 | Global | HTTP/HTTPS | Internal clients across multiple regions |
| Global External Proxy Network LB | 4 | Global | TCP with optional SSL offload | Non-HTTP TCP traffic needing global reach (e.g., gaming, IoT) |
| Regional External Passthrough Network LB | 4 | Regional | TCP/UDP/ESP/GRE/ICMP | Preserves client source IP; direct server return; highest performance |
| Regional Internal Passthrough Network LB | 4 | Regional | TCP/UDP | Internal TCP/UDP load balancing (e.g., internal database tier) |
Decision tree for the exam:
- Is it HTTP/HTTPS traffic? --> Application Load Balancer (Layer 7)
- Is it internal-only? --> Internal variant
- Does it need global reach? --> Global variant (requires Premium Network Tier)
- Is it non-HTTP TCP/UDP? --> Network Load Balancer (Layer 4)
- Do you need to preserve client source IP for non-HTTP? --> Passthrough Network LB
- Do you need SSL offload for non-HTTP TCP? --> Proxy Network LB
Exam trap: Cloud Armor DDoS protection and Cloud CDN integration are only available with the Global External Application Load Balancer. If a scenario requires WAF rules or DDoS protection, the answer is always this load balancer type. Internal load balancers do NOT support Cloud Armor.
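The decision tree above can be encoded as a small selector function, a useful self-test before the exam. A simplified sketch (it ignores protocol edge cases such as UDP/ESP and client-IP preservation):

```python
def choose_load_balancer(http: bool, internal: bool, global_reach: bool,
                         ssl_offload: bool = False) -> str:
    """Encodes the decision tree above: L7 vs L4, internal vs external,
    global vs regional, proxy vs passthrough. Simplified on purpose --
    real selection also weighs protocol details."""
    if http:  # Layer 7 -> Application Load Balancer
        if internal:
            return ("Cross-Region Internal Application LB" if global_reach
                    else "Regional Internal Application LB")
        return ("Global External Application LB" if global_reach
                else "Regional External Application LB")
    # Layer 4 -> Network Load Balancer
    if internal:
        return "Regional Internal Passthrough Network LB"
    if ssl_offload or global_reach:
        return "Global External Proxy Network LB"
    return "Regional External Passthrough Network LB"

# Global website needing Cloud Armor / Cloud CDN:
print(choose_load_balancer(http=True, internal=False, global_reach=True))
# Internal database tier (TCP):
print(choose_load_balancer(http=False, internal=True, global_reach=False))
```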
Autoscaling
Managed Instance Groups (MIGs)
Managed Instance Groups are the primary compute scaling mechanism:
| Feature | Description |
|---|---|
| Autoscaling signals | CPU utilization, HTTP load balancing serving capacity, Cloud Monitoring metrics, schedules |
| Cool-down period | Time after instance creation before autoscaler collects metrics (prevents flapping) |
| Scale-in controls | Limit how quickly the group can shrink (prevent aggressive scale-in) |
| Regional MIG | Distributes instances across multiple zones for HA |
| Stateful MIG | Preserves instance names, disks, and metadata across recreation events |
| Update policies | Rolling update, canary update, proactive/opportunistic replacement |
GKE Autoscaling
| Autoscaler | Scope | Scales What | Signal |
|---|---|---|---|
| Horizontal Pod Autoscaler (HPA) | Pod | Number of pod replicas | CPU, memory, custom metrics, external metrics |
| Vertical Pod Autoscaler (VPA) | Pod | CPU/memory requests per pod | Historical resource usage |
| Cluster Autoscaler | Node | Number of nodes in a node pool | Pending pods (unschedulable due to insufficient resources) |
| Multidimensional Pod Autoscaler (MPA) | Pod | Both replicas and resources | Combined HPA + VPA signals |
| Node Auto-Provisioning (NAP) | Node pool | Creates/deletes entire node pools | Workload requirements (machine type, GPU, etc.) |
Exam trap: HPA and VPA should not be used together on the same metric (e.g., both scaling on CPU). They will conflict. Use MPA if you need both horizontal and vertical scaling.
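The conflict is easy to see from the documented HPA algorithm, which scales replicas by the ratio of observed to target metric value. If VPA changes a pod's resource requests, observed utilization shifts and HPA reacts, so the two fight over the same signal:

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Kubernetes HPA formula:
    desired = ceil(current_replicas * current_metric / target_metric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 pods averaging 90% CPU against a 60% target -> scale out to 6.
print(hpa_desired_replicas(4, current_metric=90, target_metric=60))
# If VPA then doubles each pod's CPU request, measured utilization halves
# and HPA scales back in -- hence the guidance to use MPA instead.
```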
Serverless Services Comparison
| Feature | Cloud Functions | Cloud Run | App Engine Standard | App Engine Flexible |
|---|---|---|---|---|
| Unit of deployment | Function | Container | Application version | Application version (custom runtime) |
| Scale to zero | Yes | Yes | Yes | No (minimum 1 instance) |
| Max request timeout | 9 min (1st gen) / 60 min (2nd gen, HTTP-triggered only; 9 min event-driven) | 60 min | 10 min (auto scaling) / 24 h (basic or manual scaling) | 60 min |
| Concurrency | 1 (1st gen) / up to 1000 (2nd gen) | Up to 1000 per instance | Varies by runtime | Configurable |
| Custom runtime | No (specific runtimes) | Yes (any container) | No (supported runtimes) | Yes (Dockerfile) |
| VPC access | Serverless VPC Access connector | Direct VPC egress or connector | Connector | Native VPC |
| Pricing | Per invocation + compute time | Per request + vCPU/memory-seconds | Per instance-hour | Per VM-hour |
| Best for | Event-driven functions, webhooks | Containerized web apps, APIs, microservices | Simple web apps, APIs | Legacy apps needing custom runtimes |
Exam trap: Cloud Run is the recommended default serverless platform for new workloads. Cloud Functions is for event-driven glue code (Pub/Sub triggers, Cloud Storage triggers). App Engine Standard is legacy but still tested. App Engine Flexible does NOT scale to zero -- if cost optimization is the priority and the workload is bursty, App Engine Flexible is the wrong answer.
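The scale-to-zero trap is easiest to see with arithmetic. A sketch comparing a bursty workload on a scale-to-zero platform against a platform with an always-on minimum instance (the hourly rate is made up; only the relative shape matters):

```python
def monthly_cost(active_hours: float, hourly_rate: float,
                 scales_to_zero: bool, hours_in_month: float = 730.0) -> float:
    """Illustrative cost model: a scale-to-zero platform bills only active
    hours; a platform with a 1-instance minimum bills around the clock."""
    billed = active_hours if scales_to_zero else hours_in_month
    return billed * hourly_rate

# Bursty workload: 50 active hours/month at a made-up $0.10/instance-hour.
run_style = monthly_cost(50, 0.10, scales_to_zero=True)    # Cloud Run-like
flex_style = monthly_cost(50, 0.10, scales_to_zero=False)  # AE Flexible-like
print(f"scale-to-zero: ${run_style:.2f}, always-on minimum: ${flex_style:.2f}")
```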
Google Cloud Well-Architected Framework
The Well-Architected Framework consists of six pillars. The PCA exam tests your understanding of design principles from each pillar.
| Pillar | Focus | Key Principles |
|---|---|---|
| Operational Excellence | Efficient deployment, monitoring, management | CloudOps, incident management, automated change management, continuous improvement |
| Security, Privacy, and Compliance | Data protection, zero trust, regulatory alignment | Zero trust architecture, shift-left security, shared responsibility model, preemptive cyber defense |
| Reliability | Resilient, highly available workloads | Redundancy, horizontal scalability, graceful degradation, failure detection via observability, postmortems |
| Cost Optimization | Maximize business value of cloud spending | Align spend with business objectives, cost awareness culture, resource utilization optimization |
| Performance Optimization | Optimal resource performance and tuning | Elasticity, modular design, continuous monitoring, right-sizing |
| Sustainability | Environmentally responsible workloads | Low-carbon regions, energy-efficient software, optimized storage, resource usage patterns |
Five foundational design principles across all pillars:
- Design for change -- Small, frequent deployments with rapid feedback loops
- Document your architecture -- Link documentation to design decisions
- Simplify and use managed services -- Reduce operational burden
- Decouple architecture -- Separate components for independent operation
- Use stateless architecture -- Improve scalability via shared storage and caching
1.3 Choosing GCP Network, Storage, and Compute Resources
Storage and Database Decision Tree
Choosing the right storage service is one of the most frequently tested areas. Use this decision matrix.
Relational Databases
| Service | Scale Model | Max Capacity | Global Distribution | Use Case | Key Differentiator |
|---|---|---|---|---|---|
| Cloud SQL | Vertical (read replicas for read scale) | 96 vCPUs, 624 GB RAM (Enterprise); 128 vCPUs, 864 GB RAM (Enterprise Plus) | Cross-region read replicas | Standard OLTP, web apps, CMS | Managed MySQL, PostgreSQL, SQL Server; lowest operational overhead for relational |
| AlloyDB | Vertical (read pools for read scale) | 128 vCPUs, 864 GB RAM per instance | Cross-region replication | PostgreSQL workloads needing high OLTP throughput and analytics | PostgreSQL-compatible with a columnar engine for accelerated analytical queries; high transaction throughput |
| Cloud Spanner | Horizontal (automatic sharding) | Virtually unlimited | Multi-region with strong consistency | Global OLTP, financial systems, inventory | Only globally distributed relational database with external consistency; 99.999% SLA (multi-region) |
Exam trap: If the scenario requires global strong consistency with a relational database, the answer is always Spanner. Cloud SQL cannot provide multi-region strong consistency. AlloyDB is PostgreSQL-compatible but does not offer Spanner's horizontal scaling or global distribution.
NoSQL Databases
| Service | Data Model | Scale Model | Consistency | Use Case |
|---|---|---|---|---|
| Firestore | Document (JSON-like) | Serverless, automatic | Strong | Mobile/web apps, real-time sync, user profiles, content management |
| Bigtable | Wide-column | Horizontal (nodes) | Eventually consistent (single-row strong) | Time-series, IoT, analytics, AdTech, 10ms latency at scale |
| Memorystore | Key-value (in-memory) | Vertical | Strong | Session caching, leaderboards, real-time counters |
Exam trap: Bigtable is NOT a good choice for data smaller than 1 TB -- the minimum node count makes it expensive at small scale. For small-to-medium NoSQL workloads, Firestore is the right answer. Bigtable excels at high-throughput, low-latency reads/writes at massive scale (petabytes).
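The 1 TB rule of thumb comes from Bigtable's provisioned-node pricing: you pay for nodes whether or not they are busy, while Firestore bills per operation. A sketch with illustrative (not current) prices:

```python
def bigtable_monthly_floor(nodes: int, node_hourly: float = 0.65,
                           hours: float = 730.0) -> float:
    """Bigtable bills per provisioned node-hour regardless of traffic --
    the fixed floor that makes it a poor fit for small datasets.
    (Node price is illustrative; check current pricing.)"""
    return nodes * node_hourly * hours

def firestore_monthly(reads: int, writes: int,
                      read_per_100k: float = 0.03,
                      write_per_100k: float = 0.09) -> float:
    """Firestore bills per operation, so a small workload stays cheap.
    (Rates are illustrative.)"""
    return reads / 100_000 * read_per_100k + writes / 100_000 * write_per_100k

floor = bigtable_monthly_floor(nodes=3)  # typical HA production minimum
small = firestore_monthly(reads=5_000_000, writes=1_000_000)
print(f"Bigtable 3-node floor: ${floor:,.2f}/mo; small Firestore app: ${small:.2f}/mo")
```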
Analytics and Data Warehousing
| Service | Type | Best For | Pricing Model |
|---|---|---|---|
| BigQuery | Serverless data warehouse | OLAP, BI, ad-hoc SQL analytics on petabyte-scale data | On-demand (per TB scanned) or capacity-based slot reservations (BigQuery editions, the successor to flat-rate pricing) |
BigQuery is the default answer for analytics workloads. Key features: federated queries (query data in Cloud Storage, Bigtable, or external sources without loading), materialized views, ML (BigQuery ML), streaming inserts, and partitioning/clustering for cost optimization.
Object and File Storage
| Service | Type | Use Case | Storage Classes |
|---|---|---|---|
| Cloud Storage | Object store | Unstructured data, backups, media, data lake | Standard, Nearline (30-day min), Coldline (90-day min), Archive (365-day min) |
| Filestore | Managed NFS | Shared file systems for Compute Engine and GKE | Basic HDD, Basic SSD, Zonal, Regional, Enterprise |
| Persistent Disk | Block storage | Boot disks, database disks | Standard (HDD), Balanced (SSD), SSD, Extreme |
Compute Decision Tree
| Workload | Recommended Service | Why |
|---|---|---|
| Event-driven function, webhook | Cloud Functions | Single-purpose, auto-scales, event triggers |
| Containerized web app or API | Cloud Run | Any container, scale to zero, per-request pricing |
| Simple web app (supported runtimes) | App Engine Standard | Managed platform, auto-scaling, zero ops |
| Custom runtime web app | App Engine Flexible or Cloud Run | Dockerfile support; Cloud Run preferred for new workloads |
| Microservices with service mesh | GKE | Kubernetes orchestration, Istio/Anthos Service Mesh |
| Hybrid/multi-cloud Kubernetes | GKE Enterprise (Anthos) | Consistent Kubernetes management across GCP, on-prem, AWS, Azure; fleet management, Config Sync, Policy Controller |
| Lift-and-shift VM workloads | Compute Engine | Full VM control, any OS |
| High-performance computing | Compute Engine (C2/C2D/H3) | Compute-optimized machine types |
| ML training | Vertex AI with GPUs/TPUs | Managed ML platform, A2/A3 accelerator VMs, TPU pods |
| Large in-memory databases (SAP HANA) | Compute Engine (M2/M3) | Memory-optimized; M2 scales to roughly 12 TB RAM |
| Windows workloads or VMware migration | Compute Engine or VMware Engine | Windows licensing; VMware Engine for vSphere compatibility |
| Batch jobs tolerating interruption | Compute Engine (Spot VMs) or Cloud Batch | Up to 91% discount; Cloud Batch manages job queuing |
Machine Family Quick Reference
| Family | Series | Best For | Key Spec |
|---|---|---|---|
| General-purpose | E2, N1, N2, N2D, T2D, C3 | Web servers, app servers, dev/test, small-medium databases | E2: lowest cost; N2: best price-performance; C3: newest Intel |
| Compute-optimized | C2, C2D, H3 | HPC, gaming servers, batch processing, scientific computing | H3: 88 vCPUs, DDR5, Intel Sapphire Rapids |
| Memory-optimized | M1, M2, M3 | SAP HANA, in-memory databases, real-time analytics | M2: up to ~12 TB RAM |
| Accelerator-optimized | A2, A3, G2 | ML training/inference, video transcoding, rendering | A3: latest NVIDIA GPUs for LLM training |
Networking Services
VPC Fundamentals
A VPC network is a global resource containing regional subnets. Key concepts:
| Concept | Description | Exam Relevance |
|---|---|---|
| Auto-mode network | Automatically creates one subnet per region with predefined IP ranges | Quick setup; not recommended for production (IP ranges may conflict) |
| Custom-mode network | You define subnets and IP ranges manually | Production standard; full control over IP addressing |
| Firewall rules | Distributed virtual firewall; default denies all ingress, allows all egress | Implied rules; priority-based (0-65535, lower number = higher priority) |
| VPC Peering | Connects two VPC networks; non-transitive | No overlapping IP ranges; each peering is point-to-point |
| Shared VPC | Centralizes networking in a host project; service projects attach to it | Best practice for multi-project environments; centralizes firewall and subnet management |
| VPC Service Controls | Creates security perimeters around GCP services | Prevents data exfiltration; restricts API access to a perimeter |
| Private Google Access | VMs without external IPs can reach Google APIs | Must be enabled per subnet |
| Cloud NAT | Outbound NAT for VMs without external IPs | Regional; no inbound NAT |
Exam trap: VPC Peering is NOT transitive. If VPC-A peers with VPC-B, and VPC-B peers with VPC-C, VPC-A cannot reach VPC-C through VPC-B. For transitive connectivity, use a hub-and-spoke model with Network Connectivity Center or a Shared VPC. Network Connectivity Center is the answer when the exam describes multi-VPC or multi-cloud architectures needing transitive routing through a central hub.
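Non-transitivity can be modeled as a graph in which routes exist only along direct edges. A toy sketch (the VPC names are hypothetical):

```python
def reachable_via_peering(peerings: set, src: str, dst: str) -> bool:
    """VPC Network Peering exchanges routes only between directly peered
    networks -- there is no multi-hop route propagation, so reachability
    means 'a direct peering exists', not graph connectivity."""
    return frozenset({src, dst}) in peerings

# A peers with B, and B peers with C -- but A never peers with C.
peerings = {frozenset({"vpc-a", "vpc-b"}), frozenset({"vpc-b", "vpc-c"})}
print(reachable_via_peering(peerings, "vpc-a", "vpc-b"))  # direct peer
print(reachable_via_peering(peerings, "vpc-a", "vpc-c"))  # no transit via vpc-b
```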
Hybrid Connectivity
| Service | Bandwidth | SLA | Use Case | Requirement |
|---|---|---|---|---|
| Cloud VPN (HA VPN) | Up to 3 Gbps per tunnel | 99.99% (with proper topology) | Lower bandwidth, encrypted connectivity; initial migration | IPsec; no colocation needed |
| Dedicated Interconnect | 10 Gbps or 100 Gbps circuits (up to 200 Gbps) | Google end-to-end SLA | High-bandwidth, low-latency enterprise connectivity | Requires colocation facility equipment |
| Partner Interconnect | 50 Mbps to 50 Gbps | Provider-dependent | No colocation access; leveraging existing provider | Service provider relationship |
| Cloud Router | N/A (control plane) | N/A | Dynamic routing via BGP | Used with Cloud VPN and Interconnect |
Decision criteria:
- Need < 3 Gbps and can tolerate internet-based latency? --> Cloud VPN
- Need > 10 Gbps and have colocation access? --> Dedicated Interconnect
- Need > 3 Gbps but no colocation? --> Partner Interconnect
- Need encrypted connectivity over Interconnect? --> Use VPN over Interconnect (HA VPN tunnels over Dedicated/Partner Interconnect)
Exam trap: Cloud VPN traffic is encrypted (IPsec). Dedicated and Partner Interconnect traffic is NOT encrypted by default -- it traverses Google's network but is not IPsec-encrypted. If the scenario requires encryption over Interconnect, the answer is to layer HA VPN tunnels over the Interconnect connection.
Additional Networking Services
| Service | Purpose |
|---|---|
| Cloud DNS | Managed authoritative DNS; supports public and private zones; DNSSEC support |
| Cloud Armor | WAF and DDoS protection; works with external HTTP(S) load balancers and external proxy network load balancers |
| Cloud CDN | Content caching at Google edge locations; works with Global External Application LB |
| Network Connectivity Center | Hub-and-spoke network topology; provides transitive routing between VPCs, VPN tunnels, and Interconnect attachments |
| Traffic Director (now Cloud Service Mesh) | Managed control plane for service mesh (Envoy-based); global load balancing for internal microservices |
| Network Intelligence Center | Network monitoring, topology visualization, connectivity tests |
1.4 Designing a Migration Plan
Migration Phases
Google Cloud defines a four-phase migration framework:
| Phase | Activities | Key Tools |
|---|---|---|
| Assess | Inventory applications, map dependencies, evaluate TCO, assess team readiness | Migration Center, StratoZone, manual discovery |
| Plan | Design cloud foundation (landing zone), prioritize workloads, define migration waves | Cloud Foundation Toolkit, Terraform, resource hierarchy design |
| Deploy | Execute migration, transfer data, validate functionality | Migrate to Virtual Machines, Database Migration Service, Storage Transfer Service |
| Optimize | Right-size resources, enable autoscaling, adopt managed services, improve security posture | Recommender, Cloud Monitoring, cost management tools |
Migration Approaches
| Approach | Description | When to Use | Speed | Optimization |
|---|---|---|---|---|
| Rehost (lift-and-shift) | Move workloads as-is with minimal changes | Tight timeline, legacy apps, risk-averse orgs | Fastest | Lowest (cloud benefits not leveraged) |
| Replatform (lift-and-optimize) | Move and make targeted cloud optimizations | Moderate timeline, desire for some cloud benefits | Moderate | Moderate |
| Refactor (move-and-improve) | Modify application code to leverage cloud-native capabilities | Budget and time available, performance improvements needed | Slow | High |
| Re-architect | Fundamental restructure (e.g., monolith to microservices) | Application needs major scalability or agility improvements | Slowest | Highest |
| Rebuild | Complete rewrite as cloud-native | Existing app unmaintainable or does not meet goals | Slowest | Highest |
| Repurchase | Switch to SaaS equivalent | On-premises software has a suitable SaaS replacement | Varies | N/A (different product) |
Exam trap: The exam loves scenarios where a company says "we want to move to the cloud as fast as possible" -- the answer is almost always rehost first, then modernize later. Re-architecting is the right long-term play but never the fastest path. Conversely, if the scenario emphasizes "we want to take full advantage of cloud-native capabilities," rehosting is the wrong answer.
Migration Tools
| Tool | Migrates What | Key Feature |
|---|---|---|
| Migrate to Virtual Machines | Physical/virtual servers to Compute Engine | Streaming replication, minimal downtime cutover |
| VMware Engine | VMware workloads to a managed vSphere environment | Full VMware stack (vCenter, vSAN, NSX-T) on Google Cloud; no application changes |
| Database Migration Service (DMS) | MySQL, PostgreSQL, and SQL Server to Cloud SQL; PostgreSQL and Oracle to Cloud SQL for PostgreSQL or AlloyDB | Continuous replication, minimal downtime |
| BigQuery Migration Service | Data warehouses (Teradata, Redshift, etc.) to BigQuery | SQL translation, schema migration |
| Storage Transfer Service | Data from other clouds (S3, Azure Blob), HTTP sources, or on-premises to Cloud Storage | Scheduled transfers, bandwidth control |
| Transfer Appliance | Massive data volumes (hundreds of TB to 1 PB) to Cloud Storage | Physical appliance shipped to your datacenter; for when network transfer is impractical |
| gsutil / gcloud storage | Files and objects to Cloud Storage | CLI-based; parallel uploads; resumable transfers |
| Migration Center | Assessment and planning | Discovery, dependency mapping, TCO analysis, fit assessment |
Dependency and License Analysis
Before migration, you must understand:
- Application dependencies: Which applications talk to which? Map with Migration Center or manual discovery.
- Database dependencies: Which apps share databases? Foreign key relationships across schemas?
- Network dependencies: What ports and protocols are required? Firewall rule translation?
- License portability: Can existing licenses (Oracle, SQL Server, Windows) be brought to GCP? Do you need new cloud licenses? Sole-tenant nodes may be required for bring-your-own-license (BYOL) scenarios where license terms require physical host isolation.
Exam trap: Oracle Database licensing on Google Cloud often requires sole-tenant nodes because Oracle licenses per physical core, not per vCPU. If the exam describes an Oracle Database migration, sole-tenant nodes are likely part of the correct answer.
1.5 Planning for Future Improvements
Cloud Modernization Journey
The PCA exam tests your understanding of the modernization progression:
VMs (Compute Engine)
--> Containers (GKE)
--> Managed Containers (Cloud Run)
--> Microservices + Service Mesh (GKE + Anthos Service Mesh)
--> Event-Driven / Serverless (Cloud Functions + Pub/Sub + Eventarc)
Each step increases cloud-native optimization but also increases refactoring effort:
| Stage | Deployment Model | Scaling | Ops Overhead | Cost Model |
|---|---|---|---|---|
| VMs | Compute Engine MIGs | Autoscaler (minutes) | Highest (OS patching, etc.) | Per-hour/second |
| Containers on GKE | Kubernetes pods | HPA/VPA/Cluster Autoscaler (seconds) | Medium (cluster management) | Per-node + overhead |
| Cloud Run | Managed containers | Request-based (seconds, to zero) | Lowest | Per-request |
| Serverless Functions | Cloud Functions | Invocation-based (milliseconds) | Lowest | Per-invocation |
Integration with AI/ML via Vertex AI
Vertex AI is Google Cloud's unified ML platform. The PCA exam tests high-level architectural decisions:
| Component | Purpose | When to Use |
|---|---|---|
| Vertex AI Workbench | Managed Jupyter notebooks | Data exploration, prototyping |
| Vertex AI Training | Custom model training | Custom ML models needing GPU/TPU |
| Vertex AI Prediction | Online and batch prediction endpoints | Serving trained models |
| AutoML | No-code model training | When data scientists are unavailable; tabular, image, text, video |
| Vertex AI Pipelines | ML workflow orchestration (Kubeflow/TFX) | Reproducible, automated ML pipelines |
| Gemini Cloud Assist | AI-powered assistance for cloud operations | Troubleshooting, code generation, architecture recommendations |
| Model Garden | Pre-trained foundation models | Using Google and open-source LLMs |
Exam trap: The exam may present scenarios where a company wants ML capabilities but has no data science team. The answer is typically AutoML (no-code) or pre-trained APIs (Vision AI, Natural Language AI, Translation AI), not custom Vertex AI Training.
Data Mesh and BigQuery Federation
For data-heavy architectures, the PCA exam tests:
- BigQuery federated queries: Query data in Cloud Storage (Parquet, ORC, Avro, CSV), Bigtable, or Cloud SQL without loading it into BigQuery. Trade-off: higher query latency but no ETL pipeline needed.
- BigQuery Omni: Run BigQuery analytics on data stored in AWS S3 or Azure Blob Storage without moving it.
- Analytics Hub: Share BigQuery datasets across organizations with governed access.
- Dataplex: Data governance and management across data lakes and data warehouses; auto-discovery, metadata management, data quality.
- Data mesh principles: Domain-oriented decentralized data ownership, data as a product, self-serve data infrastructure, federated computational governance.
Exam Strategy for Domain 1
Question Patterns
- "Which service should you use?" -- Map the requirements (scale, consistency, latency, cost) to the correct service using the decision trees above.
- "How should you minimize cost?" -- Apply the discount hierarchy: Spot VMs > CUDs > SUDs > right-sizing > serverless.
- "How should you design for high availability?" -- Match the availability requirement to multi-zone, multi-region, or hybrid.
- "What migration approach should you use?" -- Match the business constraint (speed, budget, team skill) to rehost/replatform/refactor.
- "Which load balancer?" -- Follow the decision tree: HTTP vs. TCP, internal vs. external, global vs. regional.
Common Exam Traps Summary
| Trap | Correct Answer |
|---|---|
| E2 instances get automatic SUDs | No -- E2 is NOT eligible for SUDs |
| VPC peering is transitive | No -- each peering is point-to-point; not transitive |
| Interconnect traffic is encrypted | No -- you must layer HA VPN for encryption |
| App Engine Flexible scales to zero | No -- minimum 1 instance; use Cloud Run for scale-to-zero |
| Bigtable for small datasets (< 1 TB) | No -- Firestore for small-medium NoSQL; Bigtable for large-scale |
| Cloud SQL for global strong consistency | No -- that is Cloud Spanner |
| Rehost is the most cloud-optimized approach | No -- it is the fastest but least optimized; refactor/re-architect for full optimization |
| HPA and VPA can target the same metric | No -- they conflict; use MPA for combined scaling |
| Cloud Armor works with internal load balancers | No -- Cloud Armor works with external HTTP(S) LBs and external proxy network LBs only |
| Spot VMs have a 24-hour limit | No -- that is preemptible VMs (legacy); Spot VMs have no time limit |
References
- Google Cloud Well-Architected Framework
- Cloud Load Balancing Overview
- Migration to Google Cloud: Getting Started
- Preemptible and Spot VMs
- Sustained Use Discounts
- Committed Use Discounts Overview
- VPC Overview
- Choosing a Network Connectivity Product
- Compute Engine Machine Families
- Google Cloud Database Products
- Migration Center
- Google Cloud Pricing Calculator
- Vertex AI Documentation
- BigQuery Documentation
- GKE Autoscaling
- Cloud Run Documentation
- Cloud Functions Documentation
- VMware Engine