
Domain 1: Designing and Planning a Cloud Solution Architecture (~25%)

Domain 1 is the largest domain on the Professional Cloud Architect exam, accounting for roughly 25% of the questions (approximately 13-15 out of 50-60). This domain tests your ability to translate business requirements into technical cloud architectures, select the right GCP services for a given scenario, design migration plans, and plan for ongoing modernization. Unlike the ACE exam, which focuses on implementation, the PCA exam demands architectural decision-making -- you must justify why a particular service or pattern is the right choice, not just how to configure it.


1.1 Meeting Business Requirements and Strategy

The PCA exam presents scenario-based questions where a business describes its goals, and you must choose the architecture that best aligns with those goals. The exam tests your ability to balance cost, performance, reliability, compliance, and time-to-market.

Business Use Cases and Product Strategy

Exam scenarios typically fall into one of these patterns:

Business Scenario | Architectural Direction
Startup with unpredictable traffic | Serverless (Cloud Run, Cloud Functions), autoscaling MIGs
Enterprise with strict compliance (HIPAA, PCI-DSS) | Dedicated resources, VPC Service Controls, Cloud HSM, regionalized data
Global consumer-facing application | Multi-region deployment, global load balancing, Cloud CDN
Cost-constrained batch processing | Spot VMs, preemptible VMs, Dataflow, Cloud Batch
Real-time analytics pipeline | Pub/Sub + Dataflow + BigQuery streaming
Legacy application migration | Rehost to Compute Engine or VMware Engine, then modernize

Exam trap: The exam frequently presents scenarios where the "most technically elegant" solution is not the right answer. A startup that needs to launch in 2 weeks should rehost to Compute Engine, not spend months re-architecting for GKE. Always match the solution to the business constraint -- budget, timeline, team expertise, and compliance requirements.

Cost Optimization Strategies

Cost optimization is a dominant theme across the entire PCA exam. You must know the full discount hierarchy and when each applies.

Discount Types Comparison

Discount Type | Savings | Commitment | Applies To | Key Detail
Sustained Use Discounts (SUDs) | Up to 30% | None (automatic) | N1 and sole-tenant nodes only | Applied automatically for usage above 25% of a billing month
Resource-based CUDs | Up to 55% (general) / 70% (memory-optimized) | 1 or 3 years | vCPUs, memory, GPUs, local SSD, sole-tenant nodes | Scoped to a specific region and project
Compute Flexible CUDs | Up to 46% (general) / 63% (memory-optimized) | 1 or 3 years | Compute Engine, GKE, Cloud Run | Applies across projects and regions within a billing account
Spot VMs | Up to 91% | None | Fault-tolerant workloads only | Can be preempted at any time; no SLA; no maximum runtime (unlike preemptible VMs)
Preemptible VMs (legacy) | Up to 91% | None | Fault-tolerant workloads only | 24-hour maximum runtime; Google recommends Spot VMs instead

Discount priority order: CUDs take precedence over SUDs. A resource covered by a CUD does not also receive SUDs. Spot/preemptible VMs are not eligible for SUDs or CUDs.

Exam trap: SUDs apply automatically -- you do not need to purchase them. They are only available for N1 and sole-tenant nodes. N2, N2D, E2, C2, C3, T2D, and Tau machine families are NOT eligible for SUDs. If the exam asks about automatic discounts for an E2 or N2 instance, the answer is "none" -- use CUDs instead. E2 instances are eligible for Flexible CUDs.
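The discount percentages in the table above can be turned into a quick cost comparison. This is an illustrative sketch only: the base price is a made-up placeholder, not a real GCP rate, and the percentages are the best-case figures from the table.

```python
# Illustrative only: compare discount options for a hypothetical VM whose
# on-demand price is $100/month. The base price is a placeholder; the
# percentages are the best-case values from the table above.
BASE_MONTHLY_COST = 100.0

DISCOUNTS = {
    "on_demand": 0.00,
    "sud_n1_full_month": 0.30,     # N1 only, automatic, full-month usage
    "cud_3yr_general": 0.55,       # resource-based CUD, 3-year, general-purpose
    "flex_cud_3yr_general": 0.46,  # Compute Flexible CUD, 3-year
    "spot": 0.91,                  # best case; can be preempted at any time
}

def effective_cost(option: str) -> float:
    """Monthly cost after one discount (CUDs and SUDs do not stack)."""
    return BASE_MONTHLY_COST * (1 - DISCOUNTS[option])

for option in DISCOUNTS:
    print(f"{option:>22}: ${effective_cost(option):.2f}")
```

Note that the function applies exactly one discount, mirroring the priority rule above: a CUD-covered resource does not also receive SUDs.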

Additional Cost Optimization Techniques

Technique | Description
Custom machine types | Size vCPU and memory independently to avoid paying for unused resources. Available for N1, N2, N2D, E2 series. Carries a ~5% premium over predefined types but saves money when predefined types waste resources.
Autoscaling | MIGs scale in/out based on CPU, HTTP load, or custom metrics. Scale-in reduces cost during low traffic.
Serverless pricing | Cloud Functions, Cloud Run, and App Engine (standard) charge per invocation or per request-second with scale-to-zero -- no baseline cost when idle.
Cloud Storage classes | Use Nearline (30-day minimum), Coldline (90-day), or Archive (365-day) for infrequently accessed data. Object Lifecycle Management automates transitions.
Right-sizing recommendations | Cloud Monitoring + Recommender API identify oversized VMs and suggest smaller machine types.
BigQuery slots | On-demand pricing (per-query, per TB scanned) vs. flat-rate reservations for predictable analytical workloads.
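The Object Lifecycle Management transitions mentioned above are configured as a JSON policy on the bucket. The sketch below builds one in the age-based `SetStorageClass`/`Delete` rule shape Cloud Storage accepts; the 7-year retention period is a placeholder, not a recommendation.

```python
import json

# Sketch of an Object Lifecycle Management policy matching the storage-class
# minimums above: Nearline at 30 days, Coldline at 90, Archive at 365, then
# delete after ~7 years (placeholder retention period).
# Apply with: gsutil lifecycle set policy.json gs://BUCKET
lifecycle_policy = {
    "rule": [
        {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
         "condition": {"age": 30}},
        {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
         "condition": {"age": 90}},
        {"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
         "condition": {"age": 365}},
        {"action": {"type": "Delete"},
         "condition": {"age": 2555}},
    ]
}

with open("policy.json", "w") as f:
    json.dump(lifecycle_policy, f, indent=2)
```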

TCO Analysis and ROI

The exam tests whether you understand Total Cost of Ownership beyond just compute pricing:

  • Direct costs: Compute, storage, networking (egress), licensing
  • Indirect costs: Operational overhead, staffing, training, downtime
  • Migration costs: One-time costs for assessment, refactoring, data transfer, parallel running
  • Ongoing optimization: Cost of continuous monitoring, right-sizing, and re-architecting

The Google Cloud Pricing Calculator is the primary tool for estimating costs. Migration Center provides TCO comparisons between on-premises and cloud.
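A TCO comparison sums all four cost categories, not just compute. The toy numbers below are placeholders invented for illustration; the point is the shape of the calculation (one-time migration cost plus lower run-rate vs. a higher on-premises run-rate).

```python
# Toy 3-year TCO comparison; every dollar figure is a made-up placeholder.
YEARS = 3

onprem = {
    "hardware": 300_000,            # direct: capex refresh
    "ops_staff": 150_000 * YEARS,   # indirect: operational overhead
    "datacenter": 60_000 * YEARS,   # direct: power, space, networking
}
cloud = {
    "compute_storage": 90_000 * YEARS,  # direct: cloud consumption
    "ops_staff": 60_000 * YEARS,        # indirect: reduced via managed services
    "migration_one_time": 120_000,      # migration: assessment, transfer, parallel run
}

tco_onprem = sum(onprem.values())
tco_cloud = sum(cloud.values())
print(f"on-prem: ${tco_onprem:,}  cloud: ${tco_cloud:,}  "
      f"3-year savings: ${tco_onprem - tco_cloud:,}")
```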

Compliance and Observability

For compliance-heavy scenarios:

  • VPC Service Controls: Create security perimeters around GCP resources to prevent data exfiltration
  • Organization Policy Service: Enforce constraints (e.g., restrict resource locations to specific regions)
  • Cloud Audit Logs: Admin Activity logs (always on), Data Access logs (configurable), System Event logs
  • Access Transparency: Logs of Google staff accessing your data (for compliance audits)
  • Assured Workloads: Preconfigured compliance environments for FedRAMP, HIPAA, CJIS, etc.

1.2 Defining Technical Requirements

High Availability Design Patterns

HA is a core PCA topic. You must understand the availability implications of different deployment topologies.

Pattern | Availability Target | Description | Example
Single zone | ~99.9% | One zone, one region | Dev/test environments
Multi-zone (within a region) | ~99.99% | Resources spread across 3 zones in one region | Regional MIG behind a regional load balancer
Multi-region | ~99.999% | Resources in 2+ regions with global load balancing | Global HTTP(S) LB with backend services in us-central1 and europe-west1
Hybrid/multi-cloud | Varies | Workloads across on-premises and GCP or multiple clouds | Anthos, GKE Enterprise

Exam trap: Multi-zone is NOT the same as multi-region. A regional MIG spans multiple zones within a single region, giving you zone-level resilience. For region-level failures (extremely rare but testable), you need multi-region deployment with global load balancing.
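To make the availability targets above concrete, it helps to convert each "number of nines" into allowed downtime per year -- a calculation the exam expects you to be able to do roughly in your head:

```python
# Convert an availability target into allowed downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability: float) -> float:
    return MINUTES_PER_YEAR * (1 - availability)

for label, a in [("single zone  ~99.9%", 0.999),
                 ("multi-zone   ~99.99%", 0.9999),
                 ("multi-region ~99.999%", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(a):8.1f} min/year")
```

Roughly: 99.9% allows ~526 minutes (almost 9 hours) of downtime per year, 99.99% allows ~53 minutes, and 99.999% allows ~5 minutes.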

Load Balancer Selection Guide

The PCA exam heavily tests load balancer selection. You must match the scenario to the correct load balancer type.

Load Balancer | Layer | Scope | Traffic Type | Key Feature
Global External Application LB | 7 | Global | HTTP/HTTPS | Anycast IP, Cloud CDN, Cloud Armor, URL maps, traffic splitting
Regional External Application LB | 7 | Regional | HTTP/HTTPS | Single region; Envoy-based; advanced traffic management
Regional Internal Application LB | 7 | Regional | HTTP/HTTPS | Internal clients only; microservices traffic routing
Cross-Region Internal Application LB | 7 | Global | HTTP/HTTPS | Internal clients across multiple regions
Global External Proxy Network LB | 4 | Global | TCP with optional SSL offload | Non-HTTP TCP traffic needing global reach (e.g., gaming, IoT)
Regional External Passthrough Network LB | 4 | Regional | TCP/UDP/ESP/GRE/ICMP | Preserves client source IP; direct server return; highest performance
Regional Internal Passthrough Network LB | 4 | Regional | TCP/UDP | Internal TCP/UDP load balancing (e.g., internal database tier)

Decision tree for the exam:

  1. Is it HTTP/HTTPS traffic? --> Application Load Balancer (Layer 7)
  2. Is it internal-only? --> Internal variant
  3. Does it need global reach? --> Global variant (requires Premium Network Tier)
  4. Is it non-HTTP TCP/UDP? --> Network Load Balancer (Layer 4)
  5. Do you need to preserve client source IP for non-HTTP? --> Passthrough Network LB
  6. Do you need SSL offload for non-HTTP TCP? --> Proxy Network LB

Exam trap: Cloud Armor DDoS protection and Cloud CDN integration are only available with the Global External Application Load Balancer. If a scenario requires WAF rules or DDoS protection, the answer is always this load balancer type. Internal load balancers do NOT support Cloud Armor.
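The decision tree above can be sketched as a function. This is a study aid, not an official selection algorithm -- real selection involves more dimensions (protocol details, network tier, hybrid backends) -- but it encodes the exam-relevant branches:

```python
# The load-balancer decision tree above, as a simplified function.
def pick_load_balancer(http: bool, internal: bool, global_reach: bool,
                       preserve_client_ip: bool = False,
                       ssl_offload: bool = False) -> str:
    if http:  # Layer 7 -> Application Load Balancer
        if internal and global_reach:
            return "Cross-Region Internal Application LB"
        if internal:
            return "Regional Internal Application LB"
        if global_reach:
            return "Global External Application LB"
        return "Regional External Application LB"
    # Non-HTTP -> Layer 4 Network Load Balancer
    if internal:
        return "Regional Internal Passthrough Network LB"
    if preserve_client_ip:
        return "Regional External Passthrough Network LB"
    if ssl_offload or global_reach:
        return "Global External Proxy Network LB"
    return "Regional External Passthrough Network LB"

print(pick_load_balancer(http=True, internal=False, global_reach=True))
```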

Autoscaling

Managed Instance Groups (MIGs)

Managed Instance Groups are the primary compute scaling mechanism:

Feature | Description
Autoscaling signals | CPU utilization, HTTP load balancing serving capacity, Cloud Monitoring metrics, schedules
Cool-down period | Time after instance creation before the autoscaler collects metrics (prevents flapping)
Scale-in controls | Limit how quickly the group can shrink (prevent aggressive scale-in)
Regional MIG | Distributes instances across multiple zones for HA
Stateful MIG | Preserves instance names, disks, and metadata across recreation events
Update policies | Rolling update, canary update, proactive/opportunistic replacement

GKE Autoscaling

Autoscaler | Scope | Scales What | Signal
Horizontal Pod Autoscaler (HPA) | Pod | Number of pod replicas | CPU, memory, custom metrics, external metrics
Vertical Pod Autoscaler (VPA) | Pod | CPU/memory requests per pod | Historical resource usage
Cluster Autoscaler | Node | Number of nodes in a node pool | Pending pods (unschedulable due to insufficient resources)
Multidimensional Pod Autoscaler (MPA) | Pod | Both replicas and resources | Combined HPA + VPA signals
Node Auto-Provisioning (NAP) | Node pool | Creates/deletes entire node pools | Workload requirements (machine type, GPU, etc.)

Exam trap: HPA and VPA should not be used together on the same metric (e.g., both scaling on CPU). They will conflict. Use MPA if you need both horizontal and vertical scaling.
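The HPA's core scaling rule, per the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A minimal sketch (metric values as utilization percentages):

```python
import math

# Kubernetes HPA scaling rule (from the Kubernetes docs):
#   desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)
def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float) -> int:
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 pods averaging 90% CPU against a 60% target -> scale out to 6 replicas
print(hpa_desired_replicas(4, 90, 60))   # 6
# 10 pods averaging 30% CPU against a 60% target -> scale in to 5 replicas
print(hpa_desired_replicas(10, 30, 60))  # 5
```

This is also why HPA and VPA conflict on the same metric: VPA changes the pod's resource requests, which changes the utilization percentage HPA is reacting to.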

Serverless Services Comparison

Feature | Cloud Functions | Cloud Run | App Engine Standard | App Engine Flexible
Unit of deployment | Function | Container | Application version | Application version (custom runtime)
Scale to zero | Yes | Yes | Yes | No (minimum 1 instance)
Max request timeout | 9 min (1st gen) / 60 min (2nd gen, HTTP-triggered only; 9 min event-driven) | 60 min | 10 min (automatic scaling) / 24 hr (basic or manual scaling) | 60 min
Concurrency | 1 (1st gen) / up to 1000 (2nd gen) | Up to 1000 per instance | Varies by runtime | Configurable
Custom runtime | No (specific runtimes) | Yes (any container) | No (supported runtimes) | Yes (Dockerfile)
VPC access | Serverless VPC Access connector | Direct VPC egress or connector | Connector | Native VPC
Pricing | Per invocation + compute time | Per request + vCPU/memory-seconds | Per instance-hour | Per VM-hour
Best for | Event-driven functions, webhooks | Containerized web apps, APIs, microservices | Simple web apps, APIs | Legacy apps needing custom runtimes

Exam trap: Cloud Run is the recommended default serverless platform for new workloads. Cloud Functions is for event-driven glue code (Pub/Sub triggers, Cloud Storage triggers). App Engine Standard is legacy but still tested. App Engine Flexible does NOT scale to zero -- if cost optimization is the priority and the workload is bursty, App Engine Flexible is the wrong answer.

Google Cloud Well-Architected Framework

The Well-Architected Framework consists of six pillars. The PCA exam tests your understanding of design principles from each pillar.

Pillar | Focus | Key Principles
Operational Excellence | Efficient deployment, monitoring, management | CloudOps, incident management, automated change management, continuous improvement
Security, Privacy, and Compliance | Data protection, zero trust, regulatory alignment | Zero trust architecture, shift-left security, shared responsibility model, preemptive cyber defense
Reliability | Resilient, highly available workloads | Redundancy, horizontal scalability, graceful degradation, failure detection via observability, postmortems
Cost Optimization | Maximize business value of cloud spending | Align spend with business objectives, cost awareness culture, resource utilization optimization
Performance Optimization | Optimal resource performance and tuning | Elasticity, modular design, continuous monitoring, right-sizing
Sustainability | Environmentally responsible workloads | Low-carbon regions, energy-efficient software, optimized storage, resource usage patterns

Five foundational design principles across all pillars:

  1. Design for change -- Small, frequent deployments with rapid feedback loops
  2. Document your architecture -- Link documentation to design decisions
  3. Simplify and use managed services -- Reduce operational burden
  4. Decouple architecture -- Separate components for independent operation
  5. Use stateless architecture -- Improve scalability via shared storage and caching

1.3 Choosing GCP Network, Storage, and Compute Resources

Storage and Database Decision Tree

Choosing the right storage service is one of the most frequently tested areas. Use this decision matrix.

Relational Databases

Service | Scale Model | Max Capacity | Global Distribution | Use Case | Key Differentiator
Cloud SQL | Vertical (read replicas for read scale) | 96 vCPUs, 624 GB RAM (Enterprise); 128 vCPUs, 864 GB RAM (Enterprise Plus) | Cross-region read replicas | Standard OLTP, web apps, CMS | Managed MySQL, PostgreSQL, SQL Server; lowest operational overhead for relational
AlloyDB | Vertical (read pool) | Column-engine acceleration | Cross-region replication | PostgreSQL workloads needing high OLTP throughput and analytics | PostgreSQL-compatible with columnar engine for accelerated analytical queries; high transaction throughput
Cloud Spanner | Horizontal (automatic sharding) | Virtually unlimited | Multi-region with strong consistency | Global OLTP, financial systems, inventory | Only globally distributed relational database with external consistency; 99.999% SLA (multi-region)

Exam trap: If the scenario requires global strong consistency with a relational database, the answer is always Spanner. Cloud SQL cannot provide multi-region strong consistency. AlloyDB is PostgreSQL-compatible but does not offer Spanner's horizontal scaling or global distribution.

NoSQL Databases

Service | Data Model | Scale Model | Consistency | Use Case
Firestore | Document (JSON-like) | Serverless, automatic | Strong | Mobile/web apps, real-time sync, user profiles, content management
Bigtable | Wide-column | Horizontal (nodes) | Eventually consistent across clusters (single-row strong) | Time-series, IoT, analytics, AdTech; sub-10 ms latency at scale
Memorystore | Key-value (in-memory) | Vertical | Strong | Session caching, leaderboards, real-time counters

Exam trap: Bigtable is NOT a good choice for data smaller than 1 TB -- the minimum node count makes it expensive at small scale. For small-to-medium NoSQL workloads, Firestore is the right answer. Bigtable excels at high-throughput, low-latency reads/writes at massive scale (petabytes).

Analytics and Data Warehousing

Service | Type | Best For | Pricing Model
BigQuery | Serverless data warehouse | OLAP, BI, ad-hoc SQL analytics on petabyte-scale data | On-demand (per TB scanned) or flat-rate reservations (slots)

BigQuery is the default answer for analytics workloads. Key features: federated queries (query data in Cloud Storage, Bigtable, or external sources without loading), materialized views, ML (BigQuery ML), streaming inserts, and partitioning/clustering for cost optimization.

Object and File Storage

Service | Type | Use Case | Classes / Tiers
Cloud Storage | Object store | Unstructured data, backups, media, data lake | Standard, Nearline (30-day min), Coldline (90-day min), Archive (365-day min)
Filestore | Managed NFS | Shared file systems for Compute Engine and GKE | Basic HDD, Basic SSD, Zonal, Regional, Enterprise
Persistent Disk | Block storage | Boot disks, database disks | Standard (HDD), Balanced (SSD), SSD, Extreme

Compute Decision Tree

Workload | Recommended Service | Why
Event-driven function, webhook | Cloud Functions | Single-purpose, auto-scales, event triggers
Containerized web app or API | Cloud Run | Any container, scale to zero, per-request pricing
Simple web app (supported runtimes) | App Engine Standard | Managed platform, auto-scaling, zero ops
Custom runtime web app | App Engine Flexible or Cloud Run | Dockerfile support; Cloud Run preferred for new workloads
Microservices with service mesh | GKE | Kubernetes orchestration, Istio/Anthos Service Mesh
Hybrid/multi-cloud Kubernetes | GKE Enterprise (Anthos) | Consistent Kubernetes management across GCP, on-prem, AWS, Azure; fleet management, Config Sync, Policy Controller
Lift-and-shift VM workloads | Compute Engine | Full VM control, any OS
High-performance computing | Compute Engine (C2/C2D/H3) | Compute-optimized machine types
ML training | Vertex AI with GPUs/TPUs | Managed ML platform, A2/A3 accelerator VMs, TPU pods
Large in-memory databases (SAP HANA) | Compute Engine (M3) | Memory-optimized, up to 12 TB RAM
Windows workloads or VMware migration | Compute Engine or VMware Engine | Windows licensing; VMware Engine for vSphere compatibility
Batch jobs tolerating interruption | Compute Engine (Spot VMs) or Cloud Batch | Up to 91% discount; Cloud Batch manages job queuing

Machine Family Quick Reference

Family | Series | Best For | Key Spec
General-purpose | E2, N1, N2, N2D, T2D, C3 | Web servers, app servers, dev/test, small-medium databases | E2: lowest cost; N2: best price-performance; C3: newest Intel
Compute-optimized | C2, C2D, H3 | HPC, gaming servers, batch processing, scientific computing | H3: 88 vCPUs, DDR5, Intel Sapphire Rapids
Memory-optimized | M1, M2, M3 | SAP HANA, in-memory databases, real-time analytics | M3: up to 12 TB RAM
Accelerator-optimized | A2, A3, G2 | ML training/inference, video transcoding, rendering | A3: latest NVIDIA GPUs for LLM training

Networking Services

VPC Fundamentals

A VPC network is a global resource containing regional subnets. Key concepts:

Concept | Description | Exam Relevance
Auto-mode network | Automatically creates one subnet per region with predefined IP ranges | Quick setup; not recommended for production (IP ranges may conflict)
Custom-mode network | You define subnets and IP ranges manually | Production standard; full control over IP addressing
Firewall rules | Distributed virtual firewall; default denies all ingress, allows all egress | Implied rules; priority-based (0-65535, lower number = higher priority)
VPC Peering | Connects two VPC networks; non-transitive | No overlapping IP ranges; each peering is point-to-point
Shared VPC | Centralizes networking in a host project; service projects attach to it | Best practice for multi-project environments; centralizes firewall and subnet management
VPC Service Controls | Creates security perimeters around GCP services | Prevents data exfiltration; restricts API access to a perimeter
Private Google Access | VMs without external IPs can reach Google APIs | Must be enabled per subnet
Cloud NAT | Outbound NAT for VMs without external IPs | Regional; no inbound NAT

Exam trap: VPC Peering is NOT transitive. If VPC-A peers with VPC-B, and VPC-B peers with VPC-C, VPC-A cannot reach VPC-C through VPC-B. For transitive connectivity, use a hub-and-spoke model with Network Connectivity Center or a Shared VPC. Network Connectivity Center is the answer when the exam describes multi-VPC or multi-cloud architectures needing transitive routing through a central hub.
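The non-transitivity rule can be modeled in a few lines: treat each peering as a point-to-point edge and note that reachability checks only direct edges, never multi-hop paths. The VPC names are hypothetical.

```python
# Tiny model of VPC Peering reachability. Each peering is a point-to-point
# edge; there is no multi-hop routing, so reachability is NOT transitive.
peerings = {("vpc-a", "vpc-b"), ("vpc-b", "vpc-c")}

def can_reach(src: str, dst: str) -> bool:
    """Direct peering only -- exactly why vpc-a cannot reach vpc-c via vpc-b."""
    return (src, dst) in peerings or (dst, src) in peerings

print(can_reach("vpc-a", "vpc-b"))  # True: direct peering
print(can_reach("vpc-a", "vpc-c"))  # False: peering does not chain
```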

Hybrid Connectivity

Service | Bandwidth | SLA | Use Case | Requirement
Cloud VPN (HA VPN) | Up to 3 Gbps per tunnel | 99.99% (with proper topology) | Lower bandwidth, encrypted connectivity; initial migration | IPsec; no colocation needed
Dedicated Interconnect | 10 Gbps or 100 Gbps circuits (up to 200 Gbps) | Google end-to-end SLA | High-bandwidth, low-latency enterprise connectivity | Requires colocation facility equipment
Partner Interconnect | 50 Mbps to 50 Gbps | Provider-dependent | No colocation access; leveraging existing provider | Service provider relationship
Cloud Router | N/A (control plane) | N/A | Dynamic routing via BGP | Used with Cloud VPN and Interconnect

Decision criteria:

  • Need < 3 Gbps and can tolerate internet-based latency? --> Cloud VPN
  • Need > 10 Gbps and have colocation access? --> Dedicated Interconnect
  • Need > 3 Gbps but no colocation? --> Partner Interconnect
  • Need encrypted connectivity over Interconnect? --> Use VPN over Interconnect (HA VPN tunnels over Dedicated/Partner Interconnect)

Exam trap: Cloud VPN traffic is encrypted (IPsec). Dedicated and Partner Interconnect traffic is NOT encrypted by default -- it traverses Google's network but is not IPsec-encrypted. If the scenario requires encryption over Interconnect, the answer is to layer HA VPN tunnels over the Interconnect connection.
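The decision criteria above can be collapsed into a small function. The 3 Gbps threshold is this section's rule of thumb, not a hard product limit:

```python
# The hybrid-connectivity decision criteria above as a function.
# Thresholds are rules of thumb from this section, not hard product limits.
def pick_hybrid_connectivity(bandwidth_gbps: float, has_colocation: bool,
                             needs_encryption: bool = False) -> str:
    if bandwidth_gbps <= 3:
        return "HA VPN"  # IPsec-encrypted by definition
    choice = "Dedicated Interconnect" if has_colocation else "Partner Interconnect"
    if needs_encryption:
        # Interconnect traffic is not encrypted by default
        choice += " + HA VPN over Interconnect"
    return choice

print(pick_hybrid_connectivity(1, has_colocation=False))
print(pick_hybrid_connectivity(50, has_colocation=True, needs_encryption=True))
```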

Additional Networking Services

Service | Purpose
Cloud DNS | Managed authoritative DNS; supports public and private zones; DNSSEC support
Cloud Armor | WAF and DDoS protection; works with external HTTP(S) load balancers and external proxy network load balancers
Cloud CDN | Content caching at Google edge locations; works with Global External Application LB
Network Connectivity Center | Hub-and-spoke network topology; provides transitive routing between VPCs, VPN tunnels, and Interconnect attachments
Traffic Director | Managed control plane for service mesh (Envoy-based); global load balancing for internal microservices
Network Intelligence Center | Network monitoring, topology visualization, connectivity tests

1.4 Designing a Migration Plan

Migration Phases

Google Cloud defines a four-phase migration framework:

Phase | Activities | Key Tools
Assess | Inventory applications, map dependencies, evaluate TCO, assess team readiness | Migration Center, StratoZone, manual discovery
Plan | Design cloud foundation (landing zone), prioritize workloads, define migration waves | Cloud Foundation Toolkit, Terraform, resource hierarchy design
Deploy | Execute migration, transfer data, validate functionality | Migrate to Virtual Machines, Database Migration Service, Storage Transfer Service
Optimize | Right-size resources, enable autoscaling, adopt managed services, improve security posture | Recommender, Cloud Monitoring, cost management tools

Migration Approaches

Approach | Description | When to Use | Speed | Optimization
Rehost (lift-and-shift) | Move workloads as-is with minimal changes | Tight timeline, legacy apps, risk-averse orgs | Fastest | Lowest (cloud benefits not leveraged)
Replatform (lift-and-optimize) | Move and make targeted cloud optimizations | Moderate timeline, desire for some cloud benefits | Moderate | Moderate
Refactor (move-and-improve) | Modify application code to leverage cloud-native capabilities | Budget and time available, performance improvements needed | Slow | High
Re-architect | Fundamental restructure (e.g., monolith to microservices) | Application needs major scalability or agility improvements | Slowest | Highest
Rebuild | Complete rewrite as cloud-native | Existing app unmaintainable or does not meet goals | Slowest | Highest
Repurchase | Switch to SaaS equivalent | On-premises software has a suitable SaaS replacement | Varies | N/A (different product)

Exam trap: The exam loves scenarios where a company says "we want to move to the cloud as fast as possible" -- the answer is almost always rehost first, then modernize later. Re-architecting is the right long-term play but never the fastest path. Conversely, if the scenario emphasizes "we want to take full advantage of cloud-native capabilities," rehosting is the wrong answer.
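The speed-vs-optimization trade-off above can be encoded as a small lookup -- a study sketch over a subset of the approaches, not an official selection framework:

```python
# The speed vs. cloud-optimization trade-off as a ranked lookup (sketch only).
APPROACHES = [
    # (name, speed_rank, optimization_rank) -- 1 = fastest / least optimized
    ("rehost", 1, 1),
    ("replatform", 2, 2),
    ("refactor", 3, 3),
    ("re-architect", 4, 4),
]

def pick_migration_approach(priority: str) -> str:
    """priority: 'speed' (tight timeline) or 'cloud_native' (full optimization)."""
    if priority == "speed":
        return min(APPROACHES, key=lambda a: a[1])[0]  # fastest wins
    return max(APPROACHES, key=lambda a: a[2])[0]      # most cloud-optimized wins

print(pick_migration_approach("speed"))        # rehost
print(pick_migration_approach("cloud_native")) # re-architect
```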

Migration Tools

Tool | Migrates What | Key Feature
Migrate to Virtual Machines | Physical/virtual servers to Compute Engine | Streaming replication, minimal downtime cutover
VMware Engine | VMware workloads to a managed vSphere environment | Full VMware stack (vCenter, vSAN, NSX-T) on Google Cloud; no application changes
Database Migration Service (DMS) | MySQL, PostgreSQL, SQL Server, Oracle to Cloud SQL, AlloyDB, or as-is | Continuous replication, minimal downtime
BigQuery Migration Service | Data warehouses (Teradata, Redshift, etc.) to BigQuery | SQL translation, schema migration
Storage Transfer Service | Data from other clouds (S3, Azure Blob), HTTP sources, or on-premises to Cloud Storage | Scheduled transfers, bandwidth control
Transfer Appliance | Massive data volumes (hundreds of TB to 1 PB) to Cloud Storage | Physical appliance shipped to your datacenter; for when network transfer is impractical
gsutil / gcloud storage | Files and objects to Cloud Storage | CLI-based; parallel uploads; resumable transfers
Migration Center | Assessment and planning | Discovery, dependency mapping, TCO analysis, fit assessment

Dependency and License Analysis

Before migration, you must understand:

  • Application dependencies: Which applications talk to which? Map with Migration Center or manual discovery.
  • Database dependencies: Which apps share databases? Foreign key relationships across schemas?
  • Network dependencies: What ports and protocols are required? Firewall rule translation?
  • License portability: Can existing licenses (Oracle, SQL Server, Windows) be brought to GCP? Do you need new cloud licenses? Sole-tenant nodes may be required for bring-your-own-license (BYOL) scenarios where license terms require physical host isolation.

Exam trap: Oracle Database licensing on Google Cloud often requires sole-tenant nodes because Oracle licenses per physical core, not per vCPU. If the exam describes an Oracle Database migration, sole-tenant nodes are likely part of the correct answer.


1.5 Planning for Future Improvements

Cloud Modernization Journey

The PCA exam tests your understanding of the modernization progression:

VMs (Compute Engine)
  --> Containers (GKE)
    --> Managed Containers (Cloud Run)
      --> Microservices + Service Mesh (GKE + Anthos Service Mesh)
        --> Event-Driven / Serverless (Cloud Functions + Pub/Sub + Eventarc)

Each step increases cloud-native optimization but also increases refactoring effort:

Stage | Deployment Model | Scaling | Ops Overhead | Cost Model
VMs | Compute Engine MIGs | Autoscaler (minutes) | Highest (OS patching, etc.) | Per-hour/second
Containers on GKE | Kubernetes pods | HPA/VPA/Cluster Autoscaler (seconds) | Medium (cluster management) | Per-node + overhead
Cloud Run | Managed containers | Request-based (seconds, to zero) | Lowest | Per-request
Serverless Functions | Cloud Functions | Invocation-based (milliseconds) | Lowest | Per-invocation

Integration with AI/ML via Vertex AI

Vertex AI is Google Cloud's unified ML platform. The PCA exam tests high-level architectural decisions:

Component | Purpose | When to Use
Vertex AI Workbench | Managed Jupyter notebooks | Data exploration, prototyping
Vertex AI Training | Custom model training | Custom ML models needing GPU/TPU
Vertex AI Prediction | Online and batch prediction endpoints | Serving trained models
AutoML | No-code model training | When data scientists are unavailable; tabular, image, text, video
Vertex AI Pipelines | ML workflow orchestration (Kubeflow/TFX) | Reproducible, automated ML pipelines
Gemini Cloud Assist | AI-powered assistance for cloud operations | Troubleshooting, code generation, architecture recommendations
Model Garden | Pre-trained foundation models | Using Google and open-source LLMs

Exam trap: The exam may present scenarios where a company wants ML capabilities but has no data science team. The answer is typically AutoML (no-code) or pre-trained APIs (Vision AI, Natural Language AI, Translation AI), not custom Vertex AI Training.

Data Mesh and BigQuery Federation

For data-heavy architectures, the PCA exam tests:

  • BigQuery federated queries: Query data in Cloud Storage (Parquet, ORC, Avro, CSV), Bigtable, or Cloud SQL without loading it into BigQuery. Trade-off: higher query latency but no ETL pipeline needed.
  • BigQuery Omni: Run BigQuery analytics on data stored in AWS S3 or Azure Blob Storage without moving it.
  • Analytics Hub: Share BigQuery datasets across organizations with governed access.
  • Dataplex: Data governance and management across data lakes and data warehouses; auto-discovery, metadata management, data quality.
  • Data mesh principles: Domain-oriented decentralized data ownership, data as a product, self-serve data infrastructure, federated computational governance.

Exam Strategy for Domain 1

Question Patterns

  1. "Which service should you use?" -- Map the requirements (scale, consistency, latency, cost) to the correct service using the decision trees above.
  2. "How should you minimize cost?" -- Apply the discount hierarchy: Spot VMs > CUDs > SUDs > right-sizing > serverless.
  3. "How should you design for high availability?" -- Match the availability requirement to multi-zone, multi-region, or hybrid.
  4. "What migration approach should you use?" -- Match the business constraint (speed, budget, team skill) to rehost/replatform/refactor.
  5. "Which load balancer?" -- Follow the decision tree: HTTP vs. TCP, internal vs. external, global vs. regional.

Common Exam Traps Summary

Trap | Correct Answer
E2 instances get automatic SUDs | No -- E2 is NOT eligible for SUDs
VPC peering is transitive | No -- each peering is point-to-point; not transitive
Interconnect traffic is encrypted | No -- you must layer HA VPN for encryption
App Engine Flexible scales to zero | No -- minimum 1 instance; use Cloud Run for scale-to-zero
Bigtable for small datasets (< 1 TB) | No -- Firestore for small-medium NoSQL; Bigtable for large-scale
Cloud SQL for global strong consistency | No -- that is Cloud Spanner
Rehost is the most cloud-optimized approach | No -- it is the fastest but least optimized; refactor/re-architect for full optimization
HPA and VPA can target the same metric | No -- they conflict; use MPA for combined scaling
Cloud Armor works with internal load balancers | No -- Cloud Armor works with external HTTP(S) LBs and external proxy network LBs only
Spot VMs have a 24-hour limit | No -- that is preemptible VMs (legacy); Spot VMs have no time limit
