Domain 1: Designing and Planning a Cloud Solution Architecture (~25%)
Domain 1 is the largest domain on the Professional Cloud Architect exam, accounting for roughly 25% of the questions (approximately 13-15 out of 50-60). This domain tests your ability to translate business requirements into technical cloud architectures, select the right GCP services for a given scenario, design migration plans, and plan for ongoing modernization. Unlike the ACE exam, which focuses on implementation, the PCA exam demands architectural decision-making -- you must justify why a particular service or pattern is the right choice, not just how to configure it.
1.1 Meeting Business Requirements and Strategy
The PCA exam presents scenario-based questions where a business describes its goals, and you must choose the architecture that best aligns with those goals. The exam tests your ability to balance cost, performance, reliability, compliance, and time-to-market.
Business Use Cases and Product Strategy
Exam scenarios typically fall into one of these patterns:
| Business Scenario | Architectural Direction |
|---|---|
| Startup with unpredictable traffic | Serverless (Cloud Run, Cloud Functions), autoscaling MIGs |
| Enterprise with strict compliance (HIPAA, PCI-DSS) | Dedicated resources, VPC Service Controls, Cloud HSM, regionalized data |
| Global consumer-facing application | Multi-region deployment, global load balancing, Cloud CDN |
| Cost-constrained batch processing | Spot VMs, preemptible VMs, Dataflow, Cloud Batch |
| Real-time analytics pipeline | Pub/Sub + Dataflow + BigQuery streaming |
| Legacy application migration | Rehost to Compute Engine or VMware Engine, then modernize |
Exam trap: The exam frequently presents scenarios where the "most technically elegant" solution is not the right answer. A startup that needs to launch in 2 weeks should rehost to Compute Engine, not spend months re-architecting for GKE. Always match the solution to the business constraint -- budget, timeline, team expertise, and compliance requirements.
Cost Optimization Strategies
Cost optimization is a dominant theme across the entire PCA exam. You must know the full discount hierarchy and when each applies.
Discount Types Comparison
| Discount Type | Savings | Commitment | Applies To | Key Detail |
|---|---|---|---|---|
| Sustained Use Discounts (SUDs) | Up to 30% | None (automatic) | N1 and sole-tenant nodes only | Applied automatically for usage above 25% of a billing month |
| Resource-based CUDs | Up to 55% (general) / 70% (memory-optimized) | 1 or 3 years | vCPUs, memory, GPUs, local SSD, sole-tenant nodes | Scoped to a specific region and project |
| Compute Flexible CUDs | Up to 46% (general) / 63% (memory-optimized) | 1 or 3 years | Compute Engine, GKE, Cloud Run | Applies across projects and regions within a billing account |
| Spot VMs | Up to 91% | None | Fault-tolerant workloads only | Can be preempted at any time; no SLA; no maximum runtime (unlike preemptible VMs) |
| Preemptible VMs (legacy) | Up to 91% | None | Fault-tolerant workloads only | 24-hour maximum runtime; Google recommends Spot VMs instead |
Discount priority order: CUDs take precedence over SUDs. A resource covered by a CUD does not also receive SUDs. Spot/preemptible VMs are not eligible for SUDs or CUDs.
Exam trap: SUDs apply automatically -- you do not need to purchase them. They are only available for N1 and sole-tenant nodes. N2, N2D, E2, C2, C3, T2D, and Tau machine families are NOT eligible for SUDs. If the exam asks about automatic discounts for an E2 or N2 instance, the answer is "none" -- use CUDs instead. E2 instances are eligible for Flexible CUDs.
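The 30% figure follows from the published N1 tier structure: each successive quarter of the month is billed at an incremental rate of 100%, 80%, 60%, and 40% of list price. A minimal sketch of that arithmetic (tier rates as documented for N1; the function itself is illustrative):

```python
def sud_effective_rate(usage_fraction: float) -> float:
    """Effective billing rate (fraction of list price) for an N1 VM that
    runs for `usage_fraction` of the billing month, under sustained use
    discount tiers: 100% / 80% / 60% / 40% for each successive quarter."""
    if usage_fraction <= 0:
        return 1.0
    tiers = [1.00, 0.80, 0.60, 0.40]  # incremental rate per 25% block
    billed = 0.0
    remaining = usage_fraction
    for rate in tiers:
        block = min(remaining, 0.25)
        billed += block * rate
        remaining -= block
        if remaining <= 0:
            break
    return billed / usage_fraction  # average rate over actual usage

# Full month: (0.25*1.0 + 0.25*0.8 + 0.25*0.6 + 0.25*0.4) = 0.70 -> 30% off.
print(f"full month: {sud_effective_rate(1.0):.2f}")
print(f"half month: {sud_effective_rate(0.5):.2f}")
```

Running a VM for only half the month yields a 10% discount, which is why SUDs reward steady, always-on usage.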
Additional Cost Optimization Techniques
| Technique | Description |
|---|---|
| Custom machine types | Size vCPU and memory independently to avoid paying for unused resources. Available for N1, N2, N2D, E2 series. Carries a ~5% premium over predefined types but saves money when predefined types waste resources. |
| Autoscaling | MIGs scale in/out based on CPU, HTTP load, or custom metrics. Scale-in reduces cost during low traffic. |
| Serverless pricing | Cloud Functions, Cloud Run, and App Engine (standard) charge per invocation or per request-second with scale-to-zero -- no baseline cost when idle. |
| Cloud Storage classes | Use Nearline (30-day minimum), Coldline (90-day), or Archive (365-day) for infrequently accessed data. Object Lifecycle Management automates transitions. |
| Right-sizing recommendations | Cloud Monitoring + Recommender API identify oversized VMs and suggest smaller machine types. |
| BigQuery slots | On-demand pricing (per TB scanned) vs. capacity-based slot reservations (BigQuery editions, the successor to flat-rate pricing) for predictable analytical workloads. |
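The Object Lifecycle Management transitions in the table can be written as a small JSON policy. A sketch that assembles one in Python (the rule schema follows the Cloud Storage lifecycle configuration format; the age thresholds are illustrative):

```python
import json

def lifecycle_policy() -> dict:
    """Lifecycle config: transition objects to colder classes as they age,
    then delete them, respecting each class's minimum storage duration."""
    def rule(action: dict, age_days: int) -> dict:
        return {"action": action, "condition": {"age": age_days}}

    return {
        "rule": [
            rule({"type": "SetStorageClass", "storageClass": "NEARLINE"}, 30),
            rule({"type": "SetStorageClass", "storageClass": "COLDLINE"}, 90),
            rule({"type": "SetStorageClass", "storageClass": "ARCHIVE"}, 365),
            rule({"type": "Delete"}, 3650),  # purge after ~10 years
        ]
    }

# This JSON is the kind of file you would pass to:
#   gcloud storage buckets update gs://BUCKET --lifecycle-file=policy.json
print(json.dumps(lifecycle_policy(), indent=2))
```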
TCO Analysis and ROI
The exam tests whether you understand Total Cost of Ownership beyond just compute pricing:
- Direct costs: Compute, storage, networking (egress), licensing
- Indirect costs: Operational overhead, staffing, training, downtime
- Migration costs: One-time costs for assessment, refactoring, data transfer, parallel running
- Ongoing optimization: Cost of continuous monitoring, right-sizing, and re-architecting
The Google Cloud Pricing Calculator is the primary tool for estimating costs. Migration Center provides TCO comparisons between on-premises and cloud.
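A back-of-the-envelope TCO comparison simply sums those cost categories over a planning horizon. A minimal sketch with made-up figures:

```python
def three_year_tco(direct_annual: float, indirect_annual: float,
                   migration_one_time: float = 0.0, years: int = 3) -> float:
    """Simple TCO model mirroring the categories above: recurring direct
    and indirect costs over the horizon, plus one-time migration cost."""
    return (direct_annual + indirect_annual) * years + migration_one_time

# Illustrative (made-up) figures: on-prem vs. cloud over 3 years.
on_prem = three_year_tco(direct_annual=400_000, indirect_annual=250_000)
cloud = three_year_tco(direct_annual=300_000, indirect_annual=100_000,
                       migration_one_time=150_000)
print(f"on-prem: ${on_prem:,.0f}, cloud: ${cloud:,.0f}")
# Cloud can win despite the one-time migration cost because indirect
# (ops/staffing) spend drops -- the point of TCO beyond compute pricing.
```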
Compliance and Observability
For compliance-heavy scenarios:
- VPC Service Controls: Create security perimeters around GCP resources to prevent data exfiltration
- Organization Policy Service: Enforce constraints (e.g., restrict resource locations to specific regions)
- Cloud Audit Logs: Admin Activity logs (always on), Data Access logs (configurable), System Event logs
- Access Transparency: Logs of Google staff accessing your data (for compliance audits)
- Assured Workloads: Preconfigured compliance environments for FedRAMP, HIPAA, CJIS, etc.
1.2 Defining Technical Requirements
High Availability Design Patterns
HA is a core PCA topic. You must understand the availability implications of different deployment topologies.
| Pattern | Availability Target | Description | Example |
|---|---|---|---|
| Single zone | ~99.9% | One zone, one region | Dev/test environments |
| Multi-zone (within a region) | ~99.99% | Resources spread across 3 zones in one region | Regional MIG behind a regional load balancer |
| Multi-region | ~99.999% | Resources in 2+ regions with global load balancing | Global HTTP(S) LB with backend services in us-central1 and europe-west1 |
| Hybrid/multi-cloud | Varies | Workloads across on-premises and GCP or multiple clouds | Anthos, GKE Enterprise |
Exam trap: Multi-zone is NOT the same as multi-region. A regional MIG spans multiple zones within a single region, giving you zone-level resilience. For region-level failures (extremely rare but testable), you need multi-region deployment with global load balancing.
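The availability targets above follow from composing independent failure probabilities: a redundant deployment is down only when every replica is down at once. A sketch (the independence assumption is an approximation; real zone failures can be correlated):

```python
def parallel_availability(per_replica: float, replicas: int) -> float:
    """Availability of N redundant, independently failing deployments:
    the system is down only if every replica is down simultaneously."""
    return 1.0 - (1.0 - per_replica) ** replicas

# One zone at ~99.9% -> three zones in a region, or two regions:
print(f"1 zone:    {0.999:.6f}")
print(f"3 zones:   {parallel_availability(0.999, 3):.9f}")
print(f"2 regions: {parallel_availability(0.9999, 2):.9f}")
# Each added replica buys more nines -- but only against the failure
# domain the replicas actually span (zones do not cover region loss).
```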
Load Balancer Selection Guide
The PCA exam heavily tests load balancer selection. You must match the scenario to the correct load balancer type.
| Load Balancer | Layer | Scope | Traffic Type | Key Feature |
|---|---|---|---|---|
| Global External Application LB | 7 | Global | HTTP/HTTPS | Anycast IP, Cloud CDN, Cloud Armor, URL maps, traffic splitting |
| Regional External Application LB | 7 | Regional | HTTP/HTTPS | Single region; Envoy-based; advanced traffic management |
| Regional Internal Application LB | 7 | Regional | HTTP/HTTPS | Internal clients only; microservices traffic routing |
| Cross-Region Internal Application LB | 7 | Global | HTTP/HTTPS | Internal clients across multiple regions |
| Global External Proxy Network LB | 4 | Global | TCP with optional SSL offload | Non-HTTP TCP traffic needing global reach (e.g., gaming, IoT) |
| Regional External Passthrough Network LB | 4 | Regional | TCP/UDP/ESP/GRE/ICMP | Preserves client source IP; direct server return; highest performance |
| Regional Internal Passthrough Network LB | 4 | Regional | TCP/UDP | Internal TCP/UDP load balancing (e.g., internal database tier) |
Decision tree for the exam:
- Is it HTTP/HTTPS traffic? --> Application Load Balancer (Layer 7)
- Is it internal-only? --> Internal variant
- Does it need global reach? --> Global variant (requires Premium Network Tier)
- Is it non-HTTP TCP/UDP? --> Network Load Balancer (Layer 4)
- Do you need to preserve client source IP for non-HTTP? --> Passthrough Network LB
- Do you need SSL offload for non-HTTP TCP? --> Proxy Network LB
Exam trap: Cloud Armor DDoS protection and Cloud CDN integration are only available with the Global External Application Load Balancer. If a scenario requires WAF rules or DDoS protection, the answer is always this load balancer type. Internal load balancers do NOT support Cloud Armor.
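The decision tree above can be encoded as a small selector function, a useful self-test before the exam. A simplified sketch (it ignores protocol edge cases such as UDP/ESP and client-IP preservation):

```python
def choose_load_balancer(http: bool, internal: bool, global_reach: bool,
                         ssl_offload: bool = False) -> str:
    """Encodes the decision tree above: L7 vs L4, internal vs external,
    global vs regional, proxy vs passthrough. Simplified on purpose --
    real selection also weighs protocol details."""
    if http:  # Layer 7 -> Application Load Balancer
        if internal:
            return ("Cross-Region Internal Application LB" if global_reach
                    else "Regional Internal Application LB")
        return ("Global External Application LB" if global_reach
                else "Regional External Application LB")
    # Layer 4 -> Network Load Balancer
    if internal:
        return "Regional Internal Passthrough Network LB"
    if ssl_offload or global_reach:
        return "Global External Proxy Network LB"
    return "Regional External Passthrough Network LB"

# Global website needing Cloud Armor / Cloud CDN:
print(choose_load_balancer(http=True, internal=False, global_reach=True))
# Internal database tier (TCP):
print(choose_load_balancer(http=False, internal=True, global_reach=False))
```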
Autoscaling
Managed Instance Groups (MIGs)
Managed Instance Groups are the primary compute scaling mechanism:
| Feature | Description |
|---|---|
| Autoscaling signals | CPU utilization, HTTP load balancing serving capacity, Cloud Monitoring metrics, schedules |
| Cool-down period | Time after instance creation before autoscaler collects metrics (prevents flapping) |
| Scale-in controls | Limit how quickly the group can shrink (prevent aggressive scale-in) |
| Regional MIG | Distributes instances across multiple zones for HA |
| Stateful MIG | Preserves instance names, disks, and metadata across recreation events |
| Update policies | Rolling update, canary update, proactive/opportunistic replacement |
GKE Autoscaling
| Autoscaler | Scope | Scales What | Signal |
|---|---|---|---|
| Horizontal Pod Autoscaler (HPA) | Pod | Number of pod replicas | CPU, memory, custom metrics, external metrics |
| Vertical Pod Autoscaler (VPA) | Pod | CPU/memory requests per pod | Historical resource usage |
| Cluster Autoscaler | Node | Number of nodes in a node pool | Pending pods (unschedulable due to insufficient resources) |
| Multidimensional Pod Autoscaler (MPA) | Pod | Both replicas and resources | Combined HPA + VPA signals |
| Node Auto-Provisioning (NAP) | Node pool | Creates/deletes entire node pools | Workload requirements (machine type, GPU, etc.) |
Exam trap: HPA and VPA should not be used together on the same metric (e.g., both scaling on CPU). They will conflict. Use MPA if you need both horizontal and vertical scaling.
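The conflict is easy to see from the documented HPA algorithm, which scales replicas by the ratio of observed to target metric value. If VPA changes a pod's resource requests, observed utilization shifts and HPA reacts, so the two fight over the same signal:

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Kubernetes HPA formula:
    desired = ceil(current_replicas * current_metric / target_metric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 pods averaging 90% CPU against a 60% target -> scale out to 6.
print(hpa_desired_replicas(4, current_metric=90, target_metric=60))
# If VPA then doubles each pod's CPU request, measured utilization halves
# and HPA scales back in -- hence the guidance to use MPA instead.
```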
Serverless Services Comparison
| Feature | Cloud Functions | Cloud Run | App Engine Standard | App Engine Flexible |
|---|---|---|---|---|
| Unit of deployment | Function | Container | Application version | Application version (custom runtime) |
| Scale to zero | Yes | Yes | Yes | No (minimum 1 instance) |
| Max request timeout | 9 min (1st gen) / 60 min (2nd gen, HTTP-triggered only; 9 min event-driven) | 60 min | 10 min (auto scaling) / 24 h (basic or manual scaling) | 60 min |
| Concurrency | 1 (1st gen) / up to 1000 (2nd gen) | Up to 1000 per instance | Varies by runtime | Configurable |
| Custom runtime | No (specific runtimes) | Yes (any container) | No (supported runtimes) | Yes (Dockerfile) |
| VPC access | Serverless VPC Access connector | Direct VPC egress or connector | Connector | Native VPC |
| Pricing | Per invocation + compute time | Per request + vCPU/memory-seconds | Per instance-hour | Per VM-hour |
| Best for | Event-driven functions, webhooks | Containerized web apps, APIs, microservices | Simple web apps, APIs | Legacy apps needing custom runtimes |
Exam trap: Cloud Run is the recommended default serverless platform for new workloads. Cloud Functions is for event-driven glue code (Pub/Sub triggers, Cloud Storage triggers). App Engine Standard is legacy but still tested. App Engine Flexible does NOT scale to zero -- if cost optimization is the priority and the workload is bursty, App Engine Flexible is the wrong answer.
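The scale-to-zero trap is easiest to see with arithmetic. A sketch comparing a bursty workload on a scale-to-zero platform against a platform with an always-on minimum instance (the hourly rate is made up; only the relative shape matters):

```python
def monthly_cost(active_hours: float, hourly_rate: float,
                 scales_to_zero: bool, hours_in_month: float = 730.0) -> float:
    """Illustrative cost model: a scale-to-zero platform bills only active
    hours; a platform with a 1-instance minimum bills around the clock."""
    billed = active_hours if scales_to_zero else hours_in_month
    return billed * hourly_rate

# Bursty workload: 50 active hours/month at a made-up $0.10/instance-hour.
run_style = monthly_cost(50, 0.10, scales_to_zero=True)    # Cloud Run-like
flex_style = monthly_cost(50, 0.10, scales_to_zero=False)  # AE Flexible-like
print(f"scale-to-zero: ${run_style:.2f}, always-on minimum: ${flex_style:.2f}")
```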
Google Cloud Well-Architected Framework
The Well-Architected Framework consists of six pillars. The PCA exam tests your understanding of design principles from each pillar.
| Pillar | Focus | Key Principles |
|---|---|---|
| Operational Excellence | Efficient deployment, monitoring, management | CloudOps, incident management, automated change management, continuous improvement |
| Security, Privacy, and Compliance | Data protection, zero trust, regulatory alignment | Zero trust architecture, shift-left security, shared responsibility model, preemptive cyber defense |
| Reliability | Resilient, highly available workloads | Redundancy, horizontal scalability, graceful degradation, failure detection via observability, postmortems |
| Cost Optimization | Maximize business value of cloud spending | Align spend with business objectives, cost awareness culture, resource utilization optimization |
| Performance Optimization | Optimal resource performance and tuning | Elasticity, modular design, continuous monitoring, right-sizing |
| Sustainability | Environmentally responsible workloads | Low-carbon regions, energy-efficient software, optimized storage, resource usage patterns |
Five foundational design principles across all pillars:
- Design for change -- Small, frequent deployments with rapid feedback loops
- Document your architecture -- Link documentation to design decisions
- Simplify and use managed services -- Reduce operational burden
- Decouple architecture -- Separate components for independent operation
- Use stateless architecture -- Improve scalability via shared storage and caching
1.3 Choosing GCP Network, Storage, and Compute Resources
Storage and Database Decision Tree
Choosing the right storage service is one of the most frequently tested areas. Use this decision matrix.
Relational Databases
| Service | Scale Model | Max Capacity | Global Distribution | Use Case | Key Differentiator |
|---|---|---|---|---|---|
| Cloud SQL | Vertical (read replicas for read scale) | 96 vCPUs, 624 GB RAM (Enterprise); 128 vCPUs, 864 GB RAM (Enterprise Plus) | Cross-region read replicas | Standard OLTP, web apps, CMS | Managed MySQL, PostgreSQL, SQL Server; lowest operational overhead for relational |
| AlloyDB | Vertical (read pools for read scale) | 128 vCPUs, 864 GB RAM per instance | Cross-region replication | PostgreSQL workloads needing high OLTP throughput and analytics | PostgreSQL-compatible with a columnar engine for accelerated analytical queries; high transaction throughput |
| Cloud Spanner | Horizontal (automatic sharding) | Virtually unlimited | Multi-region with strong consistency | Global OLTP, financial systems, inventory | Only globally distributed relational database with external consistency; 99.999% SLA (multi-region) |
Exam trap: If the scenario requires global strong consistency with a relational database, the answer is always Spanner. Cloud SQL cannot provide multi-region strong consistency. AlloyDB is PostgreSQL-compatible but does not offer Spanner's horizontal scaling or global distribution.
NoSQL Databases
| Service | Data Model | Scale Model | Consistency | Use Case |
|---|---|---|---|---|
| Firestore | Document (JSON-like) | Serverless, automatic | Strong | Mobile/web apps, real-time sync, user profiles, content management |
| Bigtable | Wide-column | Horizontal (nodes) | Eventually consistent (single-row strong) | Time-series, IoT, analytics, AdTech, 10ms latency at scale |
| Memorystore | Key-value (in-memory) | Vertical | Strong | Session caching, leaderboards, real-time counters |
Exam trap: Bigtable is NOT a good choice for data smaller than 1 TB -- the minimum node count makes it expensive at small scale. For small-to-medium NoSQL workloads, Firestore is the right answer. Bigtable excels at high-throughput, low-latency reads/writes at massive scale (petabytes).
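The 1 TB rule of thumb comes from Bigtable's provisioned-node pricing: you pay for nodes whether or not they are busy, while Firestore bills per operation. A sketch with illustrative (not current) prices:

```python
def bigtable_monthly_floor(nodes: int, node_hourly: float = 0.65,
                           hours: float = 730.0) -> float:
    """Bigtable bills per provisioned node-hour regardless of traffic --
    the fixed floor that makes it a poor fit for small datasets.
    (Node price is illustrative; check current pricing.)"""
    return nodes * node_hourly * hours

def firestore_monthly(reads: int, writes: int,
                      read_per_100k: float = 0.03,
                      write_per_100k: float = 0.09) -> float:
    """Firestore bills per operation, so a small workload stays cheap.
    (Rates are illustrative.)"""
    return reads / 100_000 * read_per_100k + writes / 100_000 * write_per_100k

floor = bigtable_monthly_floor(nodes=3)  # typical HA production minimum
small = firestore_monthly(reads=5_000_000, writes=1_000_000)
print(f"Bigtable 3-node floor: ${floor:,.2f}/mo; small Firestore app: ${small:.2f}/mo")
```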
Analytics and Data Warehousing
| Service | Type | Best For | Pricing Model |
|---|---|---|---|
| BigQuery | Serverless data warehouse | OLAP, BI, ad-hoc SQL analytics on petabyte-scale data | On-demand (per TB scanned) or capacity-based slot reservations (BigQuery editions, the successor to flat-rate pricing) |
BigQuery is the default answer for analytics workloads. Key features: federated queries (query data in Cloud Storage, Bigtable, or external sources without loading), materialized views, ML (BigQuery ML), streaming inserts, and partitioning/clustering for cost optimization.
Object and File Storage
| Service | Type | Use Case | Storage Classes |
|---|---|---|---|
| Cloud Storage | Object store | Unstructured data, backups, media, data lake | Standard, Nearline (30-day min), Coldline (90-day min), Archive (365-day min) |
| Filestore | Managed NFS | Shared file systems for Compute Engine and GKE | Basic HDD, Basic SSD, Zonal, Regional, Enterprise |
| Persistent Disk | Block storage | Boot disks, database disks | Standard (HDD), Balanced (SSD), SSD, Extreme |
Compute Decision Tree
| Workload | Recommended Service | Why |
|---|---|---|
| Event-driven function, webhook | Cloud Functions | Single-purpose, auto-scales, event triggers |
| Containerized web app or API | Cloud Run | Any container, scale to zero, per-request pricing |
| Simple web app (supported runtimes) | App Engine Standard | Managed platform, auto-scaling, zero ops |
| Custom runtime web app | App Engine Flexible or Cloud Run | Dockerfile support; Cloud Run preferred for new workloads |
| Microservices with service mesh | GKE | Kubernetes orchestration, Istio/Anthos Service Mesh |
| Hybrid/multi-cloud Kubernetes | GKE Enterprise (Anthos) | Consistent Kubernetes management across GCP, on-prem, AWS, Azure; fleet management, Config Sync, Policy Controller |
| Lift-and-shift VM workloads | Compute Engine | Full VM control, any OS |
| High-performance computing | Compute Engine (C2/C2D/H3) | Compute-optimized machine types |
| ML training | Vertex AI with GPUs/TPUs | Managed ML platform, A2/A3 accelerator VMs, TPU pods |
| Large in-memory databases (SAP HANA) | Compute Engine (M2/M3) | Memory-optimized; M2 scales to roughly 12 TB RAM |
| Windows workloads or VMware migration | Compute Engine or VMware Engine | Windows licensing; VMware Engine for vSphere compatibility |
| Batch jobs tolerating interruption | Compute Engine (Spot VMs) or Cloud Batch | Up to 91% discount; Cloud Batch manages job queuing |
Machine Family Quick Reference
| Family | Series | Best For | Key Spec |
|---|---|---|---|
| General-purpose | E2, N1, N2, N2D, T2D, C3 | Web servers, app servers, dev/test, small-medium databases | E2: lowest cost; N2: best price-performance; C3: newest Intel |
| Compute-optimized | C2, C2D, H3 | HPC, gaming servers, batch processing, scientific computing | H3: 88 vCPUs, DDR5, Intel Sapphire Rapids |
| Memory-optimized | M1, M2, M3 | SAP HANA, in-memory databases, real-time analytics | M2: up to ~12 TB RAM |
| Accelerator-optimized | A2, A3, G2 | ML training/inference, video transcoding, rendering | A3: latest NVIDIA GPUs for LLM training |
Networking Services
VPC Fundamentals
A VPC network is a global resource containing regional subnets. Key concepts:
| Concept | Description | Exam Relevance |
|---|---|---|
| Auto-mode network | Automatically creates one subnet per region with predefined IP ranges | Quick setup; not recommended for production (IP ranges may conflict) |
| Custom-mode network | You define subnets and IP ranges manually | Production standard; full control over IP addressing |
| Firewall rules | Distributed virtual firewall; default denies all ingress, allows all egress | Implied rules; priority-based (0-65535, lower number = higher priority) |
| VPC Peering | Connects two VPC networks; non-transitive | No overlapping IP ranges; each peering is point-to-point |
| Shared VPC | Centralizes networking in a host project; service projects attach to it | Best practice for multi-project environments; centralizes firewall and subnet management |
| VPC Service Controls | Creates security perimeters around GCP services | Prevents data exfiltration; restricts API access to a perimeter |
| Private Google Access | VMs without external IPs can reach Google APIs | Must be enabled per subnet |
| Cloud NAT | Outbound NAT for VMs without external IPs | Regional; no inbound NAT |
Exam trap: VPC Peering is NOT transitive. If VPC-A peers with VPC-B, and VPC-B peers with VPC-C, VPC-A cannot reach VPC-C through VPC-B. For transitive connectivity, use a hub-and-spoke model with Network Connectivity Center or a Shared VPC. Network Connectivity Center is the answer when the exam describes multi-VPC or multi-cloud architectures needing transitive routing through a central hub.
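Non-transitivity can be modeled as a graph in which routes exist only along direct edges. A toy sketch (the VPC names are hypothetical):

```python
def reachable_via_peering(peerings: set, src: str, dst: str) -> bool:
    """VPC Network Peering exchanges routes only between directly peered
    networks -- there is no multi-hop route propagation, so reachability
    means 'a direct peering exists', not graph connectivity."""
    return frozenset({src, dst}) in peerings

# A peers with B, and B peers with C -- but A never peers with C.
peerings = {frozenset({"vpc-a", "vpc-b"}), frozenset({"vpc-b", "vpc-c"})}
print(reachable_via_peering(peerings, "vpc-a", "vpc-b"))  # direct peer
print(reachable_via_peering(peerings, "vpc-a", "vpc-c"))  # no transit via vpc-b
```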
Hybrid Connectivity
| Service | Bandwidth | SLA | Use Case | Requirement |
|---|---|---|---|---|
| Cloud VPN (HA VPN) | Up to 3 Gbps per tunnel | 99.99% (with proper topology) | Lower bandwidth, encrypted connectivity; initial migration | IPsec; no colocation needed |
| Dedicated Interconnect | 10 Gbps or 100 Gbps circuits (up to 200 Gbps) | Google end-to-end SLA | High-bandwidth, low-latency enterprise connectivity | Requires colocation facility equipment |
| Partner Interconnect | 50 Mbps to 50 Gbps | Provider-dependent | No colocation access; leveraging existing provider | Service provider relationship |
| Cloud Router | N/A (control plane) | N/A | Dynamic routing via BGP | Used with Cloud VPN and Interconnect |
Decision criteria:
- Need < 3 Gbps and can tolerate internet-based latency? --> Cloud VPN
- Need > 10 Gbps and have colocation access? --> Dedicated Interconnect
- Need > 3 Gbps but no colocation? --> Partner Interconnect
- Need encrypted connectivity over Interconnect? --> Use VPN over Interconnect (HA VPN tunnels over Dedicated/Partner Interconnect)
Exam trap: Cloud VPN traffic is encrypted (IPsec). Dedicated and Partner Interconnect traffic is NOT encrypted by default -- it traverses Google's network but is not IPsec-encrypted. If the scenario requires encryption over Interconnect, the answer is to layer HA VPN tunnels over the Interconnect connection.
Additional Networking Services
| Service | Purpose |
|---|---|
| Cloud DNS | Managed authoritative DNS; supports public and private zones; DNSSEC support |
| Cloud Armor | WAF and DDoS protection; works with external HTTP(S) load balancers and external proxy network load balancers |
| Cloud CDN | Content caching at Google edge locations; works with Global External Application LB |
| Network Connectivity Center | Hub-and-spoke network topology; provides transitive routing between VPCs, VPN tunnels, and Interconnect attachments |
| Traffic Director (now Cloud Service Mesh) | Managed control plane for service mesh (Envoy-based); global load balancing for internal microservices |
| Network Intelligence Center | Network monitoring, topology visualization, connectivity tests |
1.4 Designing a Migration Plan
Migration Phases
Google Cloud defines a four-phase migration framework:
| Phase | Activities | Key Tools |
|---|---|---|
| Assess | Inventory applications, map dependencies, evaluate TCO, assess team readiness | Migration Center, StratoZone, manual discovery |
| Plan | Design cloud foundation (landing zone), prioritize workloads, define migration waves | Cloud Foundation Toolkit, Terraform, resource hierarchy design |
| Deploy | Execute migration, transfer data, validate functionality | Migrate to Virtual Machines, Database Migration Service, Storage Transfer Service |
| Optimize | Right-size resources, enable autoscaling, adopt managed services, improve security posture | Recommender, Cloud Monitoring, cost management tools |
Migration Approaches
| Approach | Description | When to Use | Speed | Optimization |
|---|---|---|---|---|
| Rehost (lift-and-shift) | Move workloads as-is with minimal changes | Tight timeline, legacy apps, risk-averse orgs | Fastest | Lowest (cloud benefits not leveraged) |
| Replatform (lift-and-optimize) | Move and make targeted cloud optimizations | Moderate timeline, desire for some cloud benefits | Moderate | Moderate |
| Refactor (move-and-improve) | Modify application code to leverage cloud-native capabilities | Budget and time available, performance improvements needed | Slow | High |
| Re-architect | Fundamental restructure (e.g., monolith to microservices) | Application needs major scalability or agility improvements | Slowest | Highest |
| Rebuild | Complete rewrite as cloud-native | Existing app unmaintainable or does not meet goals | Slowest | Highest |
| Repurchase | Switch to SaaS equivalent | On-premises software has a suitable SaaS replacement | Varies | N/A (different product) |
Exam trap: The exam loves scenarios where a company says "we want to move to the cloud as fast as possible" -- the answer is almost always rehost first, then modernize later. Re-architecting is the right long-term play but never the fastest path. Conversely, if the scenario emphasizes "we want to take full advantage of cloud-native capabilities," rehosting is the wrong answer.
Migration Tools
| Tool | Migrates What | Key Feature |
|---|---|---|
| Migrate to Virtual Machines | Physical/virtual servers to Compute Engine | Streaming replication, minimal downtime cutover |
| VMware Engine | VMware workloads to a managed vSphere environment | Full VMware stack (vCenter, vSAN, NSX-T) on Google Cloud; no application changes |
| Database Migration Service (DMS) | MySQL, PostgreSQL, and SQL Server to Cloud SQL; PostgreSQL and Oracle to Cloud SQL for PostgreSQL or AlloyDB | Continuous replication, minimal downtime |
| BigQuery Migration Service | Data warehouses (Teradata, Redshift, etc.) to BigQuery | SQL translation, schema migration |
| Storage Transfer Service | Data from other clouds (S3, Azure Blob), HTTP sources, or on-premises to Cloud Storage | Scheduled transfers, bandwidth control |
| Transfer Appliance | Massive data volumes (hundreds of TB to 1 PB) to Cloud Storage | Physical appliance shipped to your datacenter; for when network transfer is impractical |
| gsutil / gcloud storage | Files and objects to Cloud Storage | CLI-based; parallel uploads; resumable transfers |
| Migration Center | Assessment and planning | Discovery, dependency mapping, TCO analysis, fit assessment |
Dependency and License Analysis
Before migration, you must understand:
- Application dependencies: Which applications talk to which? Map with Migration Center or manual discovery.
- Database dependencies: Which apps share databases? Foreign key relationships across schemas?
- Network dependencies: What ports and protocols are required? Firewall rule translation?
- License portability: Can existing licenses (Oracle, SQL Server, Windows) be brought to GCP? Do you need new cloud licenses? Sole-tenant nodes may be required for bring-your-own-license (BYOL) scenarios where license terms require physical host isolation.
Exam trap: Oracle Database licensing on Google Cloud often requires sole-tenant nodes because Oracle licenses per physical core, not per vCPU. If the exam describes an Oracle Database migration, sole-tenant nodes are likely part of the correct answer.
1.5 Planning for Future Improvements
Cloud Modernization Journey
The PCA exam tests your understanding of the modernization progression:
VMs (Compute Engine)
--> Containers (GKE)
--> Managed Containers (Cloud Run)
--> Microservices + Service Mesh (GKE + Anthos Service Mesh)
--> Event-Driven / Serverless (Cloud Functions + Pub/Sub + Eventarc)
Each step increases cloud-native optimization but also increases refactoring effort:
| Stage | Deployment Model | Scaling | Ops Overhead | Cost Model |
|---|---|---|---|---|
| VMs | Compute Engine MIGs | Autoscaler (minutes) | Highest (OS patching, etc.) | Per-hour/second |
| Containers on GKE | Kubernetes pods | HPA/VPA/Cluster Autoscaler (seconds) | Medium (cluster management) | Per-node + overhead |
| Cloud Run | Managed containers | Request-based (seconds, to zero) | Lowest | Per-request |
| Serverless Functions | Cloud Functions | Invocation-based (milliseconds) | Lowest | Per-invocation |
Integration with AI/ML via Vertex AI
Vertex AI is Google Cloud's unified ML platform. The PCA exam tests high-level architectural decisions:
| Component | Purpose | When to Use |
|---|---|---|
| Vertex AI Workbench | Managed Jupyter notebooks | Data exploration, prototyping |
| Vertex AI Training | Custom model training | Custom ML models needing GPU/TPU |
| Vertex AI Prediction | Online and batch prediction endpoints | Serving trained models |
| AutoML | No-code model training | When data scientists are unavailable; tabular, image, text, video |
| Vertex AI Pipelines | ML workflow orchestration (Kubeflow/TFX) | Reproducible, automated ML pipelines |
| Gemini Cloud Assist | AI-powered assistance for cloud operations | Troubleshooting, code generation, architecture recommendations |
| Model Garden | Pre-trained foundation models | Using Google and open-source LLMs |
Exam trap: The exam may present scenarios where a company wants ML capabilities but has no data science team. The answer is typically AutoML (no-code) or pre-trained APIs (Vision AI, Natural Language AI, Translation AI), not custom Vertex AI Training.
Data Mesh and BigQuery Federation
For data-heavy architectures, the PCA exam tests:
- BigQuery federated queries: Query data in Cloud Storage (Parquet, ORC, Avro, CSV), Bigtable, or Cloud SQL without loading it into BigQuery. Trade-off: higher query latency but no ETL pipeline needed.
- BigQuery Omni: Run BigQuery analytics on data stored in AWS S3 or Azure Blob Storage without moving it.
- Analytics Hub: Share BigQuery datasets across organizations with governed access.
- Dataplex: Data governance and management across data lakes and data warehouses; auto-discovery, metadata management, data quality.
- Data mesh principles: Domain-oriented decentralized data ownership, data as a product, self-serve data infrastructure, federated computational governance.
Exam Strategy for Domain 1
Question Patterns
- "Which service should you use?" -- Map the requirements (scale, consistency, latency, cost) to the correct service using the decision trees above.
- "How should you minimize cost?" -- Apply the discount hierarchy: Spot VMs > CUDs > SUDs > right-sizing > serverless.
- "How should you design for high availability?" -- Match the availability requirement to multi-zone, multi-region, or hybrid.
- "What migration approach should you use?" -- Match the business constraint (speed, budget, team skill) to rehost/replatform/refactor.
- "Which load balancer?" -- Follow the decision tree: HTTP vs. TCP, internal vs. external, global vs. regional.
Common Exam Traps Summary
| Trap | Correct Answer |
|---|---|
| E2 instances get automatic SUDs | No -- E2 is NOT eligible for SUDs |
| VPC peering is transitive | No -- each peering is point-to-point; not transitive |
| Interconnect traffic is encrypted | No -- you must layer HA VPN for encryption |
| App Engine Flexible scales to zero | No -- minimum 1 instance; use Cloud Run for scale-to-zero |
| Bigtable for small datasets (< 1 TB) | No -- Firestore for small-medium NoSQL; Bigtable for large-scale |
| Cloud SQL for global strong consistency | No -- that is Cloud Spanner |
| Rehost is the most cloud-optimized approach | No -- it is the fastest but least optimized; refactor/re-architect for full optimization |
| HPA and VPA can target the same metric | No -- they conflict; use MPA for combined scaling |
| Cloud Armor works with internal load balancers | No -- Cloud Armor works with external HTTP(S) LBs and external proxy network LBs only |
| Spot VMs have a 24-hour limit | No -- that is preemptible VMs (legacy); Spot VMs have no time limit |
References
- Google Cloud Well-Architected Framework
- Cloud Load Balancing Overview
- Migration to Google Cloud: Getting Started
- Preemptible and Spot VMs
- Sustained Use Discounts
- Committed Use Discounts Overview
- VPC Overview
- Choosing a Network Connectivity Product
- Compute Engine Machine Families
- Google Cloud Database Products
- Migration Center
- Google Cloud Pricing Calculator
- Vertex AI Documentation
- BigQuery Documentation
- GKE Autoscaling
- Cloud Run Documentation
- Cloud Functions Documentation
- VMware Engine