Vertex AI
Model Garden for hosted LLMs and embeddings. Vertex Predict for custom risk models. Vertex Eval for offline scoring.
- Gemini Pro
- Embeddings
- Eval
We pick GCP because Vertex AI, Gemini, and Cloud Run let us ship fast without giving up control over data residency, security, or auditability.
Each surface is picked for a specific job. We avoid abstraction layers that hide what the model is actually doing.
Long-context reasoning, vision, and function-calling. Used for document understanding, narrative drafting, and agentic planning.
Stateless inference and APIs. Scale-to-zero, region-pinned to Toronto for Canadian data residency.
Form parsers and custom processors for KYC, customs, and trade documents across multiple languages.
Append-only audit log of model calls, decisions, and human handoffs. Long-term retention with column-level access control.
Async pipelines for ingestion, batch evals, and overnight re-scoring. Dead-letter queues with replay.
Case state, agent memory, and per-tenant isolation. Native multi-region replication is available for resilience where residency requirements allow it.
Per-prompt traces, token counts, latency histograms, and SLO dashboards. Errors flow into PagerDuty for on-call rotations.
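The dead-letter-and-replay behaviour described for the async pipelines above can be illustrated with a minimal in-memory sketch. It is a stand-in for the pattern only, not the Pub/Sub client API: `Message`, `Pipeline`, `max_attempts`, and the `score` handler are hypothetical names chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Message:
    body: str
    attempts: int = 0  # delivery attempts made so far


@dataclass
class Pipeline:
    """In-memory stand-in for a queue with a dead-letter queue (assumption)."""
    handler: Callable[[str], None]
    max_attempts: int = 3
    main: Deque[Message] = field(default_factory=deque)
    dead_letter: List[Message] = field(default_factory=list)

    def publish(self, body: str) -> None:
        self.main.append(Message(body))

    def drain(self) -> None:
        """Deliver until the main queue is empty; park repeated failures."""
        while self.main:
            msg = self.main.popleft()
            msg.attempts += 1
            try:
                self.handler(msg.body)
            except Exception:
                if msg.attempts >= self.max_attempts:
                    self.dead_letter.append(msg)  # give up, keep for replay
                else:
                    self.main.append(msg)  # retry later

    def replay(self) -> int:
        """Requeue dead-lettered messages with a fresh attempt count."""
        count = len(self.dead_letter)
        for msg in self.dead_letter:
            self.main.append(Message(msg.body))
        self.dead_letter.clear()
        return count


processed: List[str] = []
outage = {"active": True}  # simulates a downstream dependency being down


def score(body: str) -> None:
    if outage["active"] and body == "case-2":
        raise RuntimeError("downstream unavailable")
    processed.append(body)


pipeline = Pipeline(handler=score)
for body in ("case-1", "case-2", "case-3"):
    pipeline.publish(body)

pipeline.drain()           # case-2 fails three times and is dead-lettered
outage["active"] = False   # the dependency recovers
pipeline.replay()          # move case-2 back onto the main queue
pipeline.drain()
print(processed)           # → ['case-1', 'case-3', 'case-2']
```

The point of the sketch is that a failed message is never dropped: it is parked after a bounded number of attempts and can be re-run once the underlying fault is fixed.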
The shape of every MRA system. Modules slot into this base and inherit logging, residency, and identity by default.
Client / Partner ──────────────────────────────┐
          │ HTTPS via Google-managed SSL       │
          ▼                                    │
┌────────────────────────┐                     │
│  Cloud Load Balancer   │                     │
│  (region: nam-ne2)     │                     │
└─────────┬──────────────┘                     │
          ▼                                    │
┌────────────────────────┐                     │
│  Cloud Run             │ ◀── Workload ID ────┤
│  • API surfaces        │     Federation      │
│  • Agent orchestrators │                     │
└─────────┬──────────────┘                     │
          ▼                                    │
┌──── Reasoning ─────────────┐                 │
│ Vertex AI · Gemini · Doc AI│ ─── Vertex      │
│ Embeddings · Predict       │     Eval        │
└─────────┬──────────────────┘                 │
          ▼                                    │
┌──── State + Evidence ──────┐                 │
│ Firestore · BigQuery · GCS │ ─── CMEK + DLP  │
└─────────┬──────────────────┘                 │
          ▼                                    │
┌──── Observability ─────────┐                 │
│ Cloud Logging · Monitoring │ ─── PagerDuty   │
└────────────────────────────┘                 │
                                               │
VPC-SC perimeter around all data services ─────┘
Region: northamerica-northeast2 (Toronto, Canada)
What we do by default. Anything more specific (HIPAA, ISO 27001, FedRAMP) is on request via Operate engagements.
GitHub Actions and CI deploy to GCP without long-lived service-account keys. Every action is attested and audit-logged.
CMEK on storage, BigQuery, and Pub/Sub. Tenant-scoped keys for high-sensitivity workloads.
Service perimeters around GCS, BigQuery, and Vertex AI prevent data exfiltration even from compromised credentials.
Inline PII redaction before logs and traces leave the data plane. Configurable per surface.
Production lives in northamerica-northeast2 (Toronto). Data plane services pinned to the region; no cross-region replication by default.
PIPEDA-aware data handling, SOC 2 Type II alignment on the roadmap, and audit-grade evidence shipped with every product.
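As a rough illustration of the inline, per-surface PII redaction listed above, the sketch below masks a few identifier types before a message is written to a log or trace. Everything in it is an assumption made for the example: the regex patterns are simplified stand-ins for a managed inspector, and `PATTERNS`, `SURFACE_RULES`, and `redact` are hypothetical names, not part of any GCP API.

```python
import re
from typing import Dict, FrozenSet, Pattern

# Simplified, illustrative patterns (assumption); real detection needs far
# broader coverage than three regexes.
PATTERNS: Dict[str, Pattern[str]] = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SIN": re.compile(r"\b\d{3}[-\s]\d{3}[-\s]\d{3}\b"),
}

# Hypothetical per-surface configuration: which identifier types to mask.
SURFACE_RULES: Dict[str, FrozenSet[str]] = {
    "logs": frozenset({"EMAIL", "PHONE", "SIN"}),
    "traces": frozenset({"EMAIL"}),
}


def redact(text: str, surface: str) -> str:
    """Replace each enabled identifier type with a bracketed label."""
    enabled = SURFACE_RULES.get(surface, frozenset(PATTERNS))  # default: mask all
    for label, pattern in PATTERNS.items():
        if label in enabled:
            text = pattern.sub(f"[{label}]", text)
    return text


line = "Contact jane@example.com or 416-555-0199, SIN 046-454-286"
print(redact(line, "logs"))    # → Contact [EMAIL] or [PHONE], SIN [SIN]
print(redact(line, "traces"))  # → Contact [EMAIL] or 416-555-0199, SIN 046-454-286
```

An unknown surface falls back to masking every type, so a missing configuration entry fails closed rather than leaking data.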
A 45-minute architecture review with someone who has shipped this in production. No sales engineer in the loop.