Data Architecture¶
Version: 1.0 Updated: 2026-04-13
A map of where RTOpacks' data lives, who writes it, and who reads it. For database IDs and worker bindings, see docs/docs/workers/inventory.md.
Database layout¶
RTOpacks runs on multiple D1 databases in the Cloudflare OC region. Each has a clear role:
| Database | Role | Primary writers | Readers |
|---|---|---|---|
rtopacks-db (rto-nrt-db) |
TGA corpus + CRICOS + enriched RTO detail — the NRT moat | tga-sync, cricos-sync, tga-ingest, reg-intel, qual-enrichment |
Everybody (read-only from most workers) |
ops-db |
Operational state — billing, sessions, observatory, sync cursors, regulatory events | internal-api, admin, all sync workers |
internal-api, admin, workspace |
engine-db-oc |
Workspace engine — tier resolution, user sessions, org memberships | internal-api, workspace |
workspace, admin |
calendar-db |
VET industry calendar events | TBD | site, workspace |
licensing-db-oc |
NSW occupational licensing data | TBD | site, internal-api |
microcredentials-db |
Micro-credentials catalog | TBD | site, internal-api |
landscape-db |
Vendor landscape / market intelligence | TBD | admin |
radar-db |
RTO radar / change detection | reg-intel (indirect) |
admin |
abs-db-oc |
ABS statistical data (LFS, WPI, JV, etc.) | 3 ABS sync workers | site, admin, stats-cache |
Never confuse ops-db and rtopacks-db. Writing billing state to rtopacks-db or TGA rows to ops-db is a category error — the data will be in the wrong account, wrong backup schedule, wrong read path.
Standing rule: cd apps/admin/ before running wrangler d1 execute so the correct Cloudflare account (f95d453) is resolved. See docs/docs/ops/standing-rules.md.
Data moat: the NRT corpus (rtopacks-db)¶
The core competitive asset. Three logical layers:
Layer 1 — Raw TGA corpus (written by tga-sync)¶
tga_organisations— every ASQA-registered RTO with status, jurisdiction, datestga_training_components— quals, units, skill sets with supersession chaintga_qualification_units— qual↔unit relationshipsqualification_statuses,org_statuses— sweep-driven status deltas
Written weekly by tga-sync (Sunday 2am AEST) via queue-driven chain dispatch. Freshness metric: tga_organisations.synced_at.
Layer 2 — CRICOS (written by cricos-sync)¶
cricos_institutions(1,565)cricos_courses(26,555)cricos_locations(3,956)cricos_course_locations(46,482)
Refreshed monthly (1st of month, 4am AEST) via the same queue pattern as tga-sync.
Layer 3 — Enriched RTO detail (written by tga-ingest)¶
rtos— core per-RTO row withenrichedflagrto_contacts,rto_addresses,rto_trading_names,rto_web_addresses,rto_legal_names,rto_registrations,rto_classifications
Written on-demand by tga-ingest when apps/site queues a message for a visited rto_code. Coverage: 72/12,515 = 0.58% as of 2026-04-13. BACKFILL-01 is the one-shot fix. See docs/docs/infrastructure/tga-ingest.md and docs/docs/briefs/backfill-01.md.
Operational layer (ops-db)¶
Everything that is "about what we do", not "about the VET sector".
- Billing (10 tables) —
billing_customers,billing_plans,billing_subscriptions,billing_invoices,billing_ledger,gst_accrual,billing_dunning,billing_plan_entitlements,billing_invoice_sequence,billing_purchases. Details:docs/docs/infrastructure/billing.md. - Client state —
rto_clients(admin client card surface, flipped by auto-convert) - Orgs —
orgs(workspace-scoped tenant metadata) - Observatory —
sync_runs,sync_steps,regulatory_events. Details:docs/docs/operations/observatory-guide.md. - Sync cursors —
tga_sync_cursor,cricos_sync_cursor(D1-backed phase pointers, NOT KV — see worker-patterns.md for why)
Workspace engine (engine-db-oc)¶
Separate from ops-db because workspace state has different access patterns and different failure tolerance. Auth sessions, tier resolution, org memberships, feature entitlements. Read by workspace + admin, written by internal-api.
KV namespaces¶
| Binding | Purpose |
|---|---|
SESSION_KV |
Admin/workspace sessions, QB OAuth tokens, cron heartbeats |
Critical rule: never put per-invocation write state in KV. Rapid writes return transient 500s that break chain dispatch. Use D1 instead. See docs/docs/infrastructure/worker-patterns.md.
Queues¶
| Queue | Producer | Consumer | Purpose |
|---|---|---|---|
tga-sync-queue |
tga-sync self |
tga-sync |
Phase-chain dispatch (TGA weekly) |
cricos-sync-queue |
cricos-sync self |
cricos-sync |
Phase-chain dispatch (CRICOS monthly) |
rtopacks-ingest-queue |
apps/site |
tga-ingest |
On-demand RTO enrichment |
rtopacks-ingest-dlq |
(system) | (manual inspection) | Dead-letter for ingest |
Cross-database relationships¶
ops-db.rto_clients.rto_code↔rtopacks-db.rtos.rto_code— soft FK across DBs.internal-apijoins them by making two queries.engine-db-oc.orgs.rto_code↔rtopacks-db.rtos.rto_code— same pattern. The permanentorg_rtopacks_ops ↔ rto_code 45329mapping lives here.ops-db.billing_customers.rto_code↔rtopacks-db.rtos.rto_code— populated during/billing/subscribeso the auto-convert path knows which rto_code to flag.
No cross-DB foreign keys — D1 doesn't support them. These are application-level joins maintained by internal-api.
Production and dev surfaces¶
The surface map is binary. Production lives on rtopacks.com.au. Dev lives on rtopacks.dev. No third state.
| Environment | Zone | What lives there | Auth |
|---|---|---|---|
| Production | rtopacks.com.au |
The user-facing topology: my. (workspace), admin. (admin console), internal-api. (backend), api. (public API), mcp. (MCP), media. (R2), preview. (kn-preview), trust. (Pages), docs. (decommissioned in STAGING-INFRA-01 Stage 6 → docs.rtopacks.dev). Apex + www. currently held by rtopacks-prelaunch; flipped to rtopacks-site at launch. |
CF Access on internal/admin and on staging.rtopacks.com.au (separate pre-launch apex artefact); internal-api has path-specific bypasses on /billing/webhook (Stripe) and /billing/qb-callback (QB OAuth) |
| Dev | rtopacks.dev |
Mirror of the user-facing topology on subdomains: rtopacks.dev (apex → site), my. (workspace), admin. (admin), internal-api. (backend), docs. (internal docs). Workers wired in STAGING-INFRA-01 Stage 4. |
CF Access on entire zone via wildcard application; internal-api.rtopacks.dev has same path-specific bypasses on /billing/webhook and /billing/qb-callback |
Why dev exists separately: workspace, admin, and site need a place to be exercised before they hit production. Stripe and QuickBooks are dev-grade integrations that haven't been run under repeated load — dev is where that happens. Sandbox credentials graduate to dev permanently; live credentials only ever go to production.
Database split:
| Class | Behaviour in dev |
|---|---|
Read-only from user-facing workers (rto-nrt-db, rto-abs-db, rto-licensing-db, rto-radar-db) |
Shared with production — dev workers read directly from production DBs. Sync infrastructure lives on production only. |
Mutable from user-facing workers (rto-ops-db, rto-workspace-db, rto-micro-db, rto-landscape-db, rto-calendar-db) |
Dev twin: <name>-staging (D1 names retain -staging suffix per ADR-027 D1-name lag; alignment via D1-NAME-ENVIRONMENT-RENAME-01). Schema replicated, seeded per per-database strategy. |
SESSION_KV |
Dev twin: SESSION_KV_STAGING (KV name retains _STAGING suffix per same lag) — dev session validity must not transfer to production. |
Cache KVs (SEARCH_CACHE, STATS_CACHE, etc.) |
Shared infrastructure with dev: key prefix. Dev writes prepend dev: to cache keys; dev reads only dev:-prefixed keys. Production never sees them. Shared KV, isolated data. |
Standing rules:
- Nothing lands on production until walked on dev.
- Dev never sends real emails, SMS, live Stripe charges, or live QB pushes — sandbox events only.
- Dev databases are seeded, never copied from production (except shared read-only databases).
- Deploy order: dev first, walk, production second. Never production-first.
- Sandbox credentials graduate to dev permanently. Live credentials only ever go to production.
See STAGING-INFRA-01 brief output for the full architecture and migration sequence (note: STAGING-INFRA-01 was the original infrastructure brief; vocabulary modernised to "dev" per ENVIRONMENT-NAME-RENAME-01).
Related docs¶
docs/docs/workers/inventory.md— canonical DB + worker + queue IDsdocs/docs/infrastructure/worker-patterns.md— how sync writers move data, env-block deploy pattern, cache key prefix patterndocs/docs/infrastructure/tga-ingest.md— on-demand enrichmentdocs/docs/infrastructure/billing.md— billing schema, Stripe, QuickBooks, runbooksdocs/docs/architecture/database.md— SQL conventions, gotchas, thetga_sync_stepswriter contract, table contractsdocs/docs/ops/standing-rules.md— operational rules for D1 / KV / exports