Skip to content

Data Architecture

Version: 1.0 Updated: 2026-04-13

A map of where RTOpacks' data lives, who writes it, and who reads it. For database IDs and worker bindings, see docs/docs/workers/inventory.md.


Database layout

RTOpacks runs on multiple D1 databases in the Cloudflare OC region. Each has a clear role:

Database Role Primary writers Readers
rtopacks-db (rto-nrt-db) TGA corpus + CRICOS + enriched RTO detail — the NRT moat tga-sync, cricos-sync, tga-ingest, reg-intel, qual-enrichment Everybody (read-only from most workers)
ops-db Operational state — billing, sessions, observatory, sync cursors, regulatory events internal-api, admin, all sync workers internal-api, admin, workspace
engine-db-oc Workspace engine — tier resolution, user sessions, org memberships internal-api, workspace workspace, admin
calendar-db VET industry calendar events TBD site, workspace
licensing-db-oc NSW occupational licensing data TBD site, internal-api
microcredentials-db Micro-credentials catalog TBD site, internal-api
landscape-db Vendor landscape / market intelligence TBD admin
radar-db RTO radar / change detection reg-intel (indirect) admin
abs-db-oc ABS statistical data (LFS, WPI, JV, etc.) 3 ABS sync workers site, admin, stats-cache

Never confuse ops-db and rtopacks-db. Writing billing state to rtopacks-db or TGA rows to ops-db is a category error — the data will be in the wrong account, wrong backup schedule, wrong read path.

Standing rule: cd apps/admin/ before running wrangler d1 execute so the correct Cloudflare account (f95d453) is resolved. See docs/docs/ops/standing-rules.md.


Data moat: the NRT corpus (rtopacks-db)

The core competitive asset. Three logical layers:

Layer 1 — Raw TGA corpus (written by tga-sync)

  • tga_organisations — every ASQA-registered RTO with status, jurisdiction, dates
  • tga_training_components — quals, units, skill sets with supersession chain
  • tga_qualification_units — qual↔unit relationships
  • qualification_statuses, org_statuses — sweep-driven status deltas

Written weekly by tga-sync (Sunday 2am AEST) via queue-driven chain dispatch. Freshness metric: tga_organisations.synced_at.

Layer 2 — CRICOS (written by cricos-sync)

  • cricos_institutions (1,565)
  • cricos_courses (26,555)
  • cricos_locations (3,956)
  • cricos_course_locations (46,482)

Refreshed monthly (1st of month, 4am AEST) via the same queue pattern as tga-sync.

Layer 3 — Enriched RTO detail (written by tga-ingest)

  • rtos — core per-RTO row with enriched flag
  • rto_contacts, rto_addresses, rto_trading_names, rto_web_addresses, rto_legal_names, rto_registrations, rto_classifications

Written on-demand by tga-ingest when apps/site queues a message for a visited rto_code. Coverage: 72/12,515 = 0.58% as of 2026-04-13. BACKFILL-01 is the one-shot fix. See docs/docs/infrastructure/tga-ingest.md and docs/docs/briefs/backfill-01.md.


Operational layer (ops-db)

Everything that is "about what we do", not "about the VET sector".

  • Billing (10 tables) — billing_customers, billing_plans, billing_subscriptions, billing_invoices, billing_ledger, gst_accrual, billing_dunning, billing_plan_entitlements, billing_invoice_sequence, billing_purchases. Details: docs/docs/infrastructure/billing.md.
  • Client staterto_clients (admin client card surface, flipped by auto-convert)
  • Orgsorgs (workspace-scoped tenant metadata)
  • Observatorysync_runs, sync_steps, regulatory_events. Details: docs/docs/operations/observatory-guide.md.
  • Sync cursorstga_sync_cursor, cricos_sync_cursor (D1-backed phase pointers, NOT KV — see worker-patterns.md for why)

Workspace engine (engine-db-oc)

Separate from ops-db because workspace state has different access patterns and different failure tolerance. Auth sessions, tier resolution, org memberships, feature entitlements. Read by workspace + admin, written by internal-api.


KV namespaces

Binding Purpose
SESSION_KV Admin/workspace sessions, QB OAuth tokens, cron heartbeats

Critical rule: never put per-invocation write state in KV. Rapid writes return transient 500s that break chain dispatch. Use D1 instead. See docs/docs/infrastructure/worker-patterns.md.


Queues

Queue Producer Consumer Purpose
tga-sync-queue tga-sync self tga-sync Phase-chain dispatch (TGA weekly)
cricos-sync-queue cricos-sync self cricos-sync Phase-chain dispatch (CRICOS monthly)
rtopacks-ingest-queue apps/site tga-ingest On-demand RTO enrichment
rtopacks-ingest-dlq (system) (manual inspection) Dead-letter for ingest

Cross-database relationships

  • ops-db.rto_clients.rto_codertopacks-db.rtos.rto_code — soft FK across DBs. internal-api joins them by making two queries.
  • engine-db-oc.orgs.rto_codertopacks-db.rtos.rto_code — same pattern. The permanent org_rtopacks_ops ↔ rto_code 45329 mapping lives here.
  • ops-db.billing_customers.rto_codertopacks-db.rtos.rto_code — populated during /billing/subscribe so the auto-convert path knows which rto_code to flag.

No cross-DB foreign keys — D1 doesn't support them. These are application-level joins maintained by internal-api.


Production and dev surfaces

The surface map is binary. Production lives on rtopacks.com.au. Dev lives on rtopacks.dev. No third state.

Environment Zone What lives there Auth
Production rtopacks.com.au The user-facing topology: my. (workspace), admin. (admin console), internal-api. (backend), api. (public API), mcp. (MCP), media. (R2), preview. (kn-preview), trust. (Pages), docs. (decommissioned in STAGING-INFRA-01 Stage 6 → docs.rtopacks.dev). Apex + www. currently held by rtopacks-prelaunch; flipped to rtopacks-site at launch. CF Access on internal/admin and on staging.rtopacks.com.au (separate pre-launch apex artefact); internal-api has path-specific bypasses on /billing/webhook (Stripe) and /billing/qb-callback (QB OAuth)
Dev rtopacks.dev Mirror of the user-facing topology on subdomains: rtopacks.dev (apex → site), my. (workspace), admin. (admin), internal-api. (backend), docs. (internal docs). Workers wired in STAGING-INFRA-01 Stage 4. CF Access on entire zone via wildcard application; internal-api.rtopacks.dev has same path-specific bypasses on /billing/webhook and /billing/qb-callback

Why dev exists separately: workspace, admin, and site need a place to be exercised before they hit production. Stripe and QuickBooks are dev-grade integrations that haven't been run under repeated load — dev is where that happens. Sandbox credentials graduate to dev permanently; live credentials only ever go to production.

Database split:

Class Behaviour in dev
Read-only from user-facing workers (rto-nrt-db, rto-abs-db, rto-licensing-db, rto-radar-db) Shared with production — dev workers read directly from production DBs. Sync infrastructure lives on production only.
Mutable from user-facing workers (rto-ops-db, rto-workspace-db, rto-micro-db, rto-landscape-db, rto-calendar-db) Dev twin: <name>-staging (D1 names retain -staging suffix per ADR-027 D1-name lag; alignment via D1-NAME-ENVIRONMENT-RENAME-01). Schema replicated, seeded per per-database strategy.
SESSION_KV Dev twin: SESSION_KV_STAGING (KV name retains _STAGING suffix per same lag) — dev session validity must not transfer to production.
Cache KVs (SEARCH_CACHE, STATS_CACHE, etc.) Shared infrastructure with dev: key prefix. Dev writes prepend dev: to cache keys; dev reads only dev:-prefixed keys. Production never sees them. Shared KV, isolated data.

Standing rules:

  • Nothing lands on production until walked on dev.
  • Dev never sends real emails, SMS, live Stripe charges, or live QB pushes — sandbox events only.
  • Dev databases are seeded, never copied from production (except shared read-only databases).
  • Deploy order: dev first, walk, production second. Never production-first.
  • Sandbox credentials graduate to dev permanently. Live credentials only ever go to production.

See STAGING-INFRA-01 brief output for the full architecture and migration sequence (note: STAGING-INFRA-01 was the original infrastructure brief; vocabulary modernised to "dev" per ENVIRONMENT-NAME-RENAME-01).


  • docs/docs/workers/inventory.md — canonical DB + worker + queue IDs
  • docs/docs/infrastructure/worker-patterns.md — how sync writers move data, env-block deploy pattern, cache key prefix pattern
  • docs/docs/infrastructure/tga-ingest.md — on-demand enrichment
  • docs/docs/infrastructure/billing.md — billing schema, Stripe, QuickBooks, runbooks
  • docs/docs/architecture/database.md — SQL conventions, gotchas, the tga_sync_steps writer contract, table contracts
  • docs/docs/ops/standing-rules.md — operational rules for D1 / KV / exports