RTOpacks Architecture Decisions

Status: Canonical log of architectural commitments.

Governance level: Position 4 in the standing-rules.md governance hierarchy. Below the Client Spine (client-spine.md), standing rules, and WS-PRODUCT-01. Above briefs and module specs.

Format: Architecture Decision Records (ADR). Each entry is numbered, dated, and self-contained.

Discipline: Load-bearing architectural decisions are written here at the moment of making them. Decisions that live only in chat or in briefs are not canonical and may not be relied upon.

Last updated: 2026-05-30. Change history:

  • Initial 13 ADRs filed 2026-05-26 morning; ADRs 014-018 appended 2026-05-26 evening.
  • ADR-006 refined to clarify multi-provider posture.
  • ADR-007 refined per substrate-state finding.
  • ADR-018 refined three times: prefix discipline + vocabulary + env marker, then three-category taxonomy, then Q1b environment-neutrality clause.
  • ADR-019 added for Mandarin enforcement at credential-surface layer.
  • ADR-018 minor refinement per CANON-VS-ADR-018-RECONCILIATION-01: worked-examples sharpened to use quickbooks consistently and rename-feasibility phase sequencing refined to reflect verified per-resource-type cost.
  • ADRs 020-023 added 2026-05-27: access-control three-tier model (T3/T4/T4A), plan/entitlement/metering substrate, operator support via impersonation pattern, graceful degradation across subscription state.
  • ADR-007, ADR-008, ADR-012 each refined with composition notes mapping their existing content to the new tier model.
  • ADR-024 added 2026-05-27 per IDENTITY-MODEL-RATIONALISATION-01 Phase 2: canonical user-identity schema model, a three-table shape (users + tier_grants + credentials) consolidating six L3-truth sources, with magic_link_allowlist as separate issuance gate and the issuance-gate-vs-tier-grant distinction canonicalised.
  • ADR-025 added 2026-05-27 per IDENTITY-MODEL-PLACEMENT-DECISION-01: identity model placement in new dedicated rto-identity-db (with -staging twin per Peel taxonomy); customer-facing workers route identity reads via internal-api service-binding per MANDARIN.
  • ADR-026 added 2026-05-27 PM per IDENTITY-MODEL-MIGRATION-01 Phase 2 substrate-reality finding: cross-database relational constraints are application-layer concerns. D1 cross-DB FK references parse at CREATE TABLE but fail at INSERT time, so canonical schemas document cross-DB logical relationships as comments not REFERENCES clauses. ADR-024 Consequences amended with forward-pointer to ADR-026.
  • ADR-027 added 2026-05-27 PM per ENVIRONMENT-NAME-RENAME-01 close: environment parity is a canonical commitment. Schema parity + operational-vocabulary parity committed, data parity NOT committed per CANONICAL-IDENTITY-VIA-UI-ONLY; D1-name lag forward-referenced to queued D1-NAME-ENVIRONMENT-RENAME-01; dev/prod canonical environment names replace inherited dev/staging/prod three-stage convention.
  • ADR-027 back-pointer to CANONICAL-IDENTITY-VIA-UI-ONLY reconciled 2026-05-28 at IMM-01 Phase 3c close: parenthetical in "Data parity is NOT committed" updated to point at standing-rules.md canonical-work cluster following the discipline's graduation from Tim-filed candidate to standing rule.
  • ADR-028 added 2026-05-28 per OBSERVABILITY-SUBSTRATE-DECISION-01: system observability named as canonical substrate and as a new MANDARIN category (Telemetry); live telemetry to Cloudflare Analytics Engine, audit/activity records to D1; grounds ADR-017's observability requirement with a storage-and-viewing layer; motivating failure case (load-dependent failure invisible without instrumentation) recorded; whole-machine uniform coverage committed, metric depth lean-start; apps/site ops-db write retirement landed as two narrower retire briefs (WAITLIST-LOOKUP-LOG-RETIRE-01 + APIREQUESTS-LOG-RETIRE-01) retiring 2 dead pipelines gone-is-gone; other 4 writes kept or parked; Analytics Engine + rto-audit-db substrate declared by this ADR but not yet provisioned (first-application framing factually reconciled 2026-05-29).
  • ADR-029 added 2026-05-28 per audit-substrate convention review: uniform activity stream chosen over per-event-kind tables; canonical activity shape in rto-audit-db (id, timestamp, source, client_id, actor, action, detail) with client_id nullable to encode operator-vs-client distinction natively and actor format pinned to ADR-024 canonical identity (usr_… UUID for users, system:<worker-name> for system actors); cross-machine reconnaissance is one query filtered by time + source/client_id; column set is extensible by follow-on ADR, table shape is load-bearing.
  • ADR-030 + ADR-031 added 2026-05-30 per INTERNAL-API-ISOLATION-PROGRAMME-01. ADR-030 closes internal-api's forgeable-X-RTP-Internal-Source trust flaw by topological isolation (internal-api becomes binding-only with no public route; supersedes and retires the per-source-secret approach in INTERNAL-API-SOURCE-HEADER-AUTH-01). ADR-031 commits channel-separated service architecture as the canonical default (within-account = service binding, across-boundary = cryptographic auth, browser → same-origin route, fail-closed with no public-HTTP fallback, non-forgeable internal caller identity required, resolving the inside-job fork ADR-030 opened).


How this doc works

Each decision is captured as an ADR with six parts:

  • Title — short noun phrase naming the decision
  • Status — Accepted / Superseded by [N] / Deprecated
  • Date — when the decision was made
  • Context — what situation made this decision necessary
  • Decision — what was decided, in concrete terms
  • Consequences — what follows from the decision; what changes; what is now ruled out

Decisions are not edited in place. When a decision is superseded, the original entry remains with its status updated; a new entry is added with the new decision and a reference back to what it supersedes. This preserves the historical record of how RTOpacks' architecture evolved.

When a brief, module spec, or implementation references an architectural decision, it cites the ADR number (e.g. "per ADR-003"). This makes the dependency chain visible and prevents decisions from drifting back into chat-only or brief-only existence.
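
The no-edit-in-place rule can be pictured as an append-only log. The sketch below is an illustration only: `AdrEntry` and `supersede` are hypothetical names and do not correspond to any existing RTOpacks tooling.

```typescript
// Hypothetical model of the ADR log; all names here are illustrative assumptions.
type AdrStatus =
  | { kind: "Accepted" }
  | { kind: "Superseded"; by: number }
  | { kind: "Deprecated" };

interface AdrEntry {
  number: number;
  title: string;
  date: string; // date the decision was made
  status: AdrStatus;
  context: string;
  decision: string;
  consequences: string[];
  supersedes?: number; // back-reference carried by the replacing entry
}

// Appends the replacement and changes only the status of the original entry.
// The original's context, decision and consequences stay exactly as written.
function supersede(log: AdrEntry[], oldNumber: number, replacement: AdrEntry): AdrEntry[] {
  const updated: AdrEntry[] = log.map((entry) =>
    entry.number === oldNumber
      ? { ...entry, status: { kind: "Superseded" as const, by: replacement.number } }
      : entry
  );
  return updated.concat([{ ...replacement, supersedes: oldNumber }]);
}
```

The function returns a new log rather than mutating its input, which mirrors the intent that history is added to and never rewritten.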


ADR-001 — RTOpacks operates a closed-loop architecture

Status: Accepted Date: 2026-05-26

Context. RTOpacks could be framed as a collection of features (modules) or as a closed-loop product (Sense → Decide → Execute → Deliver and Assess → Evidence). The framing determines how new capability is evaluated and how the product is sold.

Decision. RTOpacks operates as a five-phase closed loop. Every module, every new capability, every architectural decision is evaluated against whether it extends the closed loop coherently or fragments it. New surfaces that do not fit cleanly into one of the five phases are either re-scoped to fit, deferred until the right phase shape exists, or rejected.

Consequences.

  • Module specs are mapped to phases of the loop, not catalogued as independent features.
  • Future modules (e.g. student management, native LMS) must demonstrate how they extend the loop, not just what they do.
  • Competitive positioning is "we close the loop," not "we have feature X."
  • The substrate moat (ADR-002) and the architectural coherence (ADR-016, ADR-017) are what make machine-speed closure possible; all three are foundational to this decision.

See client-spine.md Section 2 for the full articulation.


ADR-002 — The substrate is the moat

Status: Accepted Date: 2026-05-26

Context. RTOpacks has ingested multiple Australian government data sources (TGA, yourcareer, ABS labour data, government funding and tender data, and others) over the course of building. The strategic significance of this substrate has not been canonically named in the docs corpus until now.

Decision. The ingested data substrate is RTOpacks' primary competitive moat. Product surfaces are derived presentations of the substrate. Strategic decisions — what to build, what to integrate, what to charge for — are evaluated against whether they leverage the substrate or operate independently of it.

Consequences.

  • Substrate ingestion is treated as foundational work, not as feature support.
  • Sync infrastructure (the IRSL pattern, Sync Watchtower, run-status logging) is the integrity layer for the moat and is funded operationally as such.
  • The hoover-mentality principle (ADR-003) and the pristine-source principle (ADR-004) follow from this decision.
  • The four moats (operational, continuity, historical, analytical) are named in the spine doc and inform which substrate work gets prioritised.

See client-spine.md Section 4.


ADR-003 — Substrate ingestion follows hoover-mentality

Status: Accepted Date: 2026-05-26

Context. Conventional engineering discipline holds that data should be ingested only when a current feature requires it. For a substrate-as-moat business, this discipline is structurally wrong: it produces a substrate that matches current need but not future opportunity, and it does so during a window of access that may close.

Decision. RTOpacks ingests what upstream sources expose, not only what current features consume. Discipline around what to mirror is calibrated to access-window availability and source completeness, not to current-feature need. New sources are evaluated under the same posture: if access is open and the data is relevant to the regulated training market, ingestion is the default action.

Consequences.

  • Storage costs are paid up-front and accepted as the price of holding substrate that competitors cannot retroactively obtain.
  • Marginal storage cost is treated as low; marginal future-value of historically-captured data is treated as high.
  • Ingestion decisions are made greedily within the access window; organisation of derived value happens after ingestion via the recompute-as-needed principle (ADR-005).
  • This decision must not be reversed by anyone applying conventional engineering discipline to it. Reversal requires an explicit ADR superseding this one with named reasoning.

See client-spine.md Section 4 ("Working principles for the substrate").


ADR-004 — Ingested data is held pristine

Status: Accepted Date: 2026-05-26

Context. Ingested data could be cleaned, normalised, or corrected at ingestion time for convenience or analytical consistency. Each such modification, however small, converts upstream truth into RTOpacks-shaped opinion. The moat's value depends on the substrate's faithfulness to upstream at the moment of ingestion.

Decision. RTOpacks does not modify ingested data for any reason. Not for cleanup, not for normalisation, not for correction, not for convenience, not for analytical consistency. Layer 1 (the ingested raw) is immutable from RTOpacks' perspective. All modifications, normalisations, joins, and corrections happen in Layer 2 (derived/computed) or Layer 3 (client/operational state), which reference but do not mutate Layer 1.

Consequences.

  • No fields are added to mirror tables (including no is_client flags, no status annotations, no derived columns).
  • Sync workers may add operational metadata (synced_at, mirror_version) on metadata tables adjacent to but separate from the substrate tables themselves; these do not violate the principle because they describe the sync operation, not the upstream data.
  • Diff-detection between client self-reports and upstream-sourced data is implemented via the client file (Layer 3) referencing the mirror (Layer 1), never by mutating the mirror to record client positions.
  • Cross-source joins, aggregations, and computed views live in Layer 2 and are recomputable from Layer 1 at any time.
  • This decision is load-bearing. Future briefs that propose any modification to ingested data must justify the modification against this ADR and either supersede it or accept the no-modification posture.

See client-spine.md Section 4 ("The three-layer data architecture", "Working principles for the substrate").
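
As an illustration of diff-detection that leaves Layer 1 untouched, the sketch below compares a client self-report (Layer 3) against a mirror row (Layer 1) and returns the drift as a separate value. The field names (`rtoCode`, `legalName`, `scope`) and the `detectDrift` helper are assumptions made for the example, not the actual mirror schema.

```typescript
// Layer 1: a mirror row as ingested. Field names are illustrative assumptions.
interface MirrorRow {
  readonly rtoCode: string;
  readonly legalName: string;
  readonly scope: readonly string[];
}

// Layer 3: what the client has reported about itself.
interface ClientSelfReport {
  mirrorRtoCode: string; // reference to the Layer 1 row, not a replacement for it
  legalName: string;
  scope: string[];
}

interface Drift {
  field: "legalName" | "scope";
  upstream: string;
  selfReported: string;
}

// Compares Layer 3 against Layer 1 and returns the differences.
// Nothing is written back to the mirror row; arrays are copied before sorting.
function detectDrift(mirror: MirrorRow, report: ClientSelfReport): Drift[] {
  const drift: Drift[] = [];
  if (mirror.legalName !== report.legalName) {
    drift.push({ field: "legalName", upstream: mirror.legalName, selfReported: report.legalName });
  }
  const upstreamScope = mirror.scope.slice().sort().join(",");
  const reportedScope = report.scope.slice().sort().join(",");
  if (upstreamScope !== reportedScope) {
    drift.push({ field: "scope", upstream: upstreamScope, selfReported: reportedScope });
  }
  return drift;
}
```

The drift result would live alongside the client file, so the mirror never has to carry a flag or annotation recording the client's position.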


ADR-005 — Derived layers are recomputable

Status: Accepted Date: 2026-05-26

Context. Derived tables and computed views could be treated as precious — patched when wrong, evolved in place, accumulated over time. This produces a derived layer that becomes its own form of accreted state, with the same drift and silk-thread problems as the operational layer.

Decision. Layer 2 (derived and computed) is recomputable from Layer 1 at any time. Derived tables are not precious. When derivation logic is wrong or improved, the derived layer is rebuilt from Layer 1 with the new logic. Derivation logic lives in version-controlled code; the derived tables themselves are caches of that logic's outputs.

Consequences.

  • New analytical questions do not require new ingestion; they are computed from substrate already held.
  • Storage costs are paid for Layer 1; compute costs are paid as Layer 2 derivations are needed.
  • Derived tables can be dropped and rebuilt without permanent loss, provided the derivation logic is preserved in code.
  • This decision is what makes the pristine-source principle (ADR-004) operationally workable: there is no temptation to "fix" Layer 1 because Layer 2 is freely rebuildable.

See client-spine.md Section 4 ("The three-layer data architecture").
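
One way to picture "derived tables are caches of the derivation logic's outputs" is to treat each derivation as a pure function over Layer 1. The sketch below assumes a made-up `EnrolmentRow` shape and invented numbers; `Derivation`, `totalsV1`, `totalsV2` and `rebuild` are hypothetical names.

```typescript
// Layer 1 rows are read-only input. The row shape is an illustrative assumption.
interface EnrolmentRow {
  readonly qualification: string;
  readonly year: number;
  readonly enrolments: number;
}

// A derivation is a pure function of Layer 1; its output is the Layer 2 "table".
type Derivation<T> = (layer1: readonly EnrolmentRow[]) => T;
type Totals = Record<string, number>;

// First version of the logic: total enrolments per qualification, all years.
const totalsV1: Derivation<Totals> = (rows) => {
  const out: Totals = {};
  for (const r of rows) {
    out[r.qualification] = (out[r.qualification] || 0) + r.enrolments;
  }
  return out;
};

// Improved logic: only count 2025 onwards. Layer 1 itself is not touched.
const totalsV2: Derivation<Totals> = (rows) =>
  totalsV1(rows.filter((r) => r.year >= 2025));

// Rebuilding Layer 2 discards the old output and reruns the current logic.
function rebuild<T>(layer1: readonly EnrolmentRow[], derive: Derivation<T>): T {
  return derive(layer1);
}
```

Switching from `totalsV1` to `totalsV2` requires no patching of the earlier output; the derived value is simply recomputed from the same Layer 1 rows.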


ADR-006 — Cloudflare-first for all new capability, including identity

Status: Accepted Date: 2026-05-26
Builds on: the CLOUDFLARE-FIRST RULE in standing-rules.md
Refined: 2026-05-26 (evening). Clarified that external providers are gap-specific, not wholesale replacements; multiple targeted providers may fill different gaps.

Context. RTOpacks is built top-to-bottom on Cloudflare. New capability needs — including identity and credential management — could be met by external vendors (Auth0, Clerk, WorkOS, etc.) or by Cloudflare-native primitives. The CLOUDFLARE-FIRST RULE says Cloudflare is asked first; this ADR confirms that the rule applies specifically to identity and credential management, and articulates the reasoning.

Decision. For all new capability — and explicitly including identity and credential management — Cloudflare is evaluated first. The evaluation asks: does Cloudflare provide this; is Cloudflare shipping this; is the gap one Cloudflare structurally cannot close. External vendors are entertained only at the third question.

For identity specifically, the requirements (signup flows for the three customer types, Type 1 / Type 2 / Type 3; email verification; SMS verification; session management; auditable identity events; OIDC scaffolding for future federation) are mapped against Cloudflare's current and announced identity capabilities. If Cloudflare can meet the requirements, Cloudflare wins. If Cloudflare cannot meet a specific requirement, that gap is the only basis for entertaining an external provider, and only for that specific gap.

Consequences.

  • The operational property of "one throat to choke" is preserved at the Cloudflare layer: debugging, support, and incident response for Cloudflare-served capability stay within a single vendor surface.
  • Substrate coherence is preserved: Cloudflare-native integration removes connector code, secret management, callback URL plumbing, and webhook reconciliation that would otherwise become silk-thread architecture.
  • Federation (enterprise customers bringing their own IdP) is treated as a future credential type, integrated via Cloudflare's OIDC capability when available.
  • External providers, when entertained, are gap-specific rather than wholesale replacements. Multiple external providers may end up filling different specific gaps (e.g. SMS verification per ADR-014, OIDC federation if Cloudflare cannot deliver it, anything else where the gap is real). Each gap-filling provider gets its own ADR. The combination is "Cloudflare-served capability + targeted external providers per specific gap," not "single external IdP replacing Cloudflare."
  • Each external provider integration follows ADR-015 (full-API-wrapper pattern) and the EXT-API RULE in standing-rules.md (reference doc required before deploy).
  • The specific identity-provider decision is captured in CREDENTIAL-PROVIDER-DECISION-01 (a pending brief), which will execute the evaluation described above and close with specific recommendations per gap rather than a single-vendor answer.

See client-spine.md Section 7.
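
The three-question evaluation in the Decision can be read as a short-circuiting check. The following is a minimal sketch under that reading; `CloudflareFirstAnswers` and `mayEntertainExternalProvider` are hypothetical names, and the SMS answers are taken from ADR-014's conclusion rather than from a recorded evaluation.

```typescript
// Answers to the three evaluation questions for one capability.
interface CloudflareFirstAnswers {
  capability: string;
  cloudflareProvides: boolean; // Q1: does Cloudflare provide this?
  cloudflareIsShipping: boolean; // Q2: is Cloudflare shipping this?
  gapIsStructural: boolean; // Q3: is the gap one Cloudflare structurally cannot close?
}

// External vendors are entertained only when the evaluation reaches Q3
// and the answer there is yes.
function mayEntertainExternalProvider(a: CloudflareFirstAnswers): boolean {
  if (a.cloudflareProvides) return false;
  if (a.cloudflareIsShipping) return false;
  return a.gapIsStructural;
}

// Example drawn from ADR-014: Cloudflare does not offer native SMS sending.
// The Q2 answer is an assumption implied by that ADR's conclusion.
const smsSending: CloudflareFirstAnswers = {
  capability: "SMS sending",
  cloudflareProvides: false,
  cloudflareIsShipping: false,
  gapIsStructural: true,
};
```

The result applies to one capability at a time, which matches the gap-specific posture: a yes for SMS says nothing about any other identity requirement.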


ADR-007 — Users and credentials are separated at the schema level

Status: Accepted Date: 2026-05-26
Refined: 2026-05-26 (evening). Consequence #3 updated to reflect substrate finding that passkey_credentials has sign_count=0 on both rows; "prototype" softened to "placeholder."

Context. A user (the thing that has roles, takes actions, attaches to client files) and a credential (the thing that authenticates a user) are different concepts. Conflating them in the schema produces an identity model that cannot support multiple credential types, cannot support federation later, and cannot support credential rotation without a schema migration.

Decision. Users and credentials are separated at the schema level. A user is an entity (with role, client attachment, session ownership, action history). A credential is a separate entity (with type, identifier, validity window, parent-user reference). A user can have multiple credentials over time; a credential authenticates one user.

Consequences.

  • Credential management is outsourced (ADR-006). The internal user model holds references to externally-managed credentials, not the credentials themselves.
  • Multiple credential types per user (passkey, OIDC-via-IdP, magic-link, etc.) are supported by the schema without further changes.
  • Federation (ADR-006) is implemented as adding a new credential type, not as building a separate user system.
  • Credential rotation, MFA enrolment, and credential revocation are operations on credential entities, not on user entities.
  • The current ops-db passkey_credentials table is the placeholder for this separation, not the prototype. OPS-DB-CONTENT-AUDIT-01 found sign_count=0 on both enrolled rows, meaning verification has never succeeded against the table; the principle has not been exercised by real authentication flow. ADMIN-AUTH-MODEL-RECONCILIATION-01 (likely subsumed by CREDENTIAL-PROVIDER-DECISION-01) is the brief that resolves whether the placeholder graduates to a real prototype, gets dropped, or gets replaced by outsourced credentials.
  • The identity model rationalisation work (IDENTITY-MODEL-RATIONALISATION-01, pending brief) resolves the four current identity conventions into one that honours this principle.
  • Composition with ADR-020. The "user" entity in this ADR maps to T4A (client user tier) in ADR-020. T4 administrator role is held by a user but is conceptually distinct from the user identity. T4A users are separated from credentials per this ADR; the T4 administrator role attaches to a T4A user identity rather than existing as its own credential-bearing entity. Identity rationalisation work (IDENTITY-MODEL-RATIONALISATION-01) resolves the substrate state to honour both this separation and ADR-020's tier model.

See client-spine.md Section 7.
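
The separation described in this ADR can be sketched as two entity types joined by a parent-user reference. This is a simplified illustration, not the canonical schema: the field names, the `CredentialType` values beyond those listed in the Consequences, and the helpers `activeCredentials` and `revoke` are assumptions.

```typescript
type CredentialType = "passkey" | "oidc" | "magic-link";

// A user carries role and client attachment, and no authentication material.
interface User {
  id: string;
  clientId: string;
  role: string;
}

// A credential points at exactly one user; a user may have many credentials.
interface Credential {
  id: string;
  userId: string; // parent-user reference
  type: CredentialType;
  identifier: string; // reference to the externally managed credential
  validFrom: string; // ISO 8601 UTC timestamp
  validUntil: string | null; // null means no scheduled expiry
}

// Credentials usable for a user at a given instant.
// ISO 8601 UTC strings in one format compare correctly as plain strings.
function activeCredentials(all: Credential[], userId: string, now: string): Credential[] {
  return all.filter(
    (c) => c.userId === userId && c.validFrom <= now && (c.validUntil === null || now < c.validUntil)
  );
}

// Revocation closes the credential's validity window. No user record is involved.
function revoke(all: Credential[], credentialId: string, at: string): Credential[] {
  return all.map((c) => (c.id === credentialId ? { ...c, validUntil: at } : c));
}
```

Adding a new credential type, such as a federated IdP login, would extend `CredentialType` and add rows of the second shape; the `User` shape would not change.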


ADR-008 — No platform tier above clients

Status: Accepted Date: 2026-05-26

Context. The current ops-db schema contains tables (accounts, account_worlds, account_memberships, custom_roles) that imply a tenancy tier above clients — a platform-level concept that holds RTOpacks (and notionally other future products) above the clients RTOpacks serves. This structure was inherited from an earlier mental model that treated RTOpacks as downstream of a broader UCCA universal-credential thesis.

This mental model is no longer accurate. RTOpacks is its own product serving its own market. UCCA Inc and UCCA Pty Ltd are corporate entities, not architectural tiers. There is no platform-tier abstraction that customers belong under.

Decision. RTOpacks does not implement a platform-tier abstraction above clients. The architectural hierarchy is: client file → users-attached-to-clients → modules-as-lenses. If a future product requires a tenancy model above clients (e.g. white-label resale, multi-product platform), it will be added deliberately at that point with clear product reasoning, not retained speculatively as legacy scaffolding.

Consequences.

  • The five identity-tangle orphan tables (accounts, account_worlds, account_memberships, custom_roles, customers) surfaced in OPS-DB-CONTENT-AUDIT-01 are drop candidates. The drop is executed via a downstream brief (likely OPS-DB-IDENTITY-ORPHAN-CLEANUP-01) once the spine doc is canonical.
  • Any future feature request that implies a platform tier (e.g. "white-label this for another reseller") triggers a deliberate ADR conversation, not an ad-hoc schema extension.
  • The corporate separation (UCCA Inc holds the engine; UCCA Pty Ltd operates RTOpacks) is honoured in legal structure, not in product schema.
  • Composition with ADR-020. T3 (operator tier) per ADR-020 is the substrate-operating tier, not a commercial tier above clients. This ADR's statement that there is no platform tier above clients remains accurate at the commercial layer: clients do not belong under a commercial parent entity. T3 operates the substrate that clients use; T3 does not commercially sit above clients in a tenancy hierarchy. The two architectural facts compose without conflict.

See client-spine.md Section 1 ("The 'RTOpacks didn't start here' note") and Section 7.


ADR-009 — Client-file attachments are deliberate

Status: Accepted Date: 2026-05-26

Context. A large amount of operational state hangs off the client file — Studio sessions, generated content outputs, staff work products, drafts, exports, version histories, audit decisions, delivery records, assessment outcomes, correspondence threads. The temptation under feature pressure is to attach new state ad-hoc, with whatever shape the immediate feature demands. This produces silk-thread architecture: relationships that exist but are not visible at the schema level, inferred at runtime rather than asserted at design time.

Decision. Every attachment to the client file is deliberate and named, and broadcasts its intent clearly: a solid, thoughtful anchorage point that any future module or surface knows how to use. Before a new attachment is built, the four substrate questions are answered: what does it know, where does it live, who sees it, how does it fail. If those four answers do not exist, the attachment is not built.

Consequences.

  • Module specs include their client-file attachment shape as a first-class section, not as implementation detail.
  • New attachments require their access-pattern, permission model, lifecycle (creation/update/archival/deletion), and failure mode to be specified before implementation begins.
  • The four substrate questions are a standing checklist applied to every new attachment, captured in spine § 5 and referenced from this ADR.
  • This ADR is foundational to the broader ADR-016 modular-consolidation principle: every client-file attachment is a candidate for consolidation if the same shape of attachment appears more than once.

See client-spine.md Section 5.
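
The four substrate questions lend themselves to a simple gate. The sketch below is one possible shape, with `AttachmentSpec`, `unansweredQuestions` and `mayBuild` as hypothetical names; how the answers are actually captured in module specs is not specified here.

```typescript
// One proposed client-file attachment and its answers to the four questions.
interface AttachmentSpec {
  name: string;
  whatDoesItKnow?: string;
  whereDoesItLive?: string;
  whoSeesIt?: string;
  howDoesItFail?: string;
}

type SubstrateQuestion = "whatDoesItKnow" | "whereDoesItLive" | "whoSeesIt" | "howDoesItFail";

const SUBSTRATE_QUESTIONS: SubstrateQuestion[] = [
  "whatDoesItKnow",
  "whereDoesItLive",
  "whoSeesIt",
  "howDoesItFail",
];

// Lists the questions that have no answer, or only whitespace, in the spec.
function unansweredQuestions(spec: AttachmentSpec): SubstrateQuestion[] {
  return SUBSTRATE_QUESTIONS.filter((q) => {
    const answer = spec[q];
    return answer === undefined || answer.trim() === "";
  });
}

// If any of the four answers is missing, the attachment is not built.
function mayBuild(spec: AttachmentSpec): boolean {
  return unansweredQuestions(spec).length === 0;
}
```

Returning the list of unanswered questions, rather than only a boolean, lets a reviewer see exactly which part of the spec is incomplete.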


ADR-010 — Export-portable output is a first-class capability

Status: Accepted Date: 2026-05-26

Context. RTOpacks could choose to lock clients into native delivery (RTOpacks-served LMS only) as a strategic moat, or could treat export to client-owned LMSes as a first-class capability. The first option maximises platform stickiness; the second option respects client portability and avoids vendor-capture perception.

Decision. Export-portable output is a first-class capability. UCCA / Studio output formats include SCORM, print, and other exportable formats. Clients who prefer their own delivery infrastructure use export; clients who prefer RTOpacks-native delivery use InstaLearn (or the regulated-LMS surface as it matures). Both paths are first-class, not "native + lesser export."

Consequences.

  • Studio's output pipeline produces export-compatible artefacts as a first-class workflow, not as a special-case feature.
  • The closed loop (ADR-001) is honoured for clients who use RTOpacks' native delivery; for clients who export, the loop is intentionally open at the Deliver-and-Assess phase, with RTOpacks providing the artefacts and the client providing the delivery substrate.
  • This decision protects RTOpacks from the perception of vendor lock-in and aligns with serving regulated professionals who value their portability.
  • When the LMS surface expands into regulated delivery, this ADR is the principle that prevents the LMS from becoming a closed system by accident.

See client-spine.md Section 6 ("The six current modules" — Deliver and Assess).


ADR-011 — Three customer types with explicit data-provenance posture

Status: Accepted Date: 2026-05-26

Context. RTOpacks serves more than one kind of customer. Treating them as homogeneous produces a schema and product surface that fits none of them well. The three types have fundamentally different relationships to regulatory authority and to the substrate.

Decision. RTOpacks recognises three customer types:

  • Type 1 — Registered RTO. Listed in TGA. Client file auto-populated from the mirror at signup. Drift between self-reported state and regulator-sourced state is a first-class product surface.
  • Type 2 — Pre-registration RTO. Not yet in TGA. Self-reports into TGA-shaped schema. Graduates to regulator-verified status at TGA registration via a confirmation step.
  • Type 3 — Non-RTO content creator. Independent course developers, corporate trainers, freelancers. Different product surface, different commercial terms. Includes potentially adversarial users (competitors scoping the tooling); surface is designed to be useful for legitimate Type 3 use while structurally non-leaky of Type 1/2 substrate value.

Every field in the client file carries a provenance marker: regulator-verified, self-reported, derived, or RTOpacks-created. Provenance is a load-bearing schema property that determines how each field is treated by consuming modules.

Consequences.

  • The signup flow (ADR-012) varies by type but uses a single underlying mechanism.
  • Module entitlements may vary by type: some modules are Type 1-only (Radar, Record), some span all three (Studio, possibly InstaLearn).
  • The Type 2 → Type 1 graduation is a switch-flip with confirmation, not a data migration; field provenance updates but the data itself does not move.
  • Type 3 product surface design includes deliberate consideration of competitive-intelligence risk: what is shown to a Type 3 user must not leak the substrate's analytical value to competitors.

See client-spine.md Section 3.
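
The provenance marker and the Type 2 → Type 1 graduation can be sketched together. This example is illustrative: the `ClientFile` shape and the `graduate` helper are hypothetical, and the rule that only explicitly confirmed self-reported fields flip to regulator-verified is an assumption about how the confirmation step would work.

```typescript
type Provenance = "regulator-verified" | "self-reported" | "derived" | "RTOpacks-created";
type CustomerType = 1 | 2 | 3;

// Every field carries its value together with a provenance marker.
interface Field<T> {
  value: T;
  provenance: Provenance;
}

interface ClientFile {
  customerType: CustomerType;
  fields: Record<string, Field<string>>;
}

// Graduation is a switch-flip: values stay where they are, and only the
// provenance of confirmed self-reported fields changes (assumed rule).
function graduate(file: ClientFile, confirmedFieldNames: string[]): ClientFile {
  if (file.customerType !== 2) {
    throw new Error("only Type 2 client files graduate to Type 1");
  }
  const fields: Record<string, Field<string>> = {};
  for (const name of Object.keys(file.fields)) {
    const f = file.fields[name];
    const flip = f.provenance === "self-reported" && confirmedFieldNames.indexOf(name) !== -1;
    fields[name] = { value: f.value, provenance: flip ? "regulator-verified" : f.provenance };
  }
  return { customerType: 1, fields };
}
```

Because each value is copied through unchanged, a consuming module that reads `value` sees the same data before and after graduation; only modules that branch on `provenance` behave differently.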


ADR-012 — Signup is open; admin authority is asynchronous

Status: Accepted Date: 2026-05-26

Context. Multi-user customers (especially TAFEs and larger RTOs) need a signup model that handles "who signs up first," "who has admin authority," and "what happens when the second user signs up later." Gating signup on CEO availability creates friction; making the first signup automatically the admin creates security risk.

Decision. Account creation and administrative authority are decoupled.

  • Whoever signs up first from an RTO domain (verified via TGA mirror + SMS) creates the client file and gets a pending-verification user role.
  • Administrative authority is established asynchronously via CEO notification (email to the CEO on file in TGA, with a three-button decision interface: confirm this user, set up your own admin, report as unauthorised).
  • Until the CEO acts, the first signup operates in pending-verification mode with limited access.
  • Subsequent users from the same RTO domain follow the identical flow: pending user attached to existing client file, notification to current primary admin, same three-button decision.

Consequences.

  • One signup flow to design, build, test, and maintain, not separate flows for first user versus Nth user.
  • The new user's experience is not gated on the CEO's responsiveness.
  • The CEO email is a verification channel, not a permission-grantor: they make a single decisive choice, they don't run admin operations.
  • Soft escalation paths exist for the case where the CEO never responds.
  • Composition with ADR-020. The "first signup creates the client file with pending-verification role" pattern in this ADR composes with the T4 administrator establishment in ADR-020 as follows: the first signup creates the client file and is provisionally attached as a T4A user with a pending T4 administrator role. The CEO notification flow described in this ADR is what establishes T4 administrator authority canonically: once the CEO confirms, the first user holds both a T4A user identity and the T4 administrator role. Subsequent users from the same RTO domain follow the identical flow per this ADR, defaulting to T4A user identity with no T4 administrator role unless explicitly granted by the current T4 administrator.

See client-spine.md Section 8.
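
The single-flow property can be sketched as one function that serves both the first and the Nth signup, differing only in who is notified. The names below (`Signup`, `signUp`, `applyDecision`) are hypothetical, and the statuses that follow each of the three buttons are assumptions: the ADR names the buttons but does not pin down their downstream states.

```typescript
type Decision = "confirm-user" | "set-up-own-admin" | "report-unauthorised";
type Notified = "ceo-on-file" | "primary-admin";

interface Signup {
  userId: string;
  rtoDomain: string;
  status: "pending-verification" | "confirmed" | "blocked";
  notified: Notified;
  createdClientFile: boolean;
}

// One flow for the first user and the Nth user. Only the notification target
// and whether a client file is created differ.
function signUp(userId: string, rtoDomain: string, domainsWithClientFile: string[]): Signup {
  const exists = domainsWithClientFile.indexOf(rtoDomain) !== -1;
  return {
    userId,
    rtoDomain,
    status: "pending-verification",
    notified: exists ? "primary-admin" : "ceo-on-file",
    createdClientFile: !exists,
  };
}

// ASSUMPTION: the resulting statuses are illustrative, not taken from the ADR.
function applyDecision(signup: Signup, decision: Decision): Signup {
  if (signup.status !== "pending-verification") return signup; // decided once
  switch (decision) {
    case "confirm-user":
      return { ...signup, status: "confirmed" };
    case "report-unauthorised":
      return { ...signup, status: "blocked" };
    case "set-up-own-admin":
      return signup; // assumed: the signing-up user stays pending
  }
}
```

Note that `signUp` returns immediately with a pending status, which reflects the asynchronous posture: the new user proceeds with limited access while the decision is outstanding.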


ADR-013 — RTOpacks delivers two streams that compose into one product

Status: Accepted Date: 2026-05-26

Context. RTOpacks' capabilities span strategic intelligence (where the money is) and compliant execution (the operational machinery of being an RTO). These could be framed as two separate products that share infrastructure, or as one product with two streams that compose into a coherent thesis.

Decision. RTOpacks delivers two streams that compose into one product:

  • Strategic intelligence stream — surfaces where the money is, what the market wants, where the gaps are, how the client's current scope is performing.
  • Compliant execution stream — produces training, assessment, and audit evidence to the letter of the law.

Either stream alone has limited commercial value. Strategic intelligence without compliant execution is knowledge with no operational expression. Compliant execution without strategic intelligence is either inertia or guesswork. The two streams compose into the closed loop (ADR-001).

The streams may be commercially packaged separately if market reach demands it, but the architectural commitment is to one coherent product, not two independent products that happen to share data.

Consequences.

  • Marketing and sales messaging articulates the two-stream framing, not a feature catalogue.
  • Pricing and packaging decisions can offer the streams together or apart, but the integration is the value proposition.
  • Module specs are mapped to one or both streams; new modules are evaluated against which stream they extend.
  • The duck-and-shark example in client-spine.md Section 6 is the canonical illustration of why both streams compose.

See client-spine.md Section 6.


ADR-014 — SMS is outsourced to Cellcast and wrapped as a shared internal module

Status: Accepted Date: 2026-05-26

Context. RTOpacks needs SMS capability for the signup verification gate (per spine § 8 — paired with email matching as the two-factor signup verification) and for future use cases including client-side messaging (RTOpacks-to-client communications, client-to-student communications), 2FA flows if RTOpacks ever holds credentials internally, and possible sub-client SMS provisioning if RTOpacks ever offers SMS-as-a-feature to client RTOs.

The choice of provider matters because it constrains pricing, geographic coverage, API quality, and the integration discipline. The Cloudflare-first rule (ADR-006) was applied: Cloudflare does not offer native SMS sending. SMS is therefore a gap Cloudflare structurally cannot close, and an external provider is the correct answer.

Provider evaluation landed on Cellcast (Melbourne, Australia) over alternatives including Twilio and ClickSend, on the basis of: AU-native operation with AU-onshore data (matching RTOpacks' primary AU customer base, with US/UK extension available if needed); ISO 27001 certification; lower cost than Twilio for AU sending; published OpenAPI specification available for download at developer.cellcast.com/cellcast-api.swagger.yaml; a substantial API surface (~50+ endpoints) including features RTOpacks may want in the future (sub-client account provisioning, RMS two-way chat, virtual number purchasing, opt-out management, file scheduling, 2FA flows, custom short URLs); and an existing Cellcast account already established by RTOpacks for production use.

Decision. RTOpacks uses Cellcast as its SMS provider, specifically the Enterprise Platform API surface documented at developer.cellcast.com (not the older Classic Platform). SMS access is mediated by a shared internal module (the SMS module) that wraps Cellcast's Enterprise Platform API. All RTOpacks code that sends or receives SMS routes through this module; no consumer makes direct Cellcast API calls.

The SMS module follows the external-service integration pattern (ADR-015) — it wraps Cellcast's full published API surface, including endpoints RTOpacks does not currently consume.
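A minimal sketch of the shape this decision implies, written in Python purely for illustration. The implementation language and every name here (`SmsProvider`, `SmsModule`, `FakeProvider`, `signup_flow`) are assumptions, and no real Cellcast endpoint or signature is shown. Consumers depend only on the module; the provider sits behind an interface so it can be replaced without touching them.

```python
from dataclasses import dataclass, field
from typing import Protocol


class SmsProvider(Protocol):
    """Hypothetical provider-side contract the SMS module talks to."""

    def send_sms(self, to: str, body: str) -> str:
        """Send one message and return a provider message id."""
        ...


@dataclass
class FakeProvider:
    """Stand-in provider for the sketch; a real one would call the provider's API."""

    name: str
    sent: list[tuple[str, str]] = field(default_factory=list)

    def send_sms(self, to: str, body: str) -> str:
        self.sent.append((to, body))
        return f"{self.name}-{len(self.sent)}"


class SmsModule:
    """The single entry point consumers use; it owns the provider reference."""

    def __init__(self, provider: SmsProvider) -> None:
        self._provider = provider

    def send_verification_code(self, to: str, code: str) -> str:
        # Consumers express intent; the module decides how the provider is called.
        return self._provider.send_sms(to, f"Your verification code is {code}")


def signup_flow(sms: SmsModule, phone: str) -> str:
    """A consumer: it knows the SMS module, never the provider behind it."""
    return sms.send_verification_code(phone, "123456")
```

Swapping `FakeProvider("provider-a")` for `FakeProvider("provider-b")` changes nothing in `signup_flow`, which is the property the decision is protecting.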

Consequences.

  • The Twilio scaffolding currently present in the substrate (interaction_log.twilio_sid columns and any related code surfaced in OPS-DB-CONTENT-AUDIT-01) is superseded by this decision. A downstream brief (SMS-CURRENT-STATE-AUDIT-01 or similar) audits what Twilio integration is actually wired vs scaffolded, and the eventual SMS module brief replaces or removes it.
  • The SMS module is substrate-shared infrastructure, not module-specific. Studio, Record, the signup flow, and any future module needing SMS all consume it.
  • The Cellcast OpenAPI spec at developer.cellcast.com/cellcast-api.swagger.yaml is the contract the wrapper mirrors. The full API is wrapped per ADR-015; Cellcast endpoints RTOpacks does not currently consume (sub-client accounts, virtual number purchase, international sending, RMS chat, file scheduler, custom short URLs, etc.) are wrapped against future use.
  • Per the EXT-API RULE in standing-rules.md, a reference doc at docs/ops/cellcast-api-reference.md is required before the SMS module deploys. This doc covers endpoint inventory (from the OpenAPI spec), auth model, rate limits, known quirks, sandbox vs production posture, and example requests. The SMS module brief includes drafting this reference doc as part of its scope.
  • Provider-swap stays clean if Cellcast ever needs to be replaced. The wrapper's interior changes; every consumer of the wrapper stays unchanged.
  • ADR-006's external-provider clause is exercised by this decision. Future identity-related external providers (OIDC for federation, anything else Cloudflare cannot fill) follow the same pattern: gap-only, full-API-wrapped, behind a shared internal module.

See client-spine.md Section 8 (signup gate) and Section 7 (Identity and access; SMS is verification infrastructure, distinct from credential management).


ADR-015 — External service integrations wrap the provider's full API surface

Status: Accepted Date: 2026-05-26

Context. When RTOpacks integrates with an external service, there are two patterns the integration can follow. The minimal pattern wraps only the endpoints currently consumed by features: write only what is currently used, expand the wrapper as new uses arise. The full-surface pattern wraps every endpoint the provider exposes, regardless of which endpoints current features consume.

The minimal pattern is conventionally the engineering-disciplined choice. For RTOpacks' situation — small team, substrate-as-moat thesis, intent to compose external services into a coherent platform — it is structurally wrong. The minimal pattern produces three drag effects: every new feature that needs an unwrapped endpoint requires wrapper-extension before feature work begins; debugging routes through "wrapped paths" and "ad-hoc paths" inconsistently; and provider-swap forces auditing every consumer to confirm which endpoints they actually use, because the wrapper itself does not encode the full possible surface.

This ADR is one specific application of the broader modular-consolidation principle (ADR-016): when the same shape of work — external-service integration — appears repeatedly, the discipline is to consolidate into a shared pattern rather than reinvent per-integration.

Decision. When RTOpacks integrates with an external service, the internal module wraps the provider's full published API surface. Every documented endpoint gets a corresponding function in the wrapper, even if no current feature consumes it.

If the provider publishes a machine-readable specification (OpenAPI/Swagger, GraphQL schema, gRPC proto), the wrapper is anchored to that specification — either autogenerated from it, programmatically validated against it, or hand-written but verified against it via tooling.

If the provider publishes only prose documentation, the wrapper is hand-written to mirror the documented surface, and stays in sync via deliberate maintenance.
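As one illustration of "programmatically validated against it", the sketch below compares the operations declared in an OpenAPI document with the methods present on a wrapper class. It assumes the spec has already been parsed into a dictionary, that each operation carries an `operationId`, and that wrapper methods are named after those ids; that naming convention, the spec fragment, and `ExampleWrapper` are all hypothetical and are not taken from any provider's real API.

```python
# HTTP methods that can carry an operation under an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def spec_operation_ids(spec: dict) -> set[str]:
    """Collect every operationId declared under the spec's `paths`."""
    ids: set[str] = set()
    for path_item in spec.get("paths", {}).values():
        for method, operation in path_item.items():
            # Skip non-operation keys such as `parameters` or `summary`.
            if method in HTTP_METHODS and "operationId" in operation:
                ids.add(operation["operationId"])
    return ids


def missing_from_wrapper(spec: dict, wrapper: type) -> set[str]:
    """Return operationIds that have no same-named callable on the wrapper."""
    return {
        op_id
        for op_id in spec_operation_ids(spec)
        if not callable(getattr(wrapper, op_id, None))
    }


# Hypothetical spec fragment and wrapper, for illustration only.
EXAMPLE_SPEC = {
    "paths": {
        "/messages": {"post": {"operationId": "send_message"}},
        "/messages/{id}": {"get": {"operationId": "get_message"}},
        "/optouts": {"get": {"operationId": "list_optouts"}},
    }
}


class ExampleWrapper:
    def send_message(self, to: str, body: str) -> None: ...
    def get_message(self, message_id: str) -> None: ...
```

Here `missing_from_wrapper(EXAMPLE_SPEC, ExampleWrapper)` reports `list_optouts` as unwrapped. A check of this kind could run in CI so that a provider adding an endpoint surfaces as a failing build rather than as silent drift.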

Consequences.

  • New features that use external services never require "first add the wrapper, then build the feature." The wrapper is already present.
  • Debugging routes through one consistent layer per external service. Every Cellcast call, every Stripe call, every QuickBooks call goes through its wrapper. No ad-hoc bypass paths.
  • Provider-swap is clean. The wrapper's interior changes; consumers stay unchanged. Auditing consumers for hidden direct calls is unnecessary because direct calls are not permitted.
  • The wrapper effort is up-front rather than amortised. Initial integration is slightly larger work than the minimal pattern; subsequent feature work that uses the external service is substantially smaller. The trade is correct for RTOpacks' situation: substrate-as-moat businesses pay for completeness early because completeness is what enables future composition.
  • Cellcast (ADR-014) is the canonical first application of this pattern. Every future external service integration (including any external OIDC provider that lands per ADR-006, any future payment processor evolution, any analytics integration) follows the same posture.
  • Wrappers are version-controlled internal infrastructure. When a provider updates their API (adds endpoints, deprecates endpoints, changes contract), the wrapper update is treated as substrate work, not as ad-hoc maintenance against a specific consumer's needs.

This ADR is load-bearing for substrate composition. Reversing it requires an explicit ADR that supersedes this one with named reasoning. Arguments of the form "we only need three endpoints right now, let's just wrap those" do not justify reversal; they describe the failure mode this ADR prevents.

See ADR-014 (Cellcast as canonical first application), ADR-016 (the broader modular-consolidation principle this exemplifies).


ADR-016 — Modular consolidation over ad-hoc attachment

Status: Accepted Date: 2026-05-26

Context. A recurring pattern has appeared multiple times in RTOpacks' substrate history: code written for an immediate need, attached at the point of use, never abstracted to a form that the next similar need could consume. The instances are not identical in shape but share a common form — each one solves a specific problem in isolation, and the substrate accretes variant solutions to similar problems without recognising the underlying pattern.

Documented instances of this pattern in RTOpacks' history:

  • Sync ops before the IRSL pattern. Each sync worker had its own ad-hoc logging. Each one solved "did this run actually fire" differently — or didn't solve it at all. The IRSL brief was modular consolidation: same pattern, same shape, applied across workers via a shared discipline.
  • Run-status classification before recent honesty work. Different workers reported "success" via different fields, different conventions, different placeholder behaviour. The Sync Observatory's classifier work consolidated how truth got determined across the corpus.
  • Identity conventions in ops-db. Four different ways of saying "who is the user" — email-keyed, UUID-keyed, prefix-keyed, randomblob-keyed. Each accreted as a new feature was built without reference to what came before. IDENTITY-MODEL-RATIONALISATION-01 is the pending consolidation.
  • Twilio references in interaction_log. Direct field names are baked into the schema rather than abstracted behind a messaging interface: exactly what ADR-014 and ADR-015 establish the discipline against.
  • stats-cache writing tga_snapshots into the same DB as the mirror. A derivation output (Layer 2) landing in the same substrate as Layer 1 because no architectural separation forced it elsewhere.

These are the same pattern at different scales: code attached at the point of immediate use, never consolidated into a form that subsequent similar needs could consume.

Decision. When the same shape of work appears more than once in the codebase, the second instance is the trigger for consolidating into a shared module rather than copying the pattern again. The shared module becomes substrate-shared infrastructure; subsequent instances consume it rather than reimplementing.

The trigger is two instances. First instance is permissible as feature work — at one occurrence, the pattern has not yet been demonstrated to be recurring, and consolidation is premature abstraction. Second instance is the warning sign: the pattern is recurring; either consolidate now or accept that the third instance will be built against ad-hoc precedent, with all the substrate-coherence costs that implies.

If the answer at the second instance is "no, these are not really the same pattern, the surface similarity is misleading," that judgment is made deliberately and named. Not every similar-looking pattern needs consolidation; only the ones that will recur.

Consequences.

  • Every new feature involving a recurring substrate operation is evaluated against existing patterns before implementation begins. If a shared module exists, the feature consumes it; if not, the question "should we consolidate now?" is asked deliberately.
  • Briefs that introduce a second instance of a pattern include the consolidation as part of their scope, not as deferred follow-up work. Deferred consolidation almost never happens; consolidation happens at the moment the recurrence becomes visible.
  • Existing substrate that violates this principle is identified via audit and consolidated via dedicated briefs (e.g. IRSL pattern was the consolidation for sync logging; future analogous briefs consolidate other patterns).
  • This ADR is what makes ADR-009 (deliberate client-file attachments) and ADR-015 (full-API-wrapper for external services) instances of a general principle rather than isolated rules. Both are modular-consolidation in different domains.
  • A subtle failure mode is over-modularising: abstracting things that look similar at first glance but don't actually share a useful underlying form. This ADR explicitly accepts that risk by requiring two instances as the trigger; one instance is too few to know whether the pattern recurs, and waiting for three is too many because the third instance is built against the wrong precedent.
  • This ADR is foundational for substrate maintainability at three-person scale. A larger team can carry some pattern duplication through human memory; three people cannot. The substrate has to broadcast its own coherent patterns, and consolidation is what produces coherent patterns to broadcast.

See also ADR-009 (client-file attachments as instance), ADR-015 (external service integration as instance), ADR-017 (bus pattern as related principle for flows), client-spine.md Section 9.


ADR-017 — The in-and-out bus pattern for all flows

Status: Accepted Date: 2026-05-26

Context. RTOpacks' substrate has many flows: inbound webhooks (from Stripe, GitHub, ASQA notifications, scheduled triggers), outbound API calls (to TGA, yourcareer, ABS, Cellcast, payment processors, email providers), internal worker-to-worker messages (between sync workers, between modules), and scheduled task firings (cron-driven sync runs, periodic recomputes, scheduled exports).

Each of these flows could be implemented per-consumer — webhook handler attached to whichever worker happens to receive it, outbound call written inline wherever it's needed, worker-to-worker calls made via whatever mechanism the writer reaches for. This is the conventional engineering posture for small systems, and it produces predictable results: flows scatter across the codebase, instrumentation discipline varies by author, debugging requires tracing through whatever ad-hoc structure each flow happens to use.

For RTOpacks specifically, with a substrate-as-moat thesis and three-person operations, scattered flows produce three operational failure modes:

  • Findability fails. When something breaks, the broken flow's location is not immediately clear. "Did the Cellcast webhook fire? Where is that handler? Which worker owns it?" Each answer is a separate code search.
  • Observability fails. Per-worker telemetry produces inconsistent metrics, inconsistent logging, inconsistent error reporting. Cross-flow analysis ("what are all the outbound calls failing today") requires reconstructing data that was never instrumented uniformly.
  • Replaceability fails. When a flow needs to change — different provider, different routing, different protocol — the change cascades through every consumer that holds the flow's logic inline.

Decision. Every flow of data into and out of RTOpacks passes through a known, named, observable structure. These structures are called buses in the architectural sense — they are centralised points that handle one category of flow, route appropriately to consumers, and provide consistent instrumentation and observability.

Specifically:

  • Inbound buses. Webhooks, scheduled triggers, and external pushes route through named inbound bus structures. Each inbound bus knows what types of messages it receives, what consumers handle them, and how to log and instrument them consistently.
  • Outbound buses. External API calls (to providers like Cellcast, TGA, payment processors) route through named outbound bus structures, layered on top of the per-provider wrappers from ADR-015. The wrapper holds provider-specific contract; the bus holds the consistent observability layer.
  • Internal buses. Worker-to-worker communication routes through named internal bus structures rather than ad-hoc direct calls. This is especially important for cross-module communication where the consumer and producer may evolve independently.
  • Scheduled buses. Cron-driven and time-triggered firings route through a named scheduling bus that provides consistent run-status logging (per the IRSL pattern), retry behaviour, and failure handling.

The bus pattern does not require a literal message-queue infrastructure (though some buses may be implemented that way). It requires every flow to have a named, traceable path through known infrastructure, with consistent instrumentation properties applied uniformly.
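To make the "no literal message queue required" point concrete, here is a minimal in-process sketch of an outbound bus in Python. It is an assumption-laden illustration, not the substrate's design: `OutboundBus`, `FlowRecord`, and the in-memory `records` list are hypothetical, and a real bus would write to whatever logging or telemetry sink the substrate uses. The wrapper call is passed in as a callable, so the wrapper keeps the provider contract while the bus adds the uniform instrumentation.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class FlowRecord:
    """One uniformly shaped record per call crossing the bus."""

    bus: str
    provider: str
    operation: str
    ok: bool
    duration_ms: float
    error: Optional[str] = None


@dataclass
class OutboundBus:
    """Hypothetical outbound bus: a named path that instruments every call."""

    name: str
    records: list[FlowRecord] = field(default_factory=list)

    def call(self, provider: str, operation: str, fn: Callable[[], T]) -> T:
        started = time.perf_counter()
        try:
            result = fn()
        except Exception as exc:
            self._record(provider, operation, started, ok=False, error=repr(exc))
            raise  # The bus observes failures; it does not swallow them.
        self._record(provider, operation, started, ok=True)
        return result

    def _record(
        self,
        provider: str,
        operation: str,
        started: float,
        ok: bool,
        error: Optional[str] = None,
    ) -> None:
        elapsed_ms = (time.perf_counter() - started) * 1000
        self.records.append(
            FlowRecord(self.name, provider, operation, ok, elapsed_ms, error)
        )

    def failures(self) -> list[FlowRecord]:
        """Cross-flow question answered from one place: what is failing?"""
        return [r for r in self.records if not r.ok]
```

Because every call produces the same record shape, a question such as "what are all the outbound calls failing today" becomes a filter over one structure instead of a reconstruction across workers.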

Consequences.

  • Findability: every flow has a known location. "Where is the Cellcast inbound webhook handler" has a single answer that doesn't require code search.
  • Observability: instrumentation is applied at the bus level, not per-consumer. Adding a metric, a log field, or a trace span happens once at the bus and propagates to all consumers.
  • Replaceability: provider swaps, protocol changes, and routing changes happen at the bus or wrapper level. Consumers stay unchanged.
  • Naming: every bus has a canonical name that broadcasts its function (e.g. tga-mirror-ingest-bus, sms-outbound-bus, webhook-inbound-bus). The naming convention follows the architectural cartography convention (ADR-018).
  • Current implementation note: the existing substrate has many flows that do not yet route through named buses. Bringing flows under bus discipline is the work of dedicated downstream briefs, not a single mass refactor. The principle applies to all new flow work immediately; existing flows are migrated as their owners are touched for other reasons.
  • This ADR composes with ADR-015 (external service wrappers) and ADR-016 (modular consolidation). The wrappers handle per-provider contract; the buses handle cross-flow observability; consolidation handles pattern recurrence. Together they produce a substrate where flows are coherent, named, and traceable.

See client-spine.md Section 9, ADR-016 (modular consolidation), ADR-018 (cartography for visualisation of buses).


ADR-018 — Architectural cartography is a canonical artefact

Status: Accepted Date: 2026-05-26

Context. The architectural disciplines articulated across this doc and the spine doc produce a substrate composed of named, modular units (per ADR-016) connected by named, traceable flows (per ADR-017). Without a way to view that substrate at a glance, the disciplines hold the substrate together but do not make it operationally legible. A new conversation between Tim and Claude about substrate work requires either that both parties hold the same map in their heads or extensive back-and-forth to establish a shared reference.

A canonical visual map of the substrate solves this problem. It also serves three additional purposes:

  • Orientation for new context. When future Claude opens a session, the map orients far faster than prose ever could. "What does the substrate look like" has an immediate visual answer.
  • Dependency tracing. When Alex needs to understand what depends on a piece of substrate before changing it, the map shows the dependency graph rather than requiring reconstruction.
  • Substrate self-documentation. The map IS part of the substrate. As the substrate evolves, the map evolves alongside it, so the substrate carries its own current documentation rather than depending on prose that may have drifted.

Decision. RTOpacks maintains a canonical architectural map of the substrate as a first-class artefact alongside the spine doc and the ADR log. The map is hierarchical (multiple levels of resolution from 10,000-foot system view down to component-level detail), version-controlled, and updated alongside the substrate.

Tool choice: Mermaid as primary format. Mermaid is chosen because:

  • Renders natively in mkdocs (so the map appears on docs.rtopacks.com.au alongside the prose docs)
  • Lives in version control as text
  • Diffs cleanly in git
  • Readable and writable by Claude and Alex without special tooling
  • Sufficient expressivity for system maps, flow diagrams, and dependency graphs at the scale RTOpacks needs

If Mermaid hits expressivity ceilings at some specific level (e.g. very dense component-level maps may need a different format), alternatives are evaluated for that specific level. Mermaid is the default; departures from default require named reasoning.

Hierarchy. The map has multiple resolutions:

  • System level (10,000-foot view). Major systems and their major flows. Shows things like "Sync substrate," "Customer-facing surface," "Internal API," "Identity layer," with the principal flows between them.
  • Module level. Each major system expanded to show its modules, internal flows, and external integrations.
  • Component level. Each module expanded to show concrete bindings — databases, KV namespaces, R2 buckets, specific API endpoints, worker-to-worker calls.

Each level is canonical at its own resolution and references the levels above and below. A reader can zoom in or out as needed.

Naming convention. Components on the map carry names following the shape <scope-prefix>-<domain-or-module>-<function>-<artefact-type>-<environment> with each segment serving a distinct semantic role.

Scope prefix — three categories: rtopacks-*, rto-*, or no prefix (utility). The leading prefix (or its deliberate absence) encodes the artefact's architectural ownership semantics. This is not a brand sticker; it is a meaningful architectural fact broadcast by the name itself.

  • rtopacks-* — platform-owned. Function serves RTOpacks-operator purposes. Substrate ingestion workers, sync infrastructure, external service wrappers, operational tools, admin surfaces. Things RTOpacks the operator runs for its own purposes; clients never interact with them.
  • rto-* — client-facing. Function reaches client outcomes directly. Customer-facing surfaces, client-data databases (Studio, People, Documents data), client-workflow workers. Things the RTO client experiences, holds, or uses; the function's direct outcome is a client outcome.
  • No prefix — utility infrastructure. Function is self-describing; ownership is irrelevant because the artefact serves all sides as substrate-utility, like water or electricity to a building. The artefact's name fully describes its function without ownership inference required. Example: internal-api-worker-prod.

The determination test. A new artefact's category is determined by applying three questions in order:

  1. Q1: Is the artefact's name fully self-describing such that a reader who knows nothing about RTOpacks could understand what it does from its name alone? If yes, proceed to Q1b. If no, the artefact takes a scope prefix; proceed to Q2.

  2. Q1b: Is the artefact's behaviour environment-neutral? "Environment-neutral" means the artefact does the same thing on the prod side of the Mandarin and on the dev side, with no asymmetric bridging to environment-specific external substrates. Both Q1 AND Q1b must hold for the artefact to qualify as utility. If yes, the artefact is utility infrastructure (no scope prefix). If no, meaning the artefact bridges across the Mandarin to different external substrates per environment (sandbox vs live external services, dev-only data sources, environment-specific upstream endpoints), it does not qualify as utility regardless of name self-description; proceed to Q2.

Examples of Q1b passing: internal-api (same mediation logic on both sides; doesn't reach external substrates), webhook-inbound-bus (webhook routing is the same regardless of which side), geocoder (geocoding produces the same outputs from the same inputs; different API key per environment is configuration, not behaviour). Examples of Q1b failing: quickbooks-reconcile (dev connects to QuickBooks sandbox, prod connects to QuickBooks live: different external substrates), any stripe-* integration (dev = test mode, prod = live mode: different external substrates). Vendor brand names (QuickBooks, Stripe) resolve to full words in canonical names; informal short forms like qb are not used.

  3. Q2: Otherwise, does the function reach a client outcome directly? "Directly" means the artefact's primary function terminates in a client outcome, not in a technical outcome that other artefacts transform into client outcomes. If yes, the prefix is rto-*. Examples: rto-site-surface-prod (a client visits the site and takes action; the surface's function terminates in client outcome), rto-studio-db-prod (Studio data is what the client works with; the database's function terminates in holding and returning that data).

  4. Q3: Otherwise, the prefix is rtopacks-*. Platform-owned by default. The artefact's function serves the platform's needs, with any client benefit reached only via other artefacts' functions. Examples: rtopacks-tga-ingest-worker-prod (ingests TGA data into substrate; client benefit reached only via derived presentations through other artefacts), rtopacks-admin-surface-prod (admin tool used by RTOpacks operators; clients never interact with it), rtopacks-quickbooks-wrapper-prod (wraps QuickBooks API; dev and prod connect to different QuickBooks substrates per ADR-019).

The tests are sequential. The first one that resolves determines the category. This eliminates judgment calls — every artefact's name is mechanically derivable from its function.
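Because the tests are sequential, the determination can be written as a short function. The Python sketch below is illustrative only: `ArtefactFacts` and `scope_prefix` are hypothetical names, and the three booleans stand for answers a person (or tooling) has already reached about the artefact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtefactFacts:
    """Answers to the determination questions for one artefact."""

    name_self_describing: bool      # Q1
    environment_neutral: bool       # Q1b
    reaches_client_directly: bool   # Q2


def scope_prefix(facts: ArtefactFacts) -> str:
    """Return "" (utility), "rto", or "rtopacks"; the first resolving test wins."""
    # Q1 and Q1b must both hold for utility status.
    if facts.name_self_describing and facts.environment_neutral:
        return ""
    # Q2: the function terminates in a client outcome.
    if facts.reaches_client_directly:
        return "rto"
    # Q3: platform-owned by default.
    return "rtopacks"
```

Applied to the examples above: internal-api (self-describing, environment-neutral) yields no prefix; studio-db (not self-describing, reaches the client directly) yields `rto`; a QuickBooks integration (fails Q1b, does not reach the client directly) yields `rtopacks`.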

Utility is architecturally earned, not name-given. A worker that consumes a wrapper which has encapsulated environment-asymmetry can itself potentially be utility-shaped, because the wrapper has absorbed the asymmetry. This composes elegantly with ADR-015 (full-API wrappers): external asymmetry lives in environment-specific wrappers (rtopacks-quickbooks-wrapper-prod and rtopacks-quickbooks-wrapper-dev), while consumers of the wrapper that operate on stable internal interfaces can themselves achieve environment-neutral behaviour and earn utility status. The architectural path: push external asymmetry to wrapper boundaries; behind those boundaries, environment-neutral utility becomes achievable. See ADR-019 for the Mandarin-enforcement principle that makes this composition possible.

The discipline that protects the utility category from becoming an escape hatch. The utility category exists to honour artefacts whose names genuinely self-describe and whose ownership is architecturally irrelevant. It is not a convenient place to put artefacts whose ownership is unclear or contested.

The high-bar test: "Can a reader who knows nothing about RTOpacks understand what this artefact does from its name alone?" If the answer requires any inference about RTOpacks-specific context, ownership, or domain, the artefact is not utility infrastructure and takes a scope prefix.

  • internal-api passes: "internal" and "api" together fully describe the function — a mediation API used internally.
  • site fails: doesn't tell the reader whose site or what kind; needs rto-* prefix.
  • admin fails: doesn't tell the reader whose admin or what for; needs rtopacks-* prefix.
  • studio-db fails: a reader doesn't know whose studio or what kind of studio; needs rto-* prefix.

When an artefact resists mechanical naming. If applying the three sequential tests does not produce a clean answer — for example, the artefact has multiple functions, some of which reach client outcomes directly and some of which don't — this is itself substrate-quality feedback. The discipline says: investigate whether the artefact should be split into single-function components per ADR-016 (modular consolidation). Mechanical naming is the test of single-function discipline; ambiguous naming surfaces architectural debt as a design issue requiring attention.

This composes with ADR-016 (modular consolidation) and ADR-017 (named bus structures) elegantly. The naming convention's failure to mechanically produce a name is a feature, not a bug — it identifies artefacts that need rework before they accumulate further substrate cost.

Why this discipline matters. Without a deterministic naming convention, every new artefact requires Tim-as-oracle to adjudicate the category. This bottlenecks substrate growth at the rate of Tim's availability and makes the discipline only as strong as one person's continuous involvement. A team-of-three operation cannot tolerate that bottleneck, and a substrate that scales beyond three people definitely cannot.

The convention is deterministic so that Alex, future Claude, future collaborators, and even mechanical tooling can apply it without consulting Tim. Black is black, white is white, utility is utility. The cost of this determinism is articulating the convention sharply enough to remove judgment calls. That articulation is paid once, here. The benefit — substrate-naming that scales beyond Tim's involvement — is permanent.

This discipline matters because it removes a real ambiguity that has cost RTOpacks at multiple points in its history. The same word doing multiple architectural jobs is the failure mode this convention prevents; it is the same principle that retired customer/org/account in favour of client (per client-spine.md Section 3). Names earn their meaning by being used consistently for one specific concept.

Cartography implication. Three categories means three visual treatments on the map. The map legend distinguishes utility nodes (no scope prefix), client-facing nodes (rto-*), and platform-owned nodes (rtopacks-*) via distinct visual cues (colour-coding is the natural default, but specific cartography styling is downstream cartography work). The absence of a prefix is the canonical marker in the artefact's name itself; the visual marker on the map is the supplementary affordance for at-a-glance reading.
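One way the three visual treatments could be derived mechanically from names is sketched below in Python, emitting Mermaid flowchart text. Everything here is an assumption for illustration: the function names, the class names (`platform`, `client`, `utility`), the placeholder colours, and the example edge are hypothetical, and actual styling remains downstream cartography work.

```python
def category(name: str) -> str:
    """Map an artefact name to its map category from the scope prefix alone."""
    if name.startswith("rtopacks-"):
        return "platform"
    if name.startswith("rto-"):
        return "client"
    return "utility"


# Placeholder colours only; real styling is left to cartography work.
STYLES = {
    "platform": "fill:#dbeafe,stroke:#1e3a8a",
    "client": "fill:#dcfce7,stroke:#14532d",
    "utility": "fill:#f3f4f6,stroke:#374151",
}


def to_mermaid(nodes: list[str], edges: list[tuple[str, str]]) -> str:
    """Render nodes and edges as a Mermaid flowchart with one class per category."""
    # Short generated ids keep hyphenated artefact names out of Mermaid node ids.
    ids = {name: f"n{i}" for i, name in enumerate(nodes)}
    lines = ["flowchart LR"]
    for name in nodes:
        lines.append(f'    {ids[name]}["{name}"]:::{category(name)}')
    for src, dst in edges:
        lines.append(f"    {ids[src]} --> {ids[dst]}")
    for cat, style in STYLES.items():
        lines.append(f"    classDef {cat} {style}")
    return "\n".join(lines)
```

Calling `to_mermaid` with a list of canonical names produces text that can be pasted into a Mermaid block; the legend falls out of the names, so a renamed artefact changes colour without anyone editing the map's styling by hand.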

Artefact-type suffix vocabulary. The artefact-type segment is drawn from a closed vocabulary:

  • module — a unit of code/functionality, internally cohesive, externally consumed
  • bus — a centralised flow-routing structure (per ADR-017)
  • wrapper — an internal abstraction over an external provider's full API (per ADR-015)
  • proxy — a contract-converter that translates between systems without exposing a full provider API (distinct from wrapper; does not mirror an upstream surface)
  • queue — a Cloudflare Queue or analogous queueing primitive (concrete artefact; may back a bus but is not itself the bus pattern)
  • db — a database (D1 in current substrate)
  • kv — a KV namespace
  • r2 — an R2 bucket
  • worker — a Cloudflare Worker (including cron-triggered workers; the cron is a property of the worker, not a separate artefact type)
  • surface — a customer-facing surface (apps/site, apps/admin, etc.)
  • connector — a binding or junction between two named artefacts (used when the connector itself has a name worth referencing on a map)

Vocabulary extensions are deliberate, not casual. If a substrate-real artefact does not fit cleanly into existing vocabulary, the extension is named explicitly with reasoning, captured in this ADR's history or a successor ADR.

Environment marker. Every named artefact carries an environment suffix marking which side of the Mandarin separation (per HARD SEPARATION RULE in standing-rules.md) it sits on.

  • -prod — live side of the Mandarin. Production substrate, real client data, real RTOpacks operations.
  • -dev — development side of the Mandarin. Development substrate, test data, no client-facing impact.
  • -staging — pre-production verification tier, where used.

Full-word environment tokens are canonical (prod, dev, staging), not single-letter abbreviations. Single letters are visually ambiguous when scanning names; full words are unambiguous because the environment-token vocabulary is a closed set.

The environment marker is a suffix so that the function-naming segments read left to right unimpeded, with the environment arriving last as a qualifier. The Cloudflare-imposed character constraint (Worker and database names allow only alphanumeric characters and hyphens; underscores, slashes, dots, and tildes are disallowed) is met by all elements of this convention.
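The convention is regular enough that mechanical tooling can check it. The Python sketch below splits a name into scope, function segments, artefact type, and environment, and rejects names outside the closed vocabularies. The names `parse_name` and `ArtefactName` are hypothetical, and restricting names to lowercase is an assumption drawn from the examples in this ADR rather than a stated rule.

```python
import re
from dataclasses import dataclass

SCOPE_PREFIXES = {"rtopacks", "rto"}
ARTEFACT_TYPES = {
    "module", "bus", "wrapper", "proxy", "queue", "db",
    "kv", "r2", "worker", "surface", "connector",
}
ENVIRONMENTS = {"prod", "dev", "staging"}
# Assumption: lowercase only, as in every example in this ADR.
NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


@dataclass(frozen=True)
class ArtefactName:
    scope: str          # "rtopacks", "rto", or "" for utility
    function: str       # the domain/module/function segments, joined
    artefact_type: str
    environment: str


def parse_name(name: str) -> ArtefactName:
    """Split a canonical name into its segments, raising ValueError if invalid."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"{name!r}: only alphanumerics and hyphens are allowed")
    parts = name.split("-")
    scope = parts[0] if parts[0] in SCOPE_PREFIXES else ""
    # Everything between the optional scope and the last two segments.
    middle = parts[1:-2] if scope else parts[:-2]
    if len(parts) < 3 or not middle:
        raise ValueError(f"{name!r}: missing function, artefact-type, or environment")
    artefact_type, environment = parts[-2], parts[-1]
    if artefact_type not in ARTEFACT_TYPES:
        raise ValueError(f"{name!r}: unknown artefact type {artefact_type!r}")
    if environment not in ENVIRONMENTS:
        raise ValueError(f"{name!r}: unknown environment {environment!r}")
    return ArtefactName(scope, "-".join(middle), artefact_type, environment)
```

For example, `parse_name("rtopacks-sms-cellcast-wrapper-prod")` yields scope `rtopacks`, function `sms-cellcast`, type `wrapper`, environment `prod`, while a name with an underscore or a single-letter environment token raises `ValueError`. The parser checks form only; whether the scope prefix is the right one is still decided by the determination test.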

Worked examples.

Platform-owned (rtopacks-*):

  • rtopacks-tga-ingest-worker-prod — production worker that ingests TGA into the RTOpacks substrate. The function (ingestion) serves platform substrate needs; client benefit reached only via derived presentations through other artefacts.
  • rtopacks-tga-ingest-worker-dev — the dev-side equivalent.
  • rtopacks-sms-cellcast-wrapper-prod — production Cellcast wrapper (per ADR-014). Wraps external provider for platform-wide consumption; not itself client-facing.
  • rtopacks-admin-surface-prod — production admin surface for RTOpacks operators. Clients never interact with it.

Client-facing (rto-*):

  • rto-studio-storage-db-prod — production Studio storage database holding client work. The function (holding and returning Studio data) terminates in a client outcome.
  • rto-site-surface-prod — production client-facing site surface. The function (presenting the site) terminates in a client outcome.
  • rto-workspace-surface-prod — production client workspace at my.rtopacks.com.au. The function (hosting the client's workflow environment) terminates in a client outcome.
  • rto-radar-worker-prod — production Radar worker (RTO intelligence). The function reaches client outcomes directly through Radar's client-facing presentations.

Utility (no scope prefix):

  • internal-api-worker-prod — production internal API worker. The words "internal" and "api" together fully describe the function — a mediation API used internally. Ownership is architecturally irrelevant; consumed by platform-owned and client-facing artefacts alike.
  • webhook-inbound-bus-prod (when the bus pattern lands per ADR-017) — production inbound webhook bus. The name fully describes its function; routes inbound webhooks regardless of which side consumes them.
  • outbound-bus-prod (when the bus pattern lands) — production outbound bus. Same property — name self-describes; consumed by both sides.

The utility category is intentionally small. Most artefacts will land in either rtopacks-* or rto-* because most artefact names do not fully self-describe — they need ownership context to be understood. The utility category is reserved for the genuine substrate-utility cases where the name carries its own meaning unaided.
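The three categories and the environment suffix can be checked mechanically. The following is a minimal sketch, not canonical tooling: the function name `parseArtefactName`, the `ParsedName` shape, the lowercase-only restriction, and the default environment list (`prod`, `dev`) are assumptions made for illustration and are drawn only from the worked examples above.

```typescript
// Hypothetical helper: classifies an artefact name under the three-category taxonomy.
type Scope = "platform-owned" | "client-facing" | "utility";

interface ParsedName {
  scope: Scope;
  env: string;
  body: string; // the function portion between scope prefix and environment suffix
}

// Assumed environment markers; extend if the canon defines others.
const ENV_MARKERS: string[] = ["prod", "dev"];

// Lowercase alphanumeric segments joined by single hyphens (lowercase is an assumption).
const NAME_PATTERN = /^[a-z0-9]+(-[a-z0-9]+)*$/;

function parseArtefactName(raw: string, envs: string[] = ENV_MARKERS): ParsedName | null {
  // Reject underscores, slashes, dots, tildes, and anything else outside the pattern.
  if (!NAME_PATTERN.test(raw)) return null;

  const segments = raw.split("-");
  // The environment marker must be the final segment.
  const env = segments[segments.length - 1] ?? "";
  if (envs.indexOf(env) === -1) return null;

  // Scope is decided by the first segment; no recognised prefix means utility.
  let scope: Scope = "utility";
  let start = 0;
  if (segments[0] === "rtopacks") {
    scope = "platform-owned";
    start = 1;
  } else if (segments[0] === "rto") {
    scope = "client-facing";
    start = 1;
  }

  const body = segments.slice(start, -1).join("-");
  if (body.length === 0) return null; // a name needs a function portion
  return { scope, env, body };
}
```

A check like this only validates shape. Whether a name genuinely self-describes, and so belongs in the utility category, remains a judgement the convention leaves to the author.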

Current substrate alignment. The existing substrate uses rto-* and rtopacks-* inconsistently (databases mostly rto-*, workers mostly rtopacks-*, with no documented rule). Environment markers are mostly absent from current names; environments are separated by Cloudflare account or deployment configuration rather than by name. Adopting this discipline strictly will require renaming work.

The rename has asymmetric substrate cost. Per-resource-type rename feasibility was verified against current Cloudflare documentation (2026-05-26, per CANON-VS-ADR-018-RECONCILIATION-01):

| Resource type | Renamable? | Cost |
| --- | --- | --- |
| Workers | Yes (stable UUID across rename; routes need re-attachment) | Cheap |
| KV namespaces | Yes (wrangler kv namespace rename) | Cheap |
| D1 databases | No (no rename CLI/API) | Create-new-and-migrate-data |
| R2 buckets | No (bucket name in S3 endpoint URL) | Create-new-and-copy-objects |
| Queues | No ("queue name cannot be changed after creation") | Create-new-and-migrate-consumers |
| Service bindings | Yes (via Worker rename; UUID stable) | Cheap |

The naming-conformance brief (WORKER-AND-DB-NAMING-CONFORMANCE-01) reflects this asymmetry with phased sequencing:

  • Phase 1 (immediate when brief drips): Worker and KV renames. Both renamable cheaply; fast; high architectural-visibility value.
  • Phase 2 (opportunistic): D1 / R2 / Queue renames bundled into migration briefs that are otherwise running (OPS-DB split, engine-db-oc rename per ADR-008 cleanup, any future schema migrations or content migrations that touch resource identity).
  • Phase 3 (deliberate): Remaining D1 / R2 / Queue renames that lack a natural migration vehicle. Dedicated brief if and when the cost-benefit justifies; otherwise deferred indefinitely.

This sequencing respects substrate cost without weakening the discipline. The convention applies; the execution sequencing is pragmatic. The verification table is point-in-time; if Cloudflare adds rename APIs for additional resource types, Phase 1 expands accordingly. Substrate-specific rename details and CF documentation references live in infrastructure/cloudflare-naming-canon.md.
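The phase assignment follows directly from the feasibility table, and can be sketched as a lookup. The names `CfResource`, `RENAMABLE_IN_PLACE`, and `renamePhase` are hypothetical. Placing service bindings in Phase 1 is an assumption based on their being renamed via the Worker rename. The map is a point-in-time snapshot, as the table itself is.

```typescript
// Hypothetical resource-type labels for the rows of the feasibility table.
type CfResource =
  | "worker"
  | "kv-namespace"
  | "d1-database"
  | "r2-bucket"
  | "queue"
  | "service-binding";

// Snapshot of the "Renamable?" column as verified on 2026-05-26.
const RENAMABLE_IN_PLACE: Record<CfResource, boolean> = {
  "worker": true,
  "kv-namespace": true,
  "d1-database": false,
  "r2-bucket": false,
  "queue": false,
  "service-binding": true,
};

type RenamePhase = 1 | 2 | 3;

// Phase 1: cheap in-place renames.
// Phase 2: non-renamable resources that can ride an already-planned migration.
// Phase 3: non-renamable resources with no natural migration vehicle.
function renamePhase(resource: CfResource, hasMigrationVehicle: boolean): RenamePhase {
  if (RENAMABLE_IN_PLACE[resource]) return 1;
  return hasMigrationVehicle ? 2 : 3;
}
```

If Cloudflare later adds a rename API for another resource type, only the map entry changes; the phase rule itself stays the same.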

The convention applies immediately to all new artefacts. Existing-substrate rename happens per the phased schedule.

The cartography v2 map labels nodes per the new discipline even where current substrate names differ. The map shows target-state names with current-state names noted alongside where they diverge, making the rename surface itself a piece of visible substrate work rather than hidden tech debt.

The convention is canonical from this point forward. Refinement happens against concrete examples as the cartography work proceeds; substantive changes to the convention require explicit ADR updates with named reasoning.

Living document. The map is maintained alongside the substrate. When new components are added, named, or relocated, the map updates in the same commit as the substrate change. This is enforced by discipline (the briefs that change substrate also update the map) rather than by tooling, though tooling support may be added later if discipline alone proves insufficient.

Consequences.

  • The map lives at a canonical path in the repo, rendered into docs.rtopacks.com.au as part of the docs site. Exact path is part of the prelim cartography work.
  • A first prelim cartography brief produces the v1 10,000-foot map of the substrate as it exists today, with working names per the proposed taxonomy. This v1 is refined into v2 (and beyond) as substrate work proceeds.
  • Every brief that touches substrate considers whether the map needs to update as part of the brief's scope. If yes, the map update is part of the brief deliverable.
  • New components are named at design time, with names following the convention. Names are not added after implementation as an afterthought.
  • The naming convention itself is a canonical commitment. If a component name does not fit the convention, either the component is renamed or the convention is extended deliberately (with named reasoning in this ADR's history or a successor ADR).
  • This ADR is what makes the in-and-out bus pattern (ADR-017) operationally visible. A bus that exists in code but is not on the map is invisible to anyone who hasn't read the code. A bus on the map is immediately visible to everyone.
  • This ADR is what makes the modular-consolidation principle (ADR-016) operationally enforceable. A module that exists in code but is not on the map will be reinvented by anyone who doesn't know it exists. A module on the map gets reused because it's visible.

This ADR closes the loop on the three-principle composition: modules produce nameable units (ADR-016); buses produce traceable flows (ADR-017); cartography makes both legible at a glance (ADR-018). A three-person operation can operate substrate of meaningful complexity if all three principles hold.

See client-spine.md § 9, ADR-016 (modular consolidation), ADR-017 (bus pattern), and the cartography artefact itself once it lands.


ADR-019 — Mandarin enforcement at the credential-surface layer

Status: Accepted Date: 2026-05-26

Context. External service integrations (per ADR-014 for Cellcast, future Stripe/QuickBooks/Twilio rationalisation, any future integration) bridge across the Mandarin separation to different external substrates per environment: sandbox-QuickBooks vs live-QuickBooks, Stripe test mode vs live mode, dev-API-keys vs prod-API-keys, and so on. The architectural question is where this asymmetry lives.

Two viable approaches:

  • Per-environment code branches. Wrapper code includes conditional logic like if env === 'prod' then live_endpoint else sandbox_endpoint. Asymmetry lives in the code path. Different code runs in prod vs dev.
  • Environment-neutral code with environment-specific credential bindings. Wrapper code is identical across both sides. Asymmetry lives entirely in what credentials and configuration get loaded at runtime. The same code, configured twice.

Per-environment code branches produce drift, surprises, and silent inconsistency at scale. They also make utility-shaped artefacts impossible — any consumer of an asymmetric-code wrapper inherits the asymmetry. The alternative — environment-neutral code with credential-layer separation — preserves code reuse, prevents behavioural drift, and allows downstream consumers to be utility-shaped per ADR-018.

Decision. The Mandarin separation for external service integrations is enforced at the credential-surface layer, not at the code-branch layer. Wrapper code (per ADR-015) is environment-neutral. Environment selection happens at credential and configuration loading time. The substrate enforces the boundary via Cloudflare-secret scoping; the admin surface enforces operator-side discipline; sole-operator awareness backs the architecture as secondary safety net.

The principle in plain words: the same wrapper code runs on both sides of the Mandarin, configured with different credentials. Like a machine gun chambered for two different bullet types: same mechanism, different ammunition, different outcomes. The bullet selection happens at loading time, via the admin surface, not at code-execution time.
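A minimal sketch of what "same code, configured twice" can look like in a wrapper. Everything named here is hypothetical: the binding names `PROVIDER_BASE_URL` and `PROVIDER_API_KEY`, the `/messages` path, the bearer-token header, and the `.invalid` hostnames are illustrative assumptions and do not describe any real provider's API.

```typescript
// Hypothetical binding shape: the values differ per environment, the shape does not.
interface ProviderBindings {
  PROVIDER_BASE_URL: string; // sandbox or live endpoint, set by configuration
  PROVIDER_API_KEY: string;  // secret loaded for that side of the Mandarin
}

interface OutboundRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// The wrapper never asks which environment it is running in.
// All asymmetry arrives through the bindings argument.
function buildSendRequest(bindings: ProviderBindings, to: string, text: string): OutboundRequest {
  // Strip trailing slashes so the configured base URL can be written either way.
  const base = bindings.PROVIDER_BASE_URL.replace(/\/+$/, "");
  return {
    url: base + "/messages",
    method: "POST",
    headers: {
      "Authorization": "Bearer " + bindings.PROVIDER_API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ to, text }),
  };
}

// Same function, two configurations (placeholder values only).
const devRequest = buildSendRequest(
  { PROVIDER_BASE_URL: "https://sandbox.provider.invalid/v1/", PROVIDER_API_KEY: "dev-key" },
  "+61400000000",
  "hello",
);
const prodRequest = buildSendRequest(
  { PROVIDER_BASE_URL: "https://api.provider.invalid/v1", PROVIDER_API_KEY: "prod-key" },
  "+61400000000",
  "hello",
);
```

The two requests differ only in endpoint and credential. Nothing in the function body would need to change to add a third environment.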

The four enforcement layers, in order of strength.

  1. DNS-level enforcement. The dev and prod admin surfaces live on different top-level domains: admin.rtopacks.dev for dev, admin.rtopacks.com.au for prod. DNS resolves these names independently, to different places. There is no subdomain pattern that could be mistyped to cross the boundary; the TLDs are visibly and unmistakably different. This is the strongest visual broadcasting available.

  2. Substrate-level enforcement. Cloudflare Workers Secrets are scoped per environment. Dev workers have permission to read dev secrets; prod workers have permission to read prod secrets; cross-side access is structurally impossible because each secret is bound to a specific Worker, and a Worker can read only the secrets bound to it. If an operator accidentally pastes a live token into the dev admin surface, the write either fails or lands in dev secrets, where the prod wrapper cannot read it. The substrate enforces the boundary; the application layer trusts the substrate.

  3. Admin-surface enforcement. The admin surfaces are designed to broadcast which side they operate on through visual cues (banner colour, header treatment, persistent UI indicators) in addition to the URL itself. The admin surface is the operator's primary affordance for credential management; the surface's visible identity is the operator's primary signal of which side they are operating on.

  4. Operator discipline. Sole-operator awareness (Tim, with possible future Jimmy or other admin) of the consequences of misloading credentials. This is the weakest enforcement layer and is intentionally so: it is a secondary safety net, not the primary mechanism. The substrate-level enforcement makes accidental cross-contamination impossible; operator discipline catches the residual edge cases.

The admin surface as canonical external-integration cartography. The admin surface for each side is not merely a credential-management UI. It is the canonical view of which external services that side connects to, what the credential state is, when each was last rotated, and how to roll each one. Operators look at the dev admin to see what dev is connected to; they look at the prod admin to see what prod is connected to. The admin surface IS the human-readable reconciliation view of external integration state per environment.

This has implications for admin-surface design that go beyond ordinary credential management:

  • Per-environment inventory. The admin surface lists every external service that side connects to. New integrations appear when their wrapper is deployed; retired integrations disappear when their wrapper is removed. The list is substrate-derived, not maintained by hand.
  • State visibility. Current token, last-rotated timestamp, expiry (where known), operator who last rotated. Not deep telemetry; just the small set of facts operators need at a glance.
  • Rotation guidance. Each entry includes brief operator-actionable text: "to rotate, go to the Cellcast dashboard, regenerate the API key, paste it here." The guidance lives next to the credential it applies to, not in a separate runbook.
  • Operational frequency awareness. Dev tokens roll often (dev exposure happens in normal work); prod tokens roll rarely. The admin surface is optimised differently per side: dev admin is optimised for quick rotation, prod admin is optimised for rare-but-careful rotation with confirmation flows and audit-trail visibility.
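The at-a-glance facts listed above suggest a small per-integration record. The sketch below is illustrative only: the `IntegrationEntry` shape, the field names, and the choice to show a masked token hint rather than the token itself are assumptions, not part of this ADR.

```typescript
// Hypothetical inventory record for one external integration on one side.
interface IntegrationEntry {
  service: string;
  tokenHint: string;         // masked form only (an assumption; see maskToken)
  lastRotatedAt: Date;
  expiresAt: Date | null;    // null where expiry is not known
  lastRotatedBy: string;
  rotationGuidance: string;  // operator-actionable text kept next to the credential
}

// Masks all but the last four characters; very short tokens are masked entirely.
function maskToken(token: string): string {
  if (token.length <= 4) return "*".repeat(token.length);
  return "*".repeat(token.length - 4) + token.slice(-4);
}

// Whole days elapsed between two instants.
function daysSince(date: Date, now: Date): number {
  return Math.floor((now.getTime() - date.getTime()) / 86400000);
}

// One-line summary of the facts an operator needs at a glance.
function glanceLine(entry: IntegrationEntry, now: Date): string {
  const expiry = entry.expiresAt ? entry.expiresAt.toISOString().slice(0, 10) : "unknown";
  return (
    entry.service + ": token " + entry.tokenHint +
    ", rotated " + daysSince(entry.lastRotatedAt, now) + "d ago by " + entry.lastRotatedBy +
    ", expires " + expiry
  );
}

const cellcastEntry: IntegrationEntry = {
  service: "cellcast",
  tokenHint: maskToken("abcd1234"),
  lastRotatedAt: new Date("2026-05-01T00:00:00Z"),
  expiresAt: null,
  lastRotatedBy: "tim",
  rotationGuidance: "Regenerate the API key in the provider dashboard, then paste it here.",
};
```

In a substrate-derived inventory, records like this would be produced from deployed wrappers rather than entered by hand; the sketch shows only the record shape and its rendering.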

Consequences.

  • Wrapper code (per ADR-015) is environment-neutral by construction. No if env === 'prod' branches anywhere in wrapper code. Provider endpoints, credentials, and behaviour are loaded from environment-specific configuration; the wrapper does not know which environment it is in.
  • Workers that consume environment-neutral wrappers can themselves be utility-shaped (per ADR-018 Q1b). The architectural path to utility runs through wrapper encapsulation of asymmetry.
  • The admin surfaces (rtopacks-admin-surface-prod and rtopacks-admin-surface-dev) are first-class enforcement infrastructure, not secondary tooling. ADMIN-AUTH-MODEL-RECONCILIATION-01 (or whichever brief touches admin-surface design) treats credential cartography as a primary concern, not as an afterthought.
  • The current substrate posture assumes sole-operator-with-substrate-backup. If the team grows to where multiple humans hold admin write access, this ADR is revisited with appropriate access-control workflows (role-based gating, change-approval workflows, four-eyes for sensitive rotations). Until then, the substrate-level enforcement plus operator-discipline-as-safety-net is the appropriate posture.
  • Audit trail of credential changes (which token loaded, by whom, into which side, when) is operational machinery that wants its own discipline brief at some future point. Not blocking for current operations; useful for future remediation if cross-contamination ever happens.
  • This ADR is what makes ADR-018's Q1b test architecturally achievable. Without credential-layer Mandarin enforcement, environment-neutral behaviour would be impossible and the utility category would collapse to artefacts that don't touch external substrates at all (a very small set). With this ADR, the utility category can be expanded by deliberate architectural work (wrapper-encapsulation of asymmetry).

See ADR-014 (Cellcast as canonical first wrapper), ADR-015 (full-API wrapper pattern), ADR-018 (naming convention, with Q1b grounded in this ADR's environment-neutrality property).


ADR-020 — The access-control model has three operating tiers

Status: Accepted Date: 2026-05-27

Context. RTOpacks needs a coherent access-control model that names who can do what, against whose substrate, with what authority. The current substrate has four unresolved identity conventions (per IDENTITY-MODEL-RATIONALISATION-01's scope) and no canonical access-control articulation. ADR-007 establishes user/credential separation at the schema level; ADR-008 establishes that there is no platform tier above clients; ADR-012 establishes signup-and-admin-authority decoupling. These three ADRs compose a partial answer but do not name the operating tiers explicitly. Without an explicit tier model, downstream identity rationalisation work has no target shape to rationalise toward, and the access-control behaviour accretes per-feature rather than expressing a coherent design.

The model needs to handle: customer-side administration and use (the RTOs themselves), operator-side support (the three-person team running the substrate), the variable shapes of customer organisations (one-person RTOs through to large multi-staff TAFEs), and future extensions (multi-administrator delegation, role differentiation among administrators).

Decision. RTOpacks operates three access-control tiers: an operator tier (T3), a client administrator tier (T4), and a client user tier (T4A). The numbering preserves continuity with the historical UCCA conceptual mapping from which the substrate inherits some of its lexical patterns; the numbers are not load-bearing but the structural distinctions are.

T3 — Operator tier. The three-person team operating the substrate (Tim, Alex, Claude). Holds super-user authority across all client substrates. Functions: substrate operation, infrastructure administration, customer support, incident response. T3 is the highest tier; there is no platform tier above it (per ADR-008). T3 does not have a commercial relationship to client substrates — operators are not paying customers of their own platform.

T4 — Client administrator tier. One or more users per client file holding administrative authority over that client's substrate. Functions: user invitation and management, role assignment among other users, plan and billing control, view of all client substrate state. T4 is established via the ADR-012 signup flow (asynchronous CEO-notification path for the first administrator; T4 administrators can subsequently invite further administrators, subject to the multi-administrator extension below). The minimum-viable shape is one T4 administrator per client file. Schema and code are designed to accommodate multiple T4 administrators per client file without redesign: co-administration with role-differentiated administrators (Finance Admin, Education Admin, Compliance Admin, etc.) is designed in but not built in the initial implementation, and is added when its case is real.

T4A — Client user tier. Users attached to a client file who do operational work within it but do not hold administrative authority over the client substrate. Functions: module use (Studio session work, People register maintenance, Radar interpretation, Record evidence handling, etc.), per-user state, per-user audit trail. T4A users are invited by T4 administrators; their permissions are a subset of what the T4 administrator grants. Different T4A users may have different permission sets within the same client substrate (a trainer's permissions differ from a compliance officer's permissions); permission shape lives in the module specs and is governed by ADR-009's deliberate-attachment discipline.

T4 and T4A are separable concepts, not separable humans. A T4 administrator is a role, not a user identity. The same human can hold both a T4 role and a T4A user identity simultaneously; in fact, this is the expected pattern. The T4 administrator role is the structural authority position; the T4A user identity is the countable seat that does operational work and against which plan entitlements consume. In a one-person RTO, the same human wears both hats — the architecture does not change, only the role-binding distribution does. Multi-person RTOs have one human in the T4 administrator role (or more, under the multi-administrator extension) and multiple humans in T4A user roles, with the T4 administrator typically also being one of the T4A users.

Consequences.

  • IDENTITY-MODEL-RATIONALISATION-01 now has a target shape to rationalise toward. The four current identity conventions resolve into two constructs: T4 administrator role-attachment and T4A user identity, with the substrate distinguishing role from identity at the schema level.
  • Module specs include their T4A permission model as a first-class section (composing with ADR-009's deliberate-attachment requirement). Each module names which T4A permission shapes it recognises and what each permits.
  • T4 administrators have implicit view-everything authority within their client substrate; explicit permission gates exist for actions (invite user, assign role, change plan, etc.) but not for read access within the client boundary.
  • T4A permissions are positive grants from the T4 administrator's authority pool — a T4 administrator cannot grant a T4A user a permission the T4 administrator does not hold. (This composes with the multi-administrator extension when it lands: a Finance Admin can grant finance-related T4A permissions but not compliance-related ones.)
  • Plan entitlements attach to client files (per ADR-021), not to individual T4A users. T4 administrators control plan; T4A users consume seats and resources within the plan envelope.
  • T3 operator access to client substrates is governed by ADR-022 (operator impersonation pattern), not by membership in the client substrate's access model. Operators do not appear in client user lists.
  • T4 administrator visibility into T4A user activity within their own client substrate is a customer-configurable capability, not an architectural commitment. The customer governs the employment-relationship ethics of internal observation through their own configuration; RTOpacks provides the technical capability without legislating how the customer exercises management authority over their own staff. Whether T4A users are notified of administrator observation actions is a customer choice, not an architectural default.
  • Cross-client access is structurally impossible. A T4 administrator of client A has no access whatsoever to client B's substrate. T3 operator access is the only path across client boundaries, and it is gated by the impersonation pattern in ADR-022.
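Two of the consequences above (grants come from the administrator's own authority pool, and cross-client access is impossible) reduce to a short check. This is a sketch under stated assumptions: the `TierGrant` shape, the permission strings, and `canGrant` are hypothetical and are not the canonical identity schema.

```typescript
type Tier = "T4" | "T4A";

// Hypothetical grant record: one row per (user, client file, tier).
// The same human may hold a T4 row and a T4A row for the same client file.
interface TierGrant {
  userId: string;
  clientId: string;
  tier: Tier;
  perms: string[]; // action permissions held under this grant
}

// A T4 administrator may grant a permission to a T4A user only if:
//  - the granting record is a T4 role and the target is a T4A identity,
//  - both belong to the same client file (no cross-client path),
//  - the administrator holds the permission being granted.
function canGrant(admin: TierGrant, target: TierGrant, perm: string): boolean {
  if (admin.tier !== "T4" || target.tier !== "T4A") return false;
  if (admin.clientId !== target.clientId) return false;
  return admin.perms.indexOf(perm) !== -1;
}

// One human wearing both hats in a one-person RTO (illustrative values).
const adminRole: TierGrant = {
  userId: "u1", clientId: "client-a", tier: "T4", perms: ["studio.edit", "people.view"],
};
const adminSeat: TierGrant = {
  userId: "u1", clientId: "client-a", tier: "T4A", perms: [],
};
const otherClientSeat: TierGrant = {
  userId: "u9", clientId: "client-b", tier: "T4A", perms: [],
};
```

T3 operators are deliberately absent from this model: their access goes through the impersonation pattern in ADR-022 rather than through a grant row.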

See client-spine.md § 1, § 3, § 7, § 8; ADR-006; ADR-007; ADR-008; ADR-011; ADR-012; ADR-019; pending IDENTITY-MODEL-RATIONALISATION-01.


ADR-021 — Plan, entitlement, and metering are substrate concerns

Status: Accepted Date: 2026-05-27

Context. RTOpacks is commercially activated when paying customers attach to client files via subscription plans. The substrate needs to know about plans (what tier is this client on), entitlements (what does that tier permit), seats (how many T4A users does this client get), and metering (how much compute, ingestion, or other consumable resource has this client used relative to their plan envelope). Identity rationalisation that does not produce a plan-and-entitlement-aware structure produces something that has to be torn apart at commercial activation. The current identity model has no concept of plan attachment, seat-counting, or metering — these have to be designed into the substrate before commercial activation, not retrofitted afterward.

The pay-per-use compute model (similar in shape to LLM-platform tiering — a baseline plan tier includes a compute allocation; consumption beyond that triggers overage billing or plan upgrade) is the working assumption for RTOpacks' commercial model. Specific price-list questions are separate; the substrate shape needs to support this model whatever the prices turn out to be.

Decision. Plan, entitlement, and metering are substrate-level concerns with their own architectural commitments.

The plan attaches to the client file. Not to T4 administrators, not to T4A users. The client file is the commercial unit. A client has one plan at a time. T4 administrators control plan selection, upgrades, downgrades, and cancellation; plan state propagates to all T4A users attached to that client file.

Entitlements flow from plan to client to module access. A plan defines which modules the client has access to, what limits apply within those modules (seats, content generation counts, ingestion frequency, storage), and what overage behaviour kicks in when limits are reached. Module specs declare what they require from a plan to operate; the plan database is the authoritative source of "is this client entitled to this module at this level."

T4 administrator role does not consume a seat. The T4 administrator role is structural; the T4A user identity is countable. A plan with five T4A seats means five users can do operational work in that client substrate; the T4 administrator role exists independent of seat consumption. If the human who is the T4 administrator also wants to do operational work, they create a T4A user identity for themselves and consume one of the five seats. This separation is deliberate: it lets administrative oversight survive scenarios where all T4A seats are reassigned, paused, or in use.

Metering is substrate infrastructure, not a billing feature. A counter exists in substrate for each consumable resource (compute, ingestion runs, content generation calls, storage growth, others as they emerge). The counter accumulates per client. The plan defines the threshold at which overage kicks in. Below the threshold, the substrate does not proactively surface consumption, although the T4 administrator can still look it up (see "consumption transparency" below). At and above the threshold, the substrate surfaces consumption state to the T4 administrator with appropriate notice and routes overage to either pay-per-use billing or plan-upgrade prompts, depending on plan configuration.

Consumption transparency. The T4 administrator can see consumption state for their client at any time. Approaching-threshold and overage states are surfaced through the admin surface, not buried in invoice details after the fact. The substrate does not surprise the customer with bills they could not have seen coming.
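The seat rule and the metering states can be sketched together. The names `SeatGrant`, `seatsConsumed`, and `meterState` are hypothetical, and the 80% "approaching" fraction is an assumed default; this ADR does not specify where the approaching-threshold notice begins.

```typescript
// Hypothetical grant record, reduced to what seat counting needs.
interface SeatGrant {
  userId: string;
  tier: "T4" | "T4A";
  active: boolean;
}

// Only active T4A identities consume seats; the T4 role is structural.
function seatsConsumed(grants: SeatGrant[]): number {
  return grants.filter((g) => g.tier === "T4A" && g.active).length;
}

type MeterState = "below" | "approaching" | "overage";

// Overage begins at the threshold itself ("at and above").
// approachFraction is an assumed tuning value, not a canonical one.
function meterState(used: number, threshold: number, approachFraction: number = 0.8): MeterState {
  if (used >= threshold) return "overage";
  if (used >= threshold * approachFraction) return "approaching";
  return "below";
}

// A one-person RTO: the same human holds the T4 role and one T4A seat.
const exampleGrants: SeatGrant[] = [
  { userId: "u1", tier: "T4", active: true },
  { userId: "u1", tier: "T4A", active: true },
  { userId: "u2", tier: "T4A", active: false },
];
```

Here `seatsConsumed(exampleGrants)` counts one seat: the T4 row is ignored and the inactive T4A row does not count.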

Consequences.

  • The substrate gains a plan database (or its functional equivalent) as a first-class entity, attached to client files. CREDENTIAL-PROVIDER-DECISION-01 and identity rationalisation work compose with this.
  • Module specs declare their plan-and-entitlement requirements explicitly. ADR-009's deliberate-attachment discipline extends to plan attachment: every module names what it requires from a plan to operate.
  • A pricing model exists separate from this ADR (which addresses architecture, not commercial pricing). The architectural shape supports whatever pricing model emerges; the pricing model can change without substrate redesign.
  • Audit trail spans plan changes. Plan upgrades, downgrades, lapses, resumptions, and overage events are all audit-trailed in the substrate. T4 administrators can see plan history for their client; T3 operators can see plan history across clients for support and finance purposes.
  • The plan/entitlement/metering substrate composes with ADR-023 (graceful degradation across subscription state). The substrate knows what the client had access to when subscription was active; that knowledge persists across lapse and is available on resumption.
  • Specific pricing decisions are deferred to commercial activation planning. This ADR is the architectural commitment to the model shape, not to the prices.

See client-spine.md § 1, § 3; ADR-008; ADR-009; ADR-010; ADR-011; ADR-020; ADR-023.


ADR-022 — Operator support uses the impersonation pattern

Status: Accepted Date: 2026-05-27

Context. T3 operators (the three-person team running RTOpacks) need to be able to enter T4 client substrates for support purposes — diagnosing customer-reported issues, observing what the customer is actually seeing, reproducing problems, validating that a fix has taken effect. The naive approach (log in as the customer using their credentials) is unacceptable on multiple grounds: it requires the customer to share credentials, it pollutes the customer's own audit trail with operator actions attributed to the customer, it creates a security surface where operator credential compromise becomes customer credential compromise, and it offers no clean exit pattern.

A separate operator-tier surface that views client substrate data without entering the client surface is insufficient — it does not let the operator see what the customer sees, which is often essential for diagnosing UX or workflow issues that are not visible in raw data.

The pattern needed has historical precedent in the UCCA architectural lineage (the "Ortho" pattern — colloquially described as monkey-down-the-ladder, monkey-up-the-ladder-and-close-the-hatch). The substantive properties of that pattern are worth canonicalising as RTOpacks' approach.

Decision. T3 operator support of T4 client substrates uses the operator impersonation pattern.

Mechanics. A T3 operator initiates an impersonation session targeting a specific T4 client substrate, optionally targeting a specific T4A user identity within that substrate. The substrate constructs a view-as-if-customer surface that mimics what the customer would see, with full read access to substrate state and limited action capability (typically read-only by default, with write capability gated by deliberate escalation). The operator's actions during the session are attributed to the operator-as-operator, not to the customer; the customer's own audit trail is not polluted by operator activity, and the operator's actions are visible in the operator-side audit trail with full context.

Session boundary. The operator enters the session deliberately, works within it, and exits deliberately. Entry is logged; exit is logged. The operator does not appear as a logged-in user in the customer's surface. The customer's experience is unaffected by the operator session — the customer is not notified, and the customer-facing surface does not surface operator activity.

Hatch closure. When the operator exits the session, the surface closes cleanly — no lingering operator access, no residual T3 authority in the customer's surface, no reverse-traversal possibility. The customer's surface is, after the operator exits, indistinguishable from what it would have been if the operator had not entered.

Comprehensive operator-side audit logging. Every T3 access to a T4 client substrate is logged with reason, actor, duration, and scope. The logs exist for three operator-side purposes:

  • Maintaining operator integrity. Machines cannot keep machines honest; the audit trail enables human review of operator activity, including by external third parties if RTOpacks is ever called to demonstrate operator conduct.
  • Preserving evidence. If RTOpacks is ever called to demonstrate what happened in a customer substrate, the logs are the substantive record of actual events rather than recollection.
  • Operational diagnostics. Repeated operator access to the same client substrate for the same shape of issue surfaces operational hotspots worth addressing systemically.
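The session boundary, hatch closure, and logging requirements can be sketched as a single primitive that cannot be used without writing to the operator-side log. All names here (`AuditEntry`, `enterImpersonation`, `ImpersonationHandle`) are hypothetical, the in-memory array stands in for whatever durable store the substrate uses, and read-only versus write escalation is not modelled.

```typescript
// Hypothetical operator-side audit record.
interface AuditEntry {
  operatorId: string;          // actor: always the operator, never the customer
  clientId: string;            // scope: which client substrate
  targetUserId: string | null; // scope: optional T4A identity being viewed as
  reason: string;
  kind: "enter" | "action" | "exit";
  detail: string;
  at: Date;
}

interface ImpersonationHandle {
  act(detail: string, at: Date): void;
  exit(at: Date): number; // returns session duration in milliseconds
}

function enterImpersonation(
  auditLog: AuditEntry[],
  operatorId: string,
  clientId: string,
  reason: string,
  startedAt: Date,
  targetUserId: string | null = null,
): ImpersonationHandle {
  // No reason, no session: logging cannot be bypassed at entry.
  if (reason.trim() === "") throw new Error("impersonation requires a stated reason");
  let closed = false;
  const record = (kind: AuditEntry["kind"], detail: string, at: Date): void => {
    auditLog.push({ operatorId, clientId, targetUserId, reason, kind, detail, at });
  };
  record("enter", "session opened", startedAt);
  return {
    act(detail: string, at: Date): void {
      // Hatch closure: a closed session grants nothing.
      if (closed) throw new Error("session closed: no residual operator access");
      record("action", detail, at);
    },
    exit(at: Date): number {
      if (closed) throw new Error("session already closed");
      closed = true;
      record("exit", "session closed", at);
      return at.getTime() - startedAt.getTime();
    },
  };
}
```

Because every path through the handle goes via `record`, duration is recoverable from the enter and exit entries, and nothing is ever written to a customer-side trail.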

The logs are operator-side artefacts. They are not part of the customer-facing surface. The architectural commitment is that operator activity is comprehensively logged at the moment of access, not that the customer is notified of operator activity. The substantive protection against covert misuse is the existence of the audit trail and the discipline of reviewing it, not customer-facing real-time visibility.

The architectural prohibition is operator activity that bypasses logging, not operator activity that is unnotified to the customer. Routine operator presence in customer substrates is a legitimate part of operating the platform. The customer signed up for software that requires operators to function. Treating routine operator access as a thing requiring per-incident customer notification would be performative — it would generate friction on legitimate operations without preventing the harms it claims to prevent. The protection that matters is structural: every operator action is captured at the moment of access, with reason, in logs the operator cannot bypass.

Legitimate future scenarios where customer-facing notification becomes a configurable overlay (enterprise compliance requirements, regulatory mandates, customer-requested transparency features) are addressed by deliberate ADR articulation at that point, not by ad-hoc backdoor implementation. The default architecture is comprehensive operator-side logging without customer-facing notification overlay; configurable extensions exist as deliberate additions per case.

Consequences.

  • A canonical impersonation-session substrate primitive exists, used by operator support and by no other code path. Authentication into the impersonation primitive is via T3 operator credentials and substrate-internal authorisation; customer credentials are never used.
  • Operator activity is comprehensively logged at the substrate level. The logs are operator-side artefacts (not surfaced to T4 administrators in the customer-facing admin surface) and serve operator integrity, evidentiary preservation, and operational diagnostics. They are reviewable by Tim, by Alex when authorised, and by external third parties if legal or regulatory review is ever required.
  • The pattern composes with ADR-019 (Mandarin enforcement). Operator impersonation operates on the prod side of the Mandarin for prod customer support; on the dev side for dev work. There is no cross-Mandarin operator access; a prod-side incident cannot be diagnosed by impersonating a dev session.
  • The pattern composes with the disposition articulated in client-spine.md § 1: operator access to customer substrates is governed by traceability and integrity discipline, not by performative customer notification. Operator capability exists for legitimate operation; covert misuse is structurally prevented by logs that operator activity cannot bypass; routine operations are not theatrical notification events.
  • Future extensions (e.g. multi-operator co-session for complex debugging, read-only versus write-permitted session modes, customer-facing notification overlays where legitimate cases emerge) are extensions of the impersonation primitive, articulated as deliberate ADR additions when their case is real.

See client-spine.md § 1; ADR-008; ADR-019; ADR-020.


ADR-023 — Graceful degradation across subscription state

Status: Accepted Date: 2026-05-27

Context. Customers' commercial relationship to RTOpacks varies over time — they subscribe, they may lapse (cancellation, payment failure, deliberate pause), they may resume. The naive subscription-lapse handling pattern in SaaS is hard lockout: subscription ends, the customer cannot log in, their state is invisible to them, returning to the product requires re-onboarding from a position of "I had something and now it's gone." This pattern weaponises the customer's dependence on the substrate and produces measurable retention loss because customers who could not return easily simply do not.

The no-weaponised-lock-in principle articulated in client-spine.md § 1 names this as an architectural failure mode RTOpacks is structurally opposed to. The substrate is sentinel, not jailer. The disposition matters in itself and produces a commercial property: customers who can return easily often do, and customers who experience a soft door rather than a hard wall are measurably more likely to re-subscribe than customers who experience punishment for lapsing.

The architectural shape needed: subscription state changes affect what the customer can do, not what the substrate holds about them.

Decision. RTOpacks supports graceful degradation across subscription state. The substrate retains client state across subscription lapses; the T4 administrator role survives lapse; T4A users deactivate-not-delete; resumption is frictionless.

T4 administrator survives lapse. When a client subscription lapses, the T4 administrator role remains intact. The T4 administrator can log in, see what was there, see consumption state at time of lapse, see what changed about RTOpacks while they were away, and decide whether to resume service. The administrative surface during lapsed state is intentionally different from the active-subscription surface — it shows the preserved state, presents the resumption path, and does not gate the administrator from understanding what they previously had.

T4A users deactivate, not delete. When subscription lapses, T4A users transition to a deactivated state. They cannot log in; they cannot perform actions; the seats they occupied are not consumed. Their identity, their permission assignments, their action history, and their attachment to the client substrate all persist in the substrate. On resumption, T4A users return to their previous state — same permissions, same identity, same history — and resume work without re-onboarding friction.

Substrate state persists. Studio sessions, generated content, audit trails, plan history, consumption records, module-specific state — all of this persists in the substrate across subscription lapses. Lapse does not trigger data deletion. The customer's work is theirs; the substrate continues to hold it whether the customer is currently paying for active access or not. (Long-term retention beyond reasonable bounds is a separate operational question; this ADR addresses the architectural commitment, not unbounded storage at platform cost.)

Resumption is frictionless. A T4 administrator resuming subscription does not re-onboard. They restore plan selection (which may include reviewing current plan tiers if pricing has changed in the meantime), confirm payment, and their substrate is immediately active again with all T4A users, all permissions, all state, all history intact. Resumption is one decision, not a re-signup.

Audit trail spans state changes. Subscription lapse, resumption, deactivation, reactivation are all audit-trailed in the substrate. The T4 administrator can see "the client was lapsed from date X to date Y, resumed by user Z." T3 operators can see lapse-and-resumption history for support and finance purposes.
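The lapse and resumption behaviour described above can be sketched as a pair of pure state transitions. This is a minimal illustration under stated assumptions, not the substrate's implementation: the type names, the `deactivatedByLapse` flag, and the audit-event shape are all hypothetical.

```typescript
type Tier = "T4" | "T4A";
type UserStatus = "active" | "deactivated";

interface ClientUser {
  id: string;
  tier: Tier;
  status: UserStatus;
  permissions: string[];
  // Hypothetical flag: separates lapse-driven deactivation from a manual one.
  deactivatedByLapse: boolean;
}

interface AuditEvent {
  kind: "lapse" | "resume";
  at: number;    // unix epoch seconds
  actor: string; // user id, or "system" for e.g. a payment failure
}

interface ClientState {
  subscription: "active" | "lapsed";
  users: ClientUser[];
  audit: AuditEvent[];
}

// Lapse deactivates T4A users without deleting them; T4 administrators are untouched.
function lapse(state: ClientState, at: number, actor: string): ClientState {
  const users = state.users.map((u): ClientUser =>
    u.tier === "T4A" && u.status === "active"
      ? { ...u, status: "deactivated", deactivatedByLapse: true }
      : u
  );
  return { subscription: "lapsed", users, audit: [...state.audit, { kind: "lapse", at, actor }] };
}

// Resumption reactivates exactly the users the lapse deactivated; nothing is re-created.
function resume(state: ClientState, at: number, actor: string): ClientState {
  const users = state.users.map((u): ClientUser =>
    u.deactivatedByLapse ? { ...u, status: "active", deactivatedByLapse: false } : u
  );
  return { subscription: "active", users, audit: [...state.audit, { kind: "resume", at, actor }] };
}

// Login depends on the user's own status, not on the subscription state,
// which is how the T4 administrator keeps access while lapsed.
function canLogIn(user: ClientUser): boolean {
  return user.status === "active";
}
```

Because both transitions return a new state and only flip status flags, permissions and history survive a lapse unchanged, and the audit array records each transition with its actor.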

Consequences.

  • The substrate has explicit deactivated-not-deleted user states. T4A user identities have a lifecycle that includes deactivation as a non-terminal state.
  • T4 administrator surfaces have a designed lapsed-state mode, not just an active-state mode and an error-state mode. The lapsed state is a first-class experience, not an exception handler.
  • Plan substrate (per ADR-021) retains plan history per client across lapses. On resumption, plan history is intact; consumption counters reset or carry forward per plan-specific policy (a question separate from this ADR).
  • Module specs that hold per-user state include deactivation handling. When a T4A user deactivates, their per-user module state persists, and when they reactivate, it is intact. Modules do not have to handle "user has been recreated from scratch"; they handle "user has been dormant."
  • The export-portable output capability (ADR-010) compounds with graceful degradation. Customers who choose to leave permanently can take their work with them; customers who lapse temporarily come back to it intact. Both paths are first-class.
  • The no-weaponised-lock-in principle articulated in client-spine.md § 1 is the substantive reason for this ADR. The architectural choices above (administrator survives, users deactivate-not-delete, state persists, resumption is frictionless) each express that principle structurally.

See client-spine.md § 1; ADR-010; ADR-020; ADR-021.


ADR-024 — Canonical user-identity schema model

Status: Accepted Date: 2026-05-27

Context. The substrate accreted four key conventions (email-keyed / UUID-keyed / prefix-keyed / randomblob-keyed) across two identity-bearing D1 databases (rto-ops-db + rto-workspace-db), with cross-DB duplicate tables, six parallel L3-truth source mechanisms with no synchronisation, an L1-L4 UCCA-lineage tier system that does not match ADR-020's T3/T4/T4A vocabulary, and a tenants table that ADR-008 retired but three tables still reference. IDENTITY-MODEL-RATIONALISATION-01 Phase 1 audit (filed at ops/audits/IDENTITY-SURFACE-AUDIT-01.md) documented this state empirically.

ADR-007 (user/credential separation), ADR-020 (T3/T4/T4A access-control), ADR-021 (plan/entitlement/metering), ADR-022 (operator impersonation), and ADR-023 (graceful degradation) provide architectural principles but do not specify the schema-level shape that honours them. This ADR canonicalises the schema. The full design lives at ops/designs/IDENTITY-MODEL-CANONICAL-01.md; this ADR captures the load-bearing architectural commitments.

Decision. The canonical user-identity schema is three tables plus one auxiliary table plus two reshaped existing tables.

The three core tables.

  • users — UUID-keyed (RFC 4122). One row per human identity. Carries email UNIQUE, display_name, email_verified, client_id FK (NULL for T3 operators; required for T4/T4A), status enum (active/deactivated per ADR-023), INTEGER unixepoch() timestamps. Email change is an UPDATE, not a new row.
  • tier_grants — T3/T4/T4A role attachment. Single canonical source of "what tier is this user?" Replaces six parallel L3-truth sources (CF Access JWT as authority, operator_roles, access_allowlist as tier grant, user_tenant_roles, admin_sessions.tier hardcoded, orgs.billing_tier='internal' shortcut). UNIQUE(user_id, tier, client_id) permits T4 + T4A on same user_id + client_id (the ADR-020 separability between role and seat); permits multiple T4 grants on same client_id (the ADR-020 multi-administrator extension designed-in).
  • credentials — provider-opaque reference per ADR-007 separation. provider TEXT column accommodates CREDENTIAL-PROVIDER-DECISION-01's eventual provider mix without schema migration. metadata TEXT (JSON) carries provider-specific shape (see EXT-API convention below).
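As a reading aid, the three core tables can be sketched as TypeScript row types, with an in-memory check that mirrors the UNIQUE(user_id, tier, client_id) constraint. The canonical definition is the SQL schema, not this sketch. The timestamp column names and the `addGrant` helper are assumptions, and only the columns named above are shown.

```typescript
type Tier = "T3" | "T4" | "T4A";

interface UserRow {
  id: string;                // UUID, 36-char hyphenated
  email: string;             // UNIQUE; an email change is an UPDATE on this row
  display_name: string;
  email_verified: boolean;
  client_id: string | null;  // NULL for T3 operators; required for T4/T4A
  status: "active" | "deactivated";
  created_at: number;        // assumed column name; INTEGER unixepoch()
  updated_at: number;        // assumed column name; INTEGER unixepoch()
}

interface TierGrantRow {
  user_id: string;
  tier: Tier;
  client_id: string | null;
}

interface CredentialRow {
  user_id: string;
  provider: string;  // free-text, so new providers need no schema migration
  metadata: string;  // opaque JSON, provider-specific
}

// Serialises the (user_id, tier, client_id) triple the UNIQUE constraint covers.
function grantKey(g: TierGrantRow): string {
  return JSON.stringify([g.user_id, g.tier, g.client_id]);
}

// Rejects an exact duplicate triple, as the UNIQUE constraint would.
// Note: SQLite treats NULLs as distinct inside UNIQUE constraints, so this
// in-memory check is stricter than the database for rows with NULL client_id.
function addGrant(grants: TierGrantRow[], g: TierGrantRow): TierGrantRow[] {
  if (grants.some((existing) => grantKey(existing) === grantKey(g))) {
    throw new Error("duplicate tier grant: " + grantKey(g));
  }
  return [...grants, g];
}
```

The uniqueness key includes the tier, which is what lets one user hold both T4 and T4A on the same client, and lets several users hold T4 on the same client.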

The auxiliary table.

  • magic_link_allowlist — renamed and semantically narrowed from access_allowlist. Gates magic-link issuance only; does not confer tier.

Two reshaped existing tables.

  • impersonation_tokens (ADR-022 implementation, already present) — tenant_id renamed to client_id per ADR-008; target_role narrowed to target_tier CHECK enum (T4/T4A only — impersonating another T3 is meaningless); new reason column per ADR-022 audit-trail discipline.
  • portal_invites: tenant_id renamed to client_id. Otherwise unchanged.

Issuance gates vs tier grants — architectural distinction.

Two stages of the auth flow are now distinct first-class concerns:

  • Issuance gates — pre-authentication. Determine whether a credential request is honoured at all. Example: magic_link_allowlist decides whether to mint a magic-link token for a requested email. Membership in an issuance gate does NOT confer any tier.
  • Tier grants — post-authentication. Determine what authority an authenticated user holds. Example: tier_grants(tier='T3') declares operator authority for a successfully-authenticated user.

The substrate previously conflated these (per the audit's six L3-truth source finding — access_allowlist membership was implicitly an L3 grant). The canonical model separates them. Future briefs that propose new identity-adjacent tables answer explicitly: is this an issuance gate, a tier grant, or something else? Naming and placement follow that classification.
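The separation can be illustrated with two functions that never consult each other's data. This is a hedged sketch: the function names are hypothetical, and lower-casing the email before lookup is an assumption rather than a documented rule.

```typescript
type Tier = "T3" | "T4" | "T4A";

interface TierGrant {
  user_id: string;
  tier: Tier;
  client_id: string | null;
}

// Issuance gate (pre-authentication): should a magic-link token be minted at all?
// It reads only the allowlist and returns no tier information.
function mayIssueMagicLink(allowlist: ReadonlySet<string>, email: string): boolean {
  return allowlist.has(email.trim().toLowerCase());
}

// Tier grant (post-authentication): what authority does this user hold?
// It reads only tier_grants and never consults the allowlist.
function tiersFor(grants: readonly TierGrant[], userId: string): Tier[] {
  return grants.filter((g) => g.user_id === userId).map((g) => g.tier);
}
```

An email can pass the issuance gate while its user holds no tier at all. That is the case the previous access_allowlist behaviour could not express.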

Provider-specific metadata documented in EXT-API reference docs.

The credentials.metadata TEXT column is opaque JSON to keep the schema provider-agnostic. Provider-specific shapes (passkey aaguid + public_key + sign_count; magic-link expiry + ip; CF Access policy id + auth_method) are documented in the EXT-API reference doc for each provider per the EXT-API RULE in standing-rules. This couples the credential schema to the existing EXT-API discipline rather than creating a parallel documentation surface.
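One way application code might consume the opaque column is to parse it into a discriminated union keyed on provider. This sketch assumes provider values of "passkey", "magic_link", and "cf_access" and JSON key names taken from the shapes listed above. The authoritative keys live in each provider's EXT-API reference doc, so treat every name here as hypothetical.

```typescript
interface PasskeyMeta { aaguid: string; public_key: string; sign_count: number; }
interface MagicLinkMeta { expiry: number; ip: string; }
interface CfAccessMeta { policy_id: string; auth_method: string; }

type CredentialMeta =
  | { provider: "passkey"; meta: PasskeyMeta }
  | { provider: "magic_link"; meta: MagicLinkMeta }
  | { provider: "cf_access"; meta: CfAccessMeta };

// Returns null for malformed JSON, a shape mismatch, or a provider this code
// does not know about; unknown providers stay opaque rather than failing.
function parseMetadata(provider: string, metadata: string): CredentialMeta | null {
  let raw: unknown;
  try {
    raw = JSON.parse(metadata);
  } catch (_e) {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const m = raw as Record<string, unknown>;

  if (provider === "passkey") {
    const aaguid = m["aaguid"], publicKey = m["public_key"], signCount = m["sign_count"];
    if (typeof aaguid === "string" && typeof publicKey === "string" && typeof signCount === "number") {
      return { provider: "passkey", meta: { aaguid, public_key: publicKey, sign_count: signCount } };
    }
    return null;
  }
  if (provider === "magic_link") {
    const expiry = m["expiry"], ip = m["ip"];
    if (typeof expiry === "number" && typeof ip === "string") {
      return { provider: "magic_link", meta: { expiry, ip } };
    }
    return null;
  }
  if (provider === "cf_access") {
    const policyId = m["policy_id"], authMethod = m["auth_method"];
    if (typeof policyId === "string" && typeof authMethod === "string") {
      return { provider: "cf_access", meta: { policy_id: policyId, auth_method: authMethod } };
    }
    return null;
  }
  return null;
}
```

Keeping the validation in code rather than in the schema is what allows a new provider value to land without a migration: the table accepts the row, and only callers that understand the provider interpret its metadata.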

Key convention. UUID-keyed (RFC 4122, 36-char hyphenated) is canonical. Generated via crypto.randomUUID() at signup. Reasons against the three alternatives are documented in the design doc (ops/designs/IDENTITY-MODEL-CANONICAL-01.md § 2.2).

Tier vocabulary. L1-L4 retires; T3/T4/T4A canonical. L1 + L2 enum values have no equivalent (T3 is the highest tier per ADR-020). The zero-UUID admin's L1 row migrates to T3.

Zero-UUID admin disposition. The row 00000000-0000-0000-0000-000000000001 is preserved. The UUID is valid RFC 4122; nothing structurally privileges its all-zeros-except-last-octet pattern. The historical "zero-UUID = platform anchor" convention survives as substrate observation, not runtime invariant. Code paths that depend on the literal value migrate to tier_grants(tier='T3') lookup.
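The code-path migration can be sketched as replacing a literal comparison with a grant lookup. The function names below are hypothetical. The point is that operator authority follows the tier_grants row rather than the shape of the id.

```typescript
interface TierGrant {
  user_id: string;
  tier: "T3" | "T4" | "T4A";
  client_id: string | null;
}

const ZERO_UUID_ADMIN = "00000000-0000-0000-0000-000000000001";

// Before: authority inferred from the literal id (the pattern being retired).
function isOperatorLegacy(userId: string): boolean {
  return userId === ZERO_UUID_ADMIN;
}

// After: authority read from tier_grants; the id carries no special meaning.
function isOperator(grants: readonly TierGrant[], userId: string): boolean {
  return grants.some((g) => g.user_id === userId && g.tier === "T3");
}
```

Under the lookup, the zero-UUID row is an operator only because it holds a T3 grant, and any other user with a T3 grant is treated identically.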

Consequences.

  • Six L3-truth sources collapse to one (tier_grants). CF Access remains the DNS-level authentication gate per ADR-019 but is no longer also a tier authority.
  • Per-row migration strategy (not per-table) — the audit found mixed conventions within identity surfaces; each row normalises independently.
  • Eight tables retire across both DBs: operator_roles, user_tenant_roles, admin_sessions, tenants, both copies of magic_tokens, both copies of passkey_credentials (migrated to credentials), both copies of users (consolidated). Plus the five ADR-008 orphans via the separate OPS-DB-IDENTITY-ORPHAN-CLEANUP-01 brief.
  • Two cross-DB duplicate sub-briefs subsume into this design's migration (users + passkey_credentials). Two remain standalone (products + magic_tokens — the new fourth duplicate not previously filed).
  • DB placement deferred to OPS-DB-SPLIT-SHAPE-DECISION-01. The schema is portable across three placement candidates (workspace-db / ops-db / new identity-dedicated DB).
  • ADMIN-AUTH-MODEL-RECONCILIATION-01 composes cleanly — the credentials table accommodates the three possible outcomes for passkey scaffolding (preserve / wire verify / drop).
  • CREDENTIAL-PROVIDER-DECISION-01 composes cleanly — new providers land as new provider values without schema migration.
  • The session shape (apps/workspace/lib/session-types.ts) replaces ucca_layer: 1|2|3|4|4.5 with tier: 'T3'|'T4'|'T4A' + client_id. Code-path update is Phase 3 migration scope.
  • Phase 3 migration is a separate brief (IDENTITY-MODEL-MIGRATION-01). The schema commitment lands here; execution lands there.
  • Cross-DB FK references in this schema's client_id columns are documentation-only per ADR-026. The IMM-01 Phase 2 substrate-reality finding established that D1 cross-DB FK references parse at CREATE TABLE but fail at INSERT time when the referenced table doesn't exist in-DB. The canonical schema at scripts/identity-schema.sql documents the client_id → clients(id) relationship as a comment, with referential integrity an application-layer responsibility.
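The session-shape change named in the consequences above can be sketched as follows. The mapping 3 → T3, 4 → T4, 4.5 → T4A is an assumption inferred from the two vocabularies, not something this ADR states. The interface and function names are hypothetical, and the actual shape lives in apps/workspace/lib/session-types.ts.

```typescript
type UccaLayer = 1 | 2 | 3 | 4 | 4.5;
type Tier = "T3" | "T4" | "T4A";

interface LegacySession { user_id: string; ucca_layer: UccaLayer; }
interface Session { user_id: string; tier: Tier; client_id: string | null; }

// Assumed mapping. L1 and L2 have no general equivalent, so they return null
// and are left to per-row migration (for example, the zero-UUID admin's L1 row).
function toTier(layer: UccaLayer): Tier | null {
  switch (layer) {
    case 3: return "T3";
    case 4: return "T4";
    case 4.5: return "T4A";
    default: return null;
  }
}

// Builds the new session shape, or null when the legacy row needs manual handling.
function migrateSession(legacy: LegacySession, clientId: string | null): Session | null {
  const tier = toTier(legacy.ucca_layer);
  if (tier === null) return null;
  if (tier !== "T3" && clientId === null) return null; // T4/T4A require a client
  return { user_id: legacy.user_id, tier, client_id: tier === "T3" ? null : clientId };
}
```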

See client-spine.md § 3, § 7, § 8; ADR-007; ADR-008; ADR-012; ADR-019; ADR-020; ADR-021; ADR-022; ADR-023; ADR-026; EXT-API RULE in standing-rules.md; full design at ops/designs/IDENTITY-MODEL-CANONICAL-01.md; audit at ops/audits/IDENTITY-SURFACE-AUDIT-01.md.


ADR-025 — Identity model placement: dedicated identity-db

Status: Accepted Date: 2026-05-27

Context. ADR-024 canonicalised the user-identity schema (three-table model: users + tier_grants + credentials, plus magic_link_allowlist and reshaped impersonation_tokens + portal_invites) but explicitly deferred database placement. The IDENTITY-MODEL-PLACEMENT-DECISION-01 brief evaluated three placement options against substrate state and architectural fitness:

  • Option A — Identity in rto-workspace-db (substrate-state preservation; workspace-db is canonical-by-content today)
  • Option B — Identity in rto-ops-db (substrate-state reversal honouring MANDARIN intent; operator concerns in ops-db)
  • Option C — Identity in new rto-identity-db (first-class identity subsystem)

The decision analysis at ops/decisions/IDENTITY-MODEL-PLACEMENT-ANALYSIS-01.md captured the full trade-off. Substrate-scan corrected the initial cost framing: Option C's marginal cost over Option B is ~1 hour additional substrate work (new D1 provisioning + schema creation), not "substantially heavier" as initial framing suggested. Customer-facing workers cannot bind ops-db directly per MANDARIN, so both Options B and C require identity reads to route through internal-api service-binding — identical caller-side cost across the two options.

D1 database names are immutable (CF substrate constraint, confirmed). This ruled out shortcuts like "rename workspace-db to identity-db"; both Options B and C require real substrate work and honour the name-matches-shape principle. Option A (identity-in-workspace-db) was disfavoured because the database called rto-workspace-db carrying both workspace state AND identity state perpetuates a name-doesn't-match-shape shortcut — the kind of residue that compounds across migrations.

Decision. Identity tables live in a new dedicated database: rto-identity-db (with rto-identity-db-staging twin per Peel taxonomy).

Identity-as-first-class-subsystem. This decision establishes identity as its own architectural concern with its own database, its own service mediation boundary (via internal-api), and its own lifecycle. The framing composes forward with: InstaLearn credential issuance (the credential-issuance surface composes naturally with a dedicated identity subsystem); federation per spine §7 (external IDP integration has a clean integration boundary); future RTO SSO integration (same — clean federation surface).

Classification per MANDARIN DATA TAXONOMY. rto-identity-db is Peel category — per-env, app-writable, schema-locked, with the -staging twin per the Peel pattern. Identity data is operational state that twins between environments, not reference data (Pith) or sync-output (Sync-output) or public intake (Intake).

Service mediation via internal-api. internal-api gains a binding to rto-identity-db and hosts identity-read endpoints. No new worker is provisioned — internal-api already has dispatch-by-pathname routing, OPS_DB + WORKSPACE_DB bindings, and identity-adjacent surface (passkey.ts WebAuthn handling). Adding identity endpoints is a config + endpoint addition, not new infrastructure. This composes with EXTERNAL-WRAPPER-CONFORMANCE-01's framing of internal-api as canonical service mediation layer.

Customer-facing workers route via service-binding. Per MANDARIN, customer-facing workers (apps/site, apps/workspace, workers/prelaunch) cannot bind rto-identity-db directly any more than they can bind rto-ops-db. The 18 identity-read sites surfaced by substrate scan migrate from direct workspace-db queries to service-bound calls through INTERNAL_API (the service binding is already configured in apps/site and apps/workspace wrangler configs).
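A converted call site might look like the sketch below. The `/identity/user-by-email` path, the response fields, and the placeholder hostname are hypothetical; only the INTERNAL_API binding name comes from the text above. The binding is typed with a minimal local interface so the example stands alone. A real Workers service binding exposes a compatible `fetch` method.

```typescript
// Minimal local stand-ins for the parts of the service-binding surface used here.
interface ServiceResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
interface ServiceBinding {
  fetch(url: string): Promise<ServiceResponse>;
}
interface Env {
  INTERNAL_API: ServiceBinding;
}
interface IdentityUser {
  id: string;
  email: string;
}

// Replaces a direct workspace-db query with a service-bound identity read.
async function userByEmail(env: Env, email: string): Promise<IdentityUser | null> {
  // The hostname is a placeholder; the pathname is what internal-api dispatches on.
  const url = "https://internal-api/identity/user-by-email?email=" + encodeURIComponent(email);
  const res = await env.INTERNAL_API.fetch(url);
  if (res.status === 404) return null;
  if (!res.ok) throw new Error("identity read failed with status " + res.status);

  const body = await res.json();
  if (typeof body !== "object" || body === null) throw new Error("malformed identity response");
  const record = body as Record<string, unknown>;
  const id = record["id"];
  const found = record["email"];
  if (typeof id !== "string" || typeof found !== "string") throw new Error("malformed identity response");
  return { id, email: found };
}
```

The caller never sees a database binding, so the placement of identity tables can change behind internal-api without touching customer-facing workers again.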

Consequences.

  • New D1 databases provisioned: rto-identity-db (prod) and rto-identity-db-staging. Schema per ADR-024 design lands at provisioning time.
  • internal-api gains IDENTITY_DB binding to rto-identity-db (plus staging counterpart). New identity-read endpoints added to internal-api's pathname dispatch surface (~6-10 endpoints per the analysis): user-by-email, user-by-id, tier lookup, credentials list, portal-invite lookup, member-list, etc.
  • Customer-facing identity reads route via existing INTERNAL_API service binding. Caller-side conversion: ~18 sites across apps/site + apps/workspace + workers/prelaunch (~54-90 LOC). Internal-api side: ~180-500 LOC.
  • Identity is now first-class architecture. Future briefs cite "the identity subsystem" rather than treating identity as an attribute of workspace or ops. InstaLearn, federation, RTO SSO, and future identity-touching work compose against this subsystem.
  • OPS-DB-SPLIT-SHAPE-DECISION-01 is now identity-out-of-scope. That brief, when it drips, decides the ops-db remaining-tables shape (orgs, partner_accounts, billing_*, etc.) only. Identity is removed from its scope. The three shapes (4-cluster / 2-DB / 3-DB-narrative) still apply to non-identity ops-db content; this decision constrains none of them.
  • IDENTITY-MODEL-MIGRATION-01 (Phase 3 of IDENTITY-MODEL-RATIONALISATION-01) executes the migration against rto-identity-db as target. Migration brief structure: Phase 1 schema creation + binding wiring; Phase 2 per-row migration with id_migration_map; Phase 3 code-path update (caller-side + internal-api endpoints); Phase 4 explicit retirement of superseded tables (per candidate MIGRATION-COMPLETION-DISCIPLINE).
  • The cross-DB duplicate sub-briefs (CROSS-DB-DUPLICATE-USERS-01, CROSS-DB-DUPLICATE-PASSKEY-CREDENTIALS-01, CROSS-DB-DUPLICATE-MAGIC-TOKENS-01) inherit rto-identity-db as consolidation target. The fourth duplicate (products) remains out of identity scope.
  • Reversibility: choosing C and later wanting B means dropping rto-identity-db and re-migrating to rto-ops-db (~3-4 days). Choosing C and staying with C means no additional cost. The forward asymmetry favours C as the lower-regret position.
  • The Peel taxonomy admits the -staging twin naturally; no new taxonomy category required. CANONICAL-PROJECT-FILES-CURRENCY discipline applies — when canonical docs change, snapshot refreshes.
  • CF Access posture. No new CF Access policy is required for rto-identity-db. Per ADR-019, CF Access enforcement is at the worker-domain layer (admin.rtopacks.com.au, etc.), not the D1 layer. Identity-bearing data is protected by virtue of customer-facing workers routing reads through internal-api service-binding rather than through any CF Access policy applied to the D1 itself. If a dedicated identity-management admin surface is later proposed, that surface would require new CF Access policy treatment at the worker level.

Methodology observations. This decision is the fourth application within a 36-hour window of BRIEF-DRAFT-SUBSTRATE-VERIFICATION (candidate discipline, promotion-ready) — the substrate scan corrected initial architect-intuition on Option C's cost from "substantially heavier" to "~1 hour marginal." Without the discipline, this decision would likely have landed on Option B against an incorrect cost framing. The Phase 4 retirement scope reference (per MIGRATION-COMPLETION-DISCIPLINE candidate, Tim-filed 2026-05-27) makes explicit retirement a deliverable of the downstream migration brief rather than a "maybe later" item.

See ops/decisions/IDENTITY-MODEL-PLACEMENT-ANALYSIS-01.md for the full option analysis; ADR-024 for the schema model this decision places; MANDARIN DATA TAXONOMY in standing-rules.md for the Peel category; ADR-019 for the credential-surface Mandarin enforcement that composes with this placement; spine § 7 for federation forward-positioning.


ADR-026 — Cross-database relational constraints are application-layer concerns

Status: Accepted Date: 2026-05-27

Context. The canonical user-identity schema designed in ADR-024 placed client_id columns on users, tier_grants, impersonation_tokens, and portal_invites with REFERENCES clients(id) foreign-key clauses. ADR-025 placed the identity schema in rto-identity-db while clients lives in rto-workspace-db (or wherever client-data substrate eventually consolidates). IDENTITY-MODEL-MIGRATION-01 Phase 2 surfaced an execution-time failure: the canonical schema applied successfully to rto-identity-db at Phase 1a/1b (CREATE TABLE accepts cross-DB FK syntax without resolving the reference), but the first INSERT against users failed with SQLITE_ERROR: no such table: main.clients. SQLite enforces FK references at INSERT time; D1 databases are isolated, so main.clients is not resolvable from inside rto-identity-db.

This is a substrate-reality finding the canonical-design pass missed: REFERENCES syntax to a cross-DB table parses at CREATE TABLE, but the reference cannot be resolved inside the database, so the first write fails. The shape of the canonical artefact disagreed with the shape of the substrate, and the disagreement only surfaced at execution. The MANDARIN DATA TAXONOMY (per ADR-018) ratifies cross-DB relationships as the architectural norm (customer-facing workers cannot bind operator/identity DBs directly; reads route through service mediation), so the substrate constraint is not occasional; it is the default condition for relational shape across the RTOpacks substrate.

PRAGMA foreign_keys = OFF does not persist across statements on D1 (each query is a fresh connection), so per-session FK toggling is not a workaround.

Decision. Cross-database relational constraints in RTOpacks are application-layer concerns. The canonical-schema convention is:

  1. In-DB foreign keys are first-class. REFERENCES clauses to tables within the same D1 database are used normally. SQLite enforces them at INSERT time and they earn their place as a substrate-level integrity guarantee.

  2. Cross-DB logical relationships are documented as schema comments. Where a column logically references a table in another D1, the canonical schema documents the relationship as a comment on the column rather than a REFERENCES clause. Convention:

     -- client_id logically references clients(id) in client-data substrate;
     -- D1 cross-DB FKs not enforceable, validity is application-layer responsibility
     client_id          TEXT,

  3. Application-layer is responsible for cross-DB referential integrity. Validity checks (does client_id exist in clients?), cascade semantics (when a clients row deactivates, what happens to its users / tier_grants?), and orchestration (cross-DB joins, transactions, deletions) live in worker code, not in the substrate.

  4. Cross-DB joins, transactions, and cascading deletes also cannot span DBs. A worker that needs identity-and-client data either (a) reads both DBs and joins in-app, or (b) routes the read through a service binding that owns both DBs (internal-api is the canonical mediator for identity-data + client-data joins per ADR-025).

  5. MANDARIN-driven separation makes cross-DB relationships the norm. This is not an edge case to design around. Customer-facing workers, internal-api, and ops surfaces operate against different DBs by design (per ADR-018 MANDARIN taxonomy + ADR-019 credential-surface enforcement). Cross-DB-FK-as-comment is the standing convention across all canonical schemas going forward, not a one-off for identity tables.
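The application-layer validity check that stands in for a cross-DB foreign key can be sketched as a write-time guard. This is an illustration under assumptions: `ClientLookup` and `validateGrant` are hypothetical names, and the lookup is synchronous and in-memory here, whereas a real worker would query the client-data DB asynchronously.

```typescript
interface TierGrant {
  user_id: string;
  tier: "T3" | "T4" | "T4A";
  client_id: string | null;
}

// Stand-in for a read against the DB that actually holds the clients table.
interface ClientLookup {
  exists(clientId: string): boolean;
}

// Returns null when the grant may be written, or a reason string when it may not.
// This is the check a REFERENCES clients(id) clause would have performed in-DB.
function validateGrant(clients: ClientLookup, grant: TierGrant): string | null {
  if (grant.client_id === null) {
    // client_id is nullable for T3 grants only.
    return grant.tier === "T3" ? null : grant.tier + " grant requires a client_id";
  }
  return clients.exists(grant.client_id) ? null : "unknown client_id " + grant.client_id;
}
```

A worker that holds both bindings would run this check immediately before the INSERT into tier_grants, which places it in the issuing worker rather than the substrate.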

Consequences.

  • ADR-024 schemas adjusted. The users, tier_grants, impersonation_tokens, and portal_invites tables in rto-identity-db lost their REFERENCES clients(id) clauses. The client_id column shape (TEXT, nullable for T3 grants, NOT NULL for T4/T4A/portal_invites/impersonation_tokens) is preserved. The logical relationship is documented per the convention above. ADR-024 Consequences amended with a forward-pointer to ADR-026.
  • In-DB FKs survive. tier_grants.user_id REFERENCES users(id), tier_grants.granted_by REFERENCES users(id), credentials.user_id REFERENCES users(id), impersonation_tokens.actor_id + target_user_id, portal_invites.invited_by, magic_link_allowlist.added_by all reference users within the same DB and remain enforced.
  • All future canonical schemas adopt this convention. When designing new tables that logically reference tables in other DBs (e.g. plan/entitlement/metering tables per ADR-021 referencing clients across DBs; future federation tables referencing external IDP records; any worker-DB referencing identity records), the cross-DB relationship is documented in comments. The REFERENCES keyword is reserved for in-DB integrity.
  • Application-layer FK enforcement becomes a brief-able pattern. Where cross-DB referential integrity matters (e.g. preventing a tier_grant for a non-existent client_id), the discipline is implemented in the issuing worker, not in the substrate. Future briefs may codify the validation pattern (e.g. internal-api endpoints that own multiple DB bindings can validate at write time).
  • Joins, transactions, and cascading deletes cannot span DBs. When a multi-DB read is required, the canonical path is a service binding to a worker that holds both bindings (internal-api is the canonical multi-DB mediator). When cascading deletes are required across DBs, the cascade is implemented as a series of writes through the mediating worker, not via DB-level ON DELETE CASCADE.
  • Schema-doc-comment convention earns its place as a canonical artefact convention. ADR-018's documentation-discipline becomes ADR-026-aware: canonical schemas don't omit cross-DB relationships, they document them as comments. This makes the multi-DB substrate's relational shape visible at schema-read time even when the substrate cannot enforce it.
  • The IMM-01 Phase 2 schema-fix-sequence (rebuild the four affected tables in rto-identity-db + rto-identity-db-staging without REFERENCES clauses) is a one-shot remediation for the pre-ADR-026 schema. The forward-canonical schema at scripts/identity-schema.sql adopts the new convention from the start; future identity-db re-creations apply the corrected schema natively.
  • Validate-load-bearing-assumptions standing rule reinforced. ADR-024's REFERENCES clauses were a design assumption that the canonical schema would behave like a single-DB schema. The assumption was not validated against the D1 substrate at design time. The finding became visible only at first INSERT, after Phase 1a/1b schema-creation had already succeeded (creating an asymmetric failure mode: schema looks correct, breaks at first use). The discipline cure: when a canonical schema references tables in other DBs, the validate-load-bearing-assumptions step is "test an INSERT, not just a CREATE TABLE."
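The cascade-as-a-series-of-writes consequence above can be sketched with two in-memory stores standing in for the two D1 bindings a mediating worker would hold. The cascade policy shown (deactivate a client's users when the client deactivates) is hypothetical, since this ADR leaves cascade semantics to worker code. The store and function names are also assumptions.

```typescript
type Status = "active" | "deactivated";

interface ClientRow { id: string; status: Status; }
interface UserRow { id: string; client_id: string | null; status: Status; }

// Stand-ins for the client-data DB and the identity DB.
interface Stores {
  clientDb: ClientRow[];
  identityDb: UserRow[];
}

// Performs the cascade as ordered, separate writes and reports which rows changed.
// No transaction spans the two stores, so each step skips rows that are already
// deactivated; re-running after a partial failure completes the cascade safely.
function deactivateClient(stores: Stores, clientId: string): { stores: Stores; writes: string[] } {
  const writes: string[] = [];

  // Write 1 (identity DB): dependent users first.
  const identityDb = stores.identityDb.map((u): UserRow => {
    if (u.client_id !== clientId || u.status === "deactivated") return u;
    writes.push("identity:users:" + u.id);
    return { ...u, status: "deactivated" };
  });

  // Write 2 (client-data DB): the client row itself.
  const clientDb = stores.clientDb.map((c): ClientRow => {
    if (c.id !== clientId || c.status === "deactivated") return c;
    writes.push("client:clients:" + c.id);
    return { ...c, status: "deactivated" };
  });

  return { stores: { clientDb, identityDb }, writes };
}
```

Making each step a no-op on already-deactivated rows is a design choice of this sketch. It compensates for the absence of a cross-DB rollback by making the whole cascade safe to retry.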

Methodology observations. This ADR captures the twelfth in-session application of BRIEF-DRAFT-SUBSTRATE-VERIFICATION (formally codified at STANDING-RULES-PROMOTION-02 close earlier today) — substrate caught what canonical text didn't notice: that REFERENCES syntax parses but doesn't enforce across DBs. It is the first in-session canonical-text-vs-substrate-constraint mismatch caught at execution time rather than draft time. The methodology worked, but at higher cost than draft-time catch would have been; a candidate IMM-01 close-report observation is whether earlier verification (e.g. INSERT-one-test-row at Phase 1a) would have caught it. validate-load-bearing-assumptions (from the CLAUDE.md root rules) is confirmed as a load-bearing methodology pattern — earned its place during VERSION-UPGRADE-01 in May and continues to earn applications; promotion to memory-pinned candidate worth tracking.

See ADR-018 MANDARIN DATA TAXONOMY; ADR-019 credential-surface enforcement; ADR-024 (Consequences amended with forward-pointer to this ADR); ADR-025 identity-db placement that surfaced this constraint; scripts/identity-schema.sql for the corrected canonical schema; standing rule validate-load-bearing-assumptions in CLAUDE.md and BRIEF-DRAFT-SUBSTRATE-VERIFICATION in standing-rules.md.


ADR-027 — Environment parity is a canonical commitment

Status: Accepted Date: 2026-05-27

Context. RTOpacks operates a two-stage deployment workflow (dev → prod), not the three-stage convention (dev → staging → prod) that wrangler config templates default to. Until ENVIRONMENT-NAME-RENAME-01 (2026-05-27 PM), the wrangler env.staging block was the colloquial "dev" environment served at *.rtopacks.dev — an infrastructure-convention name inherited at config-template time that did not match operational reality. The mismatch produced semantic collision with the unrelated staging.rtopacks.com.au pre-launch artefact and required constant mental translation in briefs, deploy instructions, and substrate discussions.

Additionally, the MANDARIN principle has protected database isolation within an environment (Pith, Sync-output, Peel, Intake categories) but environment parity across environments has been honoured informally rather than canonically. The IMM-01 (IDENTITY-MODEL-MIGRATION-01) work surfaced this: Phase 1a/1b/2/3a/3b were executed against prod substrate without explicit environment-scope enumeration, leaving dev in pre-IMM-01 state by default. The asymmetry would have compounded across every subsequent brief until corrected.

The illuminated-fence framing (Tim-articulated 2026-05-27 PM) names the operational economics: when both halves of substrate look identical at every layer the operator controls, debugging, migration sequencing, and propagation questions all simplify — "where does the haystack live?" answers itself. Without environment parity, every operational question carries an implicit translation tax.

Pre-revenue is the right moment to pay the architectural cost. There are no clients to break, no team to coordinate, no revenue to interrupt. The cost of getting environment parity right now is hours; the cost of getting it right after first client onboarding is days plus customer impact.

Decision. Environment parity is canonically committed at the schema layer and at the operational-vocabulary layer. Data parity is not committed.

Operational vocabulary is dev/prod. RTOpacks has two environments: production (top-level wrangler config, *.rtopacks.com.au domains) and dev (env.dev block, *.rtopacks.dev domains). The wrangler env name, deploy scripts, source code env-checks, KV cache prefixes, living docs, and brief framings all use dev and production. Infrastructure-convention naming (staging) was operationally misaligned with the two-stage workflow and has been retired in source per ENVIRONMENT-NAME-RENAME-01.

Schema parity is canonically committed. Each environment holds structurally-identical substrate: same canonical schemas applied to both environments before downstream code work proceeds. The Peel taxonomy provides the intra-env twin pattern (prod has prod-staging; dev does not require a dev-staging — dev itself is the pre-prod test environment). Read-only reference data shared across environments (rto-nrt-db, rto-abs-db, rto-licensing-db) remains shared per MANDARIN; the parity principle applies to environment-twinned data only.
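A schema-parity check between the two environments could be sketched as a comparison of table definitions, ignoring row contents. The snapshot shape (table name mapped to normalised CREATE TABLE text) and the function names are assumptions. How the snapshots are gathered from each environment's databases is out of scope here.

```typescript
// Table name -> normalised CREATE TABLE text for one environment.
type SchemaSnapshot = Record<string, string>;

interface SchemaDrift {
  missingInDev: string[];
  missingInProd: string[];
  differing: string[];
}

function has(snapshot: SchemaSnapshot, table: string): boolean {
  return Object.prototype.hasOwnProperty.call(snapshot, table);
}

// Compares structure only. Row counts and contents are deliberately not
// inspected, since differing data across environments is expected.
function schemaDrift(prod: SchemaSnapshot, dev: SchemaSnapshot): SchemaDrift {
  const prodTables = Object.keys(prod).sort();
  const devTables = Object.keys(dev).sort();
  return {
    missingInDev: prodTables.filter((t) => !has(dev, t)),
    missingInProd: devTables.filter((t) => !has(prod, t)),
    differing: prodTables.filter((t) => has(dev, t) && prod[t] !== dev[t]),
  };
}

function hasParity(drift: SchemaDrift): boolean {
  return drift.missingInDev.length + drift.missingInProd.length + drift.differing.length === 0;
}
```

Any non-empty field in the result is the kind of substrate asymmetry to resolve before downstream code work proceeds.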

Data parity is NOT committed. Substrate-honestly different data per environment is expected. Per CANONICAL-IDENTITY-VIA-UI-ONLY (standing-rules.md, canonical-work cluster), dev substrate populates via UI paths against dev — not by mirroring production data. Dev's rto-identity-db-staging may be empty when prod's rto-identity-db holds rows; that is correct, not a regression. Committing to data parity would over-constrain the architecture and conflict with the UI-only entry principle.

Migration briefs explicitly enumerate environment scope. Brief framings going forward state "Environment scope: production + dev" or "Environment scope: production only" with justification. Implicit single-environment scope is a brief-framing failure caught at Gate 1 substrate-state spot-check.

Substrate-name lag is documented temporary state. D1 database names retain -staging suffix per ADR-018 immutability cost (D1 names cannot be renamed; create + migrate + retire is the only path). KV namespace SESSION_KV_STAGING and substrate-baked ID prefixes (ten_staging_, usr_staging_) inherit the same immutability class. Alignment of these substrate names to operational vocabulary is planned in D1-NAME-ENVIRONMENT-RENAME-01 (queued forward brief, sequencing TBD post-IMM-01 close, possibly bundled with other D1-name historical-artefact cleanup such as the engine-db-oc display-name fix). Until that brief lands, the asymmetry is canonically documented — operational vocabulary aligns with operator mental model; D1 substrate names lag with explicit forward-pointer.

The staging.rtopacks.com.au URL is unrelated. A separate concept entirely: the pre-launch apex artefact (production substrate behind a prelaunch landing). It retires on its own timeline when the prelaunch landing comes down and staging.rtopacks.com.au becomes www.rtopacks.com.au or just rtopacks.com.au. ADR-027 does not touch this URL or its eventual retirement; the env-name collision was self-limiting in time and has been resolved upstream by renaming env.staging to env.dev.

Consequences.

  • All four customer-facing CF workers and internal-api retired their -staging names and stood up under -dev names during the IMM-01 close arc. New worker names: rtopacks-internal-api-dev, rtopacks-admin-dev, rtopacks-workspace-dev, rtopacks-site-dev. Service-binding cross-references updated in lockstep.
  • Source code env-checks (env.ENV === "dev") and KV cache prefixes (dev:) replaced their staging predecessors. lib/env-urls.ts comments rewritten to remove the now-obsolete "inversion" framing.
  • Living docs updated to dev/prod vocabulary (workers/inventory.md, ops/standing-rules.md, architecture/data-architecture.md, architecture/database.md, ops/infrastructure-reference.md). Historical artefacts (time-machine, archive, closed briefs) left untouched per artefact-as-historical-record principle.
  • Migration briefs going forward must enumerate environment scope as part of brief framing. The IMM-01 brief did not, and the resulting implicit-prod-only scope produced the substrate asymmetry this ADR addresses. Future briefs catch this at Gate 1.
  • Substrate-state asymmetry across environments is a Gate 1 finding to surface immediately and resolve before downstream work proceeds — not deferred to a "we'll get to it later" item.
  • Deployed-code asymmetry across environments is a known failure mode (today's discovery): worker rename creates a new worker; route reassignment requires explicit deletion of the previous holder before the route attaches to the new worker. See ROUTE-MIGRATION-REQUIRES-OLD-WORKER-DELETION in standing-rules.md for the discipline this surfaced.
  • ADR-018's per-resource-type rename feasibility analysis is honoured rather than overridden — workers rename freely (new worker created, old retired); D1s rename only via create-migrate-retire and live in their own dedicated brief. The composition is intact.
  • Reversibility: this decision can be reversed by reintroducing three-stage naming, but doing so would require justifying the convention against the operational reality of two-stage workflow. The operational shape is the load-bearing constraint; the naming serves the shape.

Methodology observations. This ADR is the first canonical articulation explicitly born of operational asymmetry surfaced during execution rather than designed-in-advance. The pattern that produced it: Tim's question "what is the cost to actually call dev dev and prod prod" caught Claude drifting toward canonical-work bundling (treating an operational task as architectural smokescreen), and Tim's follow-up "this is foundational" caught Claude over-reverting (abandoning the architectural work entirely). The middle path — foundational work done deliberately, not bureaucratically — is what produced ADR-027. The principle worth pinning forward: when a methodology response feels disproportionate to the operational task, the methodology may be the drift, not the protection. METHODOLOGY-SERVES-FOUNDATION (codified in standing-rules.md) is the canonical articulation.

SUBSTRATE-NAME-FOLLOWS-OPERATIONAL-SHAPE reached promotion threshold during this work — when standard naming conventions don't match operational workflow shape, substrate-name follows operational-shape, not convention. ROUTE-MIGRATION-REQUIRES-OLD-WORKER-DELETION earned four observable applications during the IMM-01 close arc with a refinement (pattern-routes require delete-first; custom-domain routes offer in-place reassignment via deploy prompt).

See ops/standing-rules.md for MANDARIN DATA TAXONOMY; ADR-018 for per-resource-type rename feasibility; ADR-025 for the Peel taxonomy that admits the -staging twin pattern within an environment; client-spine.md dispositional layer for the operator-vocabulary primacy that underpins SUBSTRATE-NAME-FOLLOWS-OPERATIONAL-SHAPE; ENVIRONMENT-NAME-RENAME-01 brief and close report for the mechanical work that landed the operational vocabulary; IMM-01 (IDENTITY-MODEL-MIGRATION-01) for the migration whose dev-side gap surfaced the asymmetry this ADR addresses.


ADR-028 — System observability is a canonical substrate

Status: Accepted Date: 2026-05-28

Context. RTOpacks is not transactional-CRUD software. It is a multi-layer machine: client-tenant activity across customer-facing workers, sitting above a substrate of continuous internal work — ingestions, reconciliations, pre-calculations, matching, inbound webhooks, outbound API calls, scheduled sync runs. Many independent flows run concurrently. In a system of this shape, failures characteristically live in the interactions between components — a sync job starving a lookup, an external-API timeout cascading into a user-facing delay — not in any single component examined alone.

ADR-017 (the in-and-out bus pattern) already named "Observability fails" as a failure mode of scattered flows and committed to instrumentation "applied at the bus level, not per-consumer." ADR-017 established that observability should be uniform. It did not ground where observability data lives, how it is queried, or the commitment that every worker feeds it. This ADR is that grounding — the storage-and-viewing layer for the observability ADR-017 demanded.

The question surfaced concretely: the IMM-01 Phase 3d Gate 1 audit found apps/site writing 8 telemetry/log records directly into rto-ops-db. Inspection showed these were not customer data (the contamination the OPS SURFACE RULE was built against) but the worker logging its own activity — observability data with no home in the MANDARIN taxonomy. The four categories (Pith, Sync-output, Peel, Intake) describe application and reference data. None describes the machine's observability of itself. Per the taxonomy rule (standing-rules.md §201), data fitting none of the four requires either reclassification or a new category with Tim sign-off. System observability is a new category; this ADR names it.

The motivating failure case. A real account (Technology One, related via Kieran) of a system scaled without observability: repeated failures under load, with clients escalating; the problem vanished at ~4am as load wound down and returned as load returned — a load-dependent failure, invisible at rest, emergent only under concurrency. The team knew it was load-related but could not localise it: symptom without instrument. Lacking any record of the real conditions of failure, they could not reproduce it in production and were forced to build a separate environment and synthesise load to chase it — a prolonged, expensive effort. The three requirements this dictates:

  • Instrument before load arrives, not after. Their nightmare was retrofitting gauges onto a system already failing. RTOpacks is pre-launch — it holds the advantage they lacked. The instrument is built before the load, so the first load-dependent problem already carries a trace.
  • Coverage must be uniform across the whole machine. Load problems hide in the gaps between instrumented parts; partial coverage reproduces the blindness on whatever is left dark.
  • The trace must persist and be time-queryable. "It went away at 4am" means the evidence is temporal — the capability to ask "what did the system look like in the 90 seconds before it slowed last Tuesday" requires retained, time-queryable telemetry, not a live-only dashboard that forgets.

This is the concrete form of the standing failure mode RTOpacks designs against — "phones ringing while I'm debugging." Real-time gauges are the defence: find it on the wall, not from a phone call.

Decision. RTOpacks commits to a system-wide observability substrate as a canonical architectural layer. Observability data is two distinct things with two distinct homes:

  • Live telemetry → Cloudflare Analytics Engine. High-volume, fire-and-forget, real-time "what is happening now across the whole machine": page/traffic/lookup/request rates, latency, error rates, external-API timing, threat signals. Written continuously from every worker; queryable live and over retained history; engineered so writes do not backpressure the work that produces them. Analytics Engine is the Cloudflare-native instrument purpose-built for this workload — the platform equivalent of the external telemetry products larger systems adopt.

  • Audit / activity records → D1. Lower-volume, durable, correctness-bearing records of "who did what, when," with latent analytical value (post-hoc pattern analysis — code hotspots, user pathways — is deliberate after-the-fact querying that D1 serves). Distinct question from the live gauges: what happened and what patterns emerge, versus what is happening now.

The two are complementary, not redundant. The machine wants both.
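
As an illustration of the live-telemetry half, here is a minimal sketch of a worker writing one data point. It assumes the Analytics Engine binding's writeDataPoint call, which accepts indexes, blobs and doubles arrays and returns without waiting on storage. The local TelemetryDataset interface is a stand-in so the sketch compiles without Cloudflare's type package, and the RequestTelemetry fields and the recordRequest helper are hypothetical, not a committed metric set.

```typescript
// Local stand-in for the Analytics Engine binding type (assumption:
// mirrors the shape of Cloudflare's writeDataPoint argument).
interface DataPoint {
  indexes?: string[];
  blobs?: string[];
  doubles?: number[];
}

interface TelemetryDataset {
  writeDataPoint(point: DataPoint): void;
}

// Hypothetical event shape; field names are illustrative only.
interface RequestTelemetry {
  worker: string;
  route: string;
  status: number;
  latencyMs: number;
}

function toDataPoint(t: RequestTelemetry): DataPoint {
  return {
    indexes: [t.worker],              // one index, used as the sampling key
    blobs: [t.worker, t.route],       // string dimensions to group and filter by
    doubles: [t.status, t.latencyMs], // numeric values to aggregate
  };
}

// The write is not awaited: the caller carries on whether or not it lands.
function recordRequest(dataset: TelemetryDataset, t: RequestTelemetry): void {
  dataset.writeDataPoint(toDataPoint(t));
}
```

Keeping the mapping in one small function is one way to hold the metric set lean while the write path itself is present in every worker.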

Scope is the whole machine. Customer-facing workers and substrate workers (sync, reconcile, internal-api, ingest) alike instrument into the same layer. Uniform coverage is the anti-blindness requirement: it is structural, not optional per-worker.

Why Analytics Engine over alternatives. A dedicated telemetry D1 (rto-telemetry-db, Peel) would achieve separation from ops-db but reintroduces the scale-degradation the motivating case warns against — D1 is a transactional store, and diagnostic-at-scale queries over millions of rows are exactly where it degrades. Routing telemetry synchronously through internal-api (the literal ADR-017 bus) adds a blocking service-call hop to high-volume fire-and-forget writes; a page-view beacon must never make the page wait on an internal round-trip — right principle, wrong shape for this flow. Analytics Engine honours the bus principle (uniform instrumentation) without the synchronous-hop cost, and fits the workload by design. The cost accepted: a genuinely new infrastructure type (distinct write and query model from D1) and a learning curve; and a viewing layer ("see it on the wall") that is a further build beyond ingest, recorded as forward work.

Consequences.

  • A new MANDARIN category is named: Telemetry (system observability). Live telemetry lands in Analytics Engine (not a D1 — the category's substrate is intentionally outside the D1 taxonomy). Audit/activity records remain D1 (Peel). The taxonomy update is recorded in standing-rules.md in the same commit.
  • Every worker instruments into the observability layer. New worker work instruments from the start; existing workers are brought under the layer as their owners are touched, per ADR-017's migration posture — not a single mass refactor.
  • Telemetry writes are fire-and-forget as a hard constraint: the page renders and the job runs whether or not telemetry lands. A telemetry path that can hang a request has reintroduced the failure it was meant to detect.
  • Uniform pipes, lean gauges: the instrumentation path is built into every worker (uniform coverage, non-negotiable), but the metric set starts small and grows as real questions arise. Coverage is structural; metric depth grows with need. This guards against the opposite of the motivating failure — a telemetry firehose nobody reads.
  • The "see it on the wall" real-time view is a further build (a query/visualiser layer over Analytics Engine), recorded as forward work, not delivered by this ADR.
  • First application (factually reconciled 2026-05-29 — the ADR-028 decisions stand; only the first-application identification was wrong): the apps/site ops-db write retirement was originally framed as the first writer into this ADR's substrate (5 telemetry to Analytics Engine, 1 audit to D1). Substrate audit dissolved that scope — nothing migrated to this ADR's substrate; the retirement landed as two narrower retire briefs (gone-is-gone) and the remaining writes stayed on ops-db. Analytics Engine and rto-audit-db are declared by this ADR (declared-by-decision) but not yet provisioned; the first genuine writer is still pending. Per-write inventory and disposition at docs/docs/ops/audits/IMM-01-Phase-3d-Gate-1-audit.md §6.1; close reports at briefs/closed/WAITLIST-LOOKUP-LOG-RETIRE-01-close.md and briefs/closed/APIREQUESTS-LOG-RETIRE-01-close.md.
  • Reversibility: the audit-vs-telemetry split and the Analytics-Engine choice can be revisited, but the system-wide-uniform-observability commitment is foundational — partial coverage is the failure mode, so reversing toward per-worker discretion reintroduces the blindness this ADR exists to prevent.
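
The fire-and-forget constraint in the consequences above can be sketched as a small guard around any telemetry write. This is an illustrative helper, not existing RTOpacks code; the name emitSafely is hypothetical. It swallows both synchronous throws and asynchronous rejections and never awaits the write, so a failing telemetry path cannot fail or delay the request that produced it.

```typescript
// Run a telemetry write without letting it affect the caller.
// - A synchronous throw is caught and dropped.
// - A returned promise is not awaited; its rejection is caught and dropped.
function emitSafely(write: () => unknown): void {
  try {
    const result = write();
    if (result instanceof Promise) {
      result.catch(() => {
        // Telemetry loss is accepted; the request must not see it.
      });
    }
  } catch {
    // Same policy for synchronous failures.
  }
}
```

In a Cloudflare Worker, a promise that is not awaited may be cut short once the response is returned; passing it to the execution context's waitUntil is the usual way to let it finish without blocking the response. Whether that is needed depends on the write path chosen.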

Methodology observation. This ADR, like ADR-027, was born of a small operational finding (8 stray writes) that surfaced a foundational gap. The discipline that produced it: the writes were not waved through as a rule-violation to remediate, nor inflated into an unscoped observability project — the prior question ("is this the contamination the rule targets, or an unnamed category?") was asked first, and the answer reshaped the work from "redirect 8 writes" to "name the observability substrate." Small findings surfacing big architecture is the working pattern; the guard is keeping the execute scope small (apps/site's 8 writes) while letting the decision be foundational (the ADR).

See ops/standing-rules.md for the MANDARIN DATA TAXONOMY (Telemetry category added by this ADR); ADR-017 for the bus pattern this ADR grounds; ADR-018 for cartography (the observability layer is a candidate for visual mapping); OBSERVABILITY-SUBSTRATE-DECISION-01 for the full decision analysis and the motivating case; the standing failure mode "phones ringing while I'm debugging" in project orientation.


ADR-029 — The audit substrate is a uniform activity stream

Status: Accepted Date: 2026-05-28

Context. ADR-028 named the audit/activity half of the observability substrate as D1 (Peel) and committed to whole-machine coverage: every worker that performs a durable, correctness-bearing action ("who did what, when") writes an audit record. (Factually reconciled 2026-05-29 — only the first-writer identification was wrong: this ADR was drafted assuming apps/site's mode_switch_log would be the first writer migrating into a new rto-audit-db. Substrate audit dissolved that scope; mode_switch_log is not migrating, and the first genuine writer is still pending. The table-shape decision below stands unchanged.)

Provisioning that database forces a decision that cannot be deferred: what shape do audit records take across the whole machine? Two shapes are possible, and the first table created sets the precedent by example whether or not the precedent is chosen deliberately:

  • Shape A — one table per event kind. mode_switch_log, sync_run_log, setting_change_log, etc. Each action gets a bespoke table with columns fitted to that action. Per-table tidiness; an ever-growing table list; the cross-machine question ("what did the whole system do in this time window") requires stitching many tables together.
  • Shape B — one uniform activity table. A single activity table; every worker writes a row tagged with its source. The cross-machine question is one query filtered by time; event-specifics live in a flexible field rather than bespoke columns.

The decision governs every future audit writer, not just the first — so per CANONICAL-DECISION DISCIPLINE it is an ADR, settled before the execute brief builds against it, not an implicit consequence of the first table's schema.

The deciding argument. The audit substrate exists to answer the cross-machine question. That is ADR-028's entire motivating case: the load-dependent failure localised only by seeing across the machine at a moment in time ("what did the system look like in the 90 seconds before it slowed last Tuesday"). Shape B answers that in one query. Shape A makes it an assembly job, and the assembly cost comes due at the worst possible moment, when the system is failing under load, reconnaissance is most urgent, and slack is least available. A design that taxes you hardest exactly when you need it most is the wrong design. Shape B is also the "illuminated fence down the middle of the mandarin" applied to observability: one place to look, filtered by source, already knowing which paddock the haystack is in. Shape A is a dozen paddocks.

The cost of Shape B, stated honestly: event-specific fields (e.g. which mode a mode-switch targeted) live inside a flexible detail field rather than as typed columns, so deep analysis of a single event-type is clumsier than under Shape A's bespoke columns. This trade is correct for an audit log, whose constant query is the cross-cutting who/what/when/where (all typed columns under Shape B) and whose rare query is the single-event-type deep-dive. The common query stays trivial; the rare query gets slightly harder. For the high-volume metrics side this trade would weigh differently — but metrics are Analytics Engine's job per ADR-028, and Analytics Engine is purpose-built for deep aggregation. Each half of the observability substrate gets the tool that fits it.

Decision. The audit substrate is a single uniform activity stream: one activity table in rto-audit-db, into which every audit writer across the machine inserts a source-tagged row. Audit records are not partitioned into per-event-kind tables.

The canonical activity table shape:

| Column | Type | Role |
| --- | --- | --- |
| id | INTEGER PK (autoincrement) | row identity |
| timestamp | TEXT (ISO 8601) / INTEGER (epoch) | when: the primary sort/filter axis |
| source | TEXT | which worker/surface emitted the record (e.g. apps/site) |
| client_id | TEXT, nullable | which tenant the action belongs to; NULL for operator/system actions |
| actor | TEXT | who performed the action: canonical usr_… UUID (ADR-024) for users, system:<worker-name> for system actors |
| action | TEXT | what happened (e.g. mode_switch) |
| detail | TEXT (JSON) | event-specific payload; the flexible field for everything not promoted to a column |

The columns before detail are the constant-filter dimensions — directly queryable, strongly typed. detail is the single flexible field for event-specifics; querying inside it is the accepted clumsy path, reserved for the rare single-event deep-dive.

client_id is nullable by design and maps to the access-control model (ADR-020): operator-tier actions (T3, NULL client_id) and client-scoped actions (T4/T4A, populated client_id) both fit the same table, and the audit log can answer both "everything for client X" and "everything operators did" without reaching into detail. The nullability encodes the operator-vs-client distinction natively.

actor format is pinned to the canonical identity model, not left free-form. For user actors it holds the canonical user identity (usr_… UUID per ADR-024); for system/worker actors it holds system:<worker-name> (e.g. system:sync, system:reconcile). This is not premature convention-setting under lean-start: it adopts an existing canonical format (ADR-024) rather than inventing one, so the audit log does not silently drift from the identity model the rest of the substrate already commits to. Free-form actor would invite the first writer to coin a format the second writer copies — the same first-instance-sets-the-precedent trap this ADR addresses for table shape, applied to column contents. The operator-vs-client distinction is carried by client_id (NULL for operator/system), not by actor format — the two columns do not overlap in what they encode: actor is who, client_id is whose tenant.
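
The table shape and the actor rule can be sketched as types plus a small row builder. This is a minimal sketch under stated assumptions: the type and function names (ActivityRow, Actor, formatActor, buildActivityRow) are hypothetical, the timestamp is written as ISO 8601 text (the table also admits epoch integers), the id column is omitted because it is autoincrement, and the only check applied to a user identity is the usr_ prefix, since the full identifier format is defined by ADR-024 and not reproduced here.

```typescript
// One row of the uniform activity table (id is assigned by the database).
interface ActivityRow {
  timestamp: string;        // ISO 8601 text in this sketch
  source: string;           // emitting worker/surface
  client_id: string | null; // null for operator/system actions
  actor: string;            // usr_... identity or system:<worker-name>
  action: string;
  detail: string;           // JSON text
}

// Who acted: a user with a canonical identity, or a named system worker.
type Actor =
  | { kind: "user"; userId: string }
  | { kind: "system"; worker: string };

function formatActor(actor: Actor): string {
  if (actor.kind === "system") {
    return "system:" + actor.worker;
  }
  // Assumption: a prefix check only; the real format is owned by ADR-024.
  if (actor.userId.indexOf("usr_") !== 0) {
    throw new Error("user actor must be a canonical usr_ identity");
  }
  return actor.userId;
}

interface ActivityInput {
  at: Date;
  source: string;
  clientId: string | null;
  actor: Actor;
  action: string;
  detail: Record<string, unknown>;
}

function buildActivityRow(input: ActivityInput): ActivityRow {
  return {
    timestamp: input.at.toISOString(),
    source: input.source,
    client_id: input.clientId,       // whose tenant, independent of actor
    actor: formatActor(input.actor), // who, in the pinned format
    action: input.action,
    detail: JSON.stringify(input.detail),
  };
}
```

Routing every writer through one builder like this is one way to keep the actor format from drifting between the first writer and the second.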

Consequences.

  • rto-audit-db will be provisioned with a single activity table at this shape (and rto-audit-db-dev per env parity, ADR-027) when its first genuine audit writer lands. (Factually reconciled 2026-05-29 — only the first-writer identification was wrong: the original draft identified apps/site's mode-switch as the first writer, but substrate audit dissolved that assumption and nothing migrated to this substrate. rto-audit-db remains declared-by-decision but not yet provisioned; the first genuine writer is still pending.) When the first writer lands, it inserts as source = '<worker>', action = '<event>', with event-specifics in detail, per the shape above.
  • Every future audit writer (whole-machine rollout per ADR-028) inserts into the same activity table with its own source. No new audit tables are created per worker. A worker that needs a constant-filter dimension not covered by the existing columns proposes promoting that dimension to a real column via a follow-on ADR — the column set can grow deliberately; it does not fragment into tables.
  • The cross-machine reconnaissance query is one SELECT ... WHERE timestamp BETWEEN ... [AND source = ... | AND client_id = ...]. This is the capability ADR-028 exists to provide.
  • The detail JSON is the soft spot by design. Anything discovered to be a constant-filter need is promoted to a typed column by ADR, not left in detail indefinitely. Lean-start applies: the column set begins minimal and grows with demonstrated need, mirroring ADR-028's "uniform pipes, lean gauges."
  • This is the audit (D1) half only. Live telemetry remains Analytics Engine per ADR-028 and is unaffected by this decision.
  • Reversibility: the activity-stream commitment is foundational for the same reason ADR-028's uniform-coverage commitment is — fragmenting back into per-event tables reintroduces the stitching cost the decision exists to avoid. The column set is extensible; the one-uniform-table shape is the load-bearing commitment.
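
The cross-machine reconnaissance query described above can be sketched as a small parameterised query builder. This is illustrative only: the names ReconFilter and buildReconQuery are hypothetical, and the sketch assumes timestamps are stored as ISO 8601 UTC text, so that text ordering matches time ordering. A clientId of null selects operator/system rows, following the nullability rule for client_id.

```typescript
interface ReconFilter {
  fromIso: string;          // window start, ISO 8601 UTC
  toIso: string;            // window end, ISO 8601 UTC
  source?: string;          // optional: one worker/surface
  clientId?: string | null; // optional: one tenant, or null for operator/system rows
}

interface BuiltQuery {
  sql: string;
  params: string[];
}

function buildReconQuery(f: ReconFilter): BuiltQuery {
  const clauses: string[] = ["timestamp BETWEEN ? AND ?"];
  const params: string[] = [f.fromIso, f.toIso];

  if (f.source !== undefined) {
    clauses.push("source = ?");
    params.push(f.source);
  }
  if (f.clientId === null) {
    // Operator/system actions carry no tenant.
    clauses.push("client_id IS NULL");
  } else if (f.clientId !== undefined) {
    clauses.push("client_id = ?");
    params.push(f.clientId);
  }

  const sql =
    "SELECT id, timestamp, source, client_id, actor, action, detail " +
    "FROM activity WHERE " + clauses.join(" AND ") + " ORDER BY timestamp";
  return { sql, params };
}
```

Against a D1 binding, the result would typically be run with prepare(sql), bind(...params) and all(). Every filter here touches typed columns only; nothing reaches into detail.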

Methodology observation. This ADR was surfaced by a brief-review question — "does the first table in a whole-machine DB set an accidental precedent?" — and the honest answer was that creating the database is the convention decision; deferring it to "the second writer" was a dodge, because the first table's shape is the precedent. Naming the decision explicitly and settling it before provisioning is the correction. Small provisioning act, foundational decision underneath it — the same pattern as ADR-028 (small finding, foundational gap), caught one layer earlier.

See ADR-028 for the observability substrate this grounds (audit/D1 half); ADR-020 for the access-control tier model client_id nullability maps to; ADR-027 for the env-parity commitment (-dev twin).


End of architecture-decisions log. New decisions appended below this line.


ADR-030 — Internal-api trust is closed by topological isolation, not a credential

Status: Accepted Date: 2026-05-30 Supersedes: the per-source-secret approach in INTERNAL-API-SOURCE-HEADER-AUTH-01 (a brief, now retired — not executed).

Context. workers/internal-api/src/index.ts mints a synthetic caller identity from the bare string value of the X-RTP-Internal-Source header (admin-worker → is_super:true, plus site-worker, workspace-worker), with no token, secret, or signature validated. The header is trusted purely because of where the request can arrive from — a privileged identity asserted by a forgeable credential. Pre-revenue, no customer data is at risk, but it is a real super-operator trust flaw.

Interim hardening is already closed and verified: workers.dev/preview URLs disabled on internal-api; the eight public-edge HTTP fallbacks in admin and site neutered (fail-closed, binding-first retained); and a WAF presence-block live on both zones (has_key(http.request.headers, "x-rtp-internal-source") on the internal-api hostnames → Block). Forging the header now returns 403 at the edge before the worker runs. The external forgery door is shut; the internal trust model is not: internal-api still trusts the header from callers arriving over its service binding.

Two terminal fixes were considered:

  • A credential — validate the header as a per-source shared secret (INTERNAL-API-SOURCE-HEADER-AUTH-01). Locks the door.
  • Topological isolation — make internal-api binding-only, with no public route at all, so the header is trustworthy by construction. Removes the door.

internal-api has a public route for one reason: webhooks and callbacks land on it (Stripe at /billing/webhook; QuickBooks OAuth at /billing/qb-callback). Removing the public route requires those to land somewhere else first.

The deciding argument. Removing the door beats locking it. A secret is a standing credential: it must be stored, rotated, and leak-guarded, and it adds an auth branch in the worker that can be subtly wrong. Isolation is subtraction — once internal-api has no public route, there is nothing to forge from outside and no credential to maintain. It is also Cloudflare's documented pattern (a Worker not reachable via the public Internet, only via service binding), and it composes with the binding-as-trusted-channel principle: a service binding dispatches inside the Cloudflare runtime and cannot be forged or intercepted from outside the account, so identity carried over it does not need a secret to be trustworthy. A secret built now would be expensive throwaway work, because the terminal answer (isolation) is already known — "why touch it twice."

Honest calibration. The external door is already shut. The only residual during the isolation runway is the inside-job vector — a compromised in-account worker forging the header over a binding. For a solo operator running only his own code, pre-revenue, that risk is remote today. A docs-verified fact bounds what isolation alone achieves: a service-binding call does not give the callee a verified identity of the calling worker — internal-api knows only that a request arrived over a binding, not which worker sent it. So isolation removes the external attack surface but does not, by itself, bind privilege against an inside-account actor. The condition that escalates the inside-job risk is other people's hands or untrusted third-party code running inside these workers — the trigger is people and dependencies, not elapsed time. (This is precisely why the fork below is resolved toward closing the inside-job door now, while it is cheapest: the cost only rises once customers, integrations, and a larger fleet arrive.)

Decision. internal-api's trust flaw is closed by topological isolation: internal-api becomes binding-only with no public route. The per-source-secret approach is rejected as the terminal fix and its brief retired. The work is sequenced as a three-part programme (INTERNAL-API-ISOLATION-PROGRAMME-01):

  1. Migrate the public-hostname (Pattern-B) callers onto the service binding.
  2. Extract the public receiving surface (/billing/*) into a dedicated public worker that performs cross-boundary authentication — HMAC signature verification for the Stripe webhook, OAuth state/code validation for the QuickBooks callback — then calls internal-api over the binding.
  3. Remove internal-api's public route → binding-only.
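
Step 2's signature check for the Stripe webhook can be sketched as follows. This is a hedged illustration, not the receiver's implementation. It assumes Stripe's documented scheme: a Stripe-Signature header carrying t=<timestamp> and one or more v1=<hex signature> entries, where the signature is HMAC-SHA256 over "<timestamp>.<raw body>" using the endpoint secret. The HMAC function is injected as a parameter so the sketch stays runtime-agnostic (in a Worker it would come from Web Crypto). All function names and the 300-second tolerance default are assumptions. The QuickBooks OAuth state/code validation is not shown.

```typescript
// Injected HMAC-SHA256 implementation returning lowercase hex.
type HmacHex = (secret: string, message: string) => string;

interface ParsedSignature {
  timestamp: number | null;
  v1: string[];
}

function parseSignatureHeader(header: string): ParsedSignature {
  const parsed: ParsedSignature = { timestamp: null, v1: [] };
  header.split(",").forEach((part) => {
    const eq = part.indexOf("=");
    if (eq < 0) return;
    const key = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    if (key === "t" && /^\d+$/.test(value)) parsed.timestamp = Number(value);
    if (key === "v1") parsed.v1.push(value);
  });
  return parsed;
}

// Compare without exiting early on the first differing character.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

function verifyWebhookSignature(
  header: string,
  rawBody: string,
  secret: string,
  hmacHex: HmacHex,
  nowSec: number,
  toleranceSec: number = 300,
): boolean {
  const parsed = parseSignatureHeader(header);
  if (parsed.timestamp === null || parsed.v1.length === 0) return false;
  // Reject stale or future-dated events to limit replay.
  if (Math.abs(nowSec - parsed.timestamp) > toleranceSec) return false;
  const expected = hmacHex(secret, parsed.timestamp + "." + rawBody);
  return parsed.v1.some((sig) => constantTimeEqual(sig, expected));
}
```

Only after this check passes would the receiver call internal-api over the binding. The check must run against the raw request body, before any JSON parsing.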

Resolved sub-decision (2026-05-30, by ADR-031). Whether, at route removal, the X-RTP-Internal-Source header is retired in favour of non-forgeable per-caller identity (e.g. distinct WorkerEntrypoints per caller class — closing the inside-job vector), or kept as an internal trust signal (simpler — inside-job vector stays open), was originally left open in this ADR. ADR-031 and the CHANNEL SEPARATION RULE now resolve it: non-forgeable per-caller internal identity is required, and the bare-header trust model is retired. Pure isolation that keeps trusting the header internally ("pure-B") is no longer an option. What remains for the route-removal sub-brief's Gate 1 is validating the mechanism against internal-api's actual structure — single fetch handler with internal routing, or already modular — and recording it in a follow-on implementation ADR. The decision (close the inside-job door) is settled; only the mechanism is open. This ADR commits to isolation; ADR-031 commits to the non-forgeable internal identity.
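
To make the "distinct WorkerEntrypoints per caller class" candidate concrete, here is a runtime-agnostic model of the idea. It is a sketch only: the mechanism is still to be validated at the route-removal Gate 1, and the class and function names below are hypothetical. In a real Worker each class would extend WorkerEntrypoint from cloudflare:workers and each caller's service binding would target one named entrypoint; that wiring is omitted so the sketch runs anywhere. What it shows is that identity is fixed by the entrypoint a call arrives through and is never read from anything the caller supplies.

```typescript
interface CallerIdentity {
  source: string;
  isSuper: boolean;
}

// Shared internal logic. Identity arrives as an argument chosen by the
// entrypoint, never from a request header.
function handleInternal(identity: CallerIdentity, path: string): string {
  const privilege = identity.isSuper ? "super" : "standard";
  return identity.source + " [" + privilege + "] -> " + path;
}

// One entrypoint per caller class. A caller bound to SiteEntrypoint has no
// way to reach the code path that grants super-operator privilege.
class AdminEntrypoint {
  call(path: string): string {
    return handleInternal({ source: "admin-worker", isSuper: true }, path);
  }
}

class SiteEntrypoint {
  call(path: string): string {
    return handleInternal({ source: "site-worker", isSuper: false }, path);
  }
}

class WorkspaceEntrypoint {
  call(path: string): string {
    return handleInternal({ source: "workspace-worker", isSuper: false }, path);
  }
}
```

Under this model the X-RTP-Internal-Source header carries no authority: removing it changes nothing about which identity a call receives.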

Consequences.

  • internal-api loses its public route on programme completion. A new public webhook-receiver worker is provisioned (NAMING-PAUSE RULE applies to its hostname/name; Stripe and QuickBooks dashboards repointed to it — Tim's hands).
  • The WAF edge-block and neutered fallbacks remain the primary control, unweakened, until the public route is gone. After that the WAF rule is retained as inert defence-in-depth (it matches nothing once there is no public route). Any relaxation of a security control is a separate, explicit Tim decision.
  • Stripe and QuickBooks each require a current EXT-API reference doc in docs/ops/ before the receiver ships (EXT-API RULE).
  • Reversibility: re-adding a public route would reintroduce the exact exposure this ADR exists to remove — so the binding-only end state is meant to be sticky, not casually reversible. That stickiness is the point.

See ADR-020 (access-control tier model the synthetic identity maps to); ADR-027 (environment parity — the binding is wired in dev too, so dev and prod use the same channel and no public fallback is ever needed); ADR-031 (the channel-separated architecture commitment that resolves the inside-job fork); and the CHANNEL SEPARATION RULE (the durable channel/credential principle this programme is the first build of).


ADR-031 — Channel-separated service architecture is a canonical commitment

Status: Accepted Date: 2026-05-30 Depends on: ADR-030 landing first (same arc); ADR-030 is the first application of this commitment.

Context. The forgeable X-RTP-Internal-Source exposure (May 2026) was not an isolated bug — it was an instance of a class of flaw: privileged or internal identity asserted by a forgeable credential reachable over a public surface. ADR-030 closes that instance for internal-api by topological isolation. But the Gate-1 audit of even that single fix found the public-surface dependency was broader than known — server-side callers, browser callers fetching an internal hostname directly, and external webhooks — which is evidence the pattern is pervasive across the fleet, not confined to one worker. Left to per-incident fixes, the same class recurs wherever a new service-to-service path or a new browser→service path is added without the discipline. Each recurrence is found late (at worst, by an attacker) and patched expensively.

The decision is to stop treating this as a sequence of incidents and make the mitigation architectural and forward-binding: channel separation is the default design for all service interaction, so the flaw class is mitigated by construction on every new path, not discovered and patched after the fact. This governs implementation across every module and substrate, so per CANONICAL-DECISION DISCIPLINE it is an ADR, not a brief-scoped choice.

Decision. Channel-separated service architecture is the canonical default for all RTOpacks development going forward. The trust boundary is RTOpacks' own Cloudflare account.

  • Within the boundary (service-to-service, same account): use the service binding (env.X.fetch). The binding is the trusted channel — it dispatches inside the Cloudflare runtime and cannot be forged or intercepted from outside.
  • Across the boundary (separate account — e.g. the UCCA Inc engine once it splits out; external vendors, customer LMSs, third-party and government APIs): authenticate cryptographically (signed token or mutual auth, verified server-side). Never a forgeable claim on a public surface. The boundary triggers the rule in both directions — outbound and inbound.
  • Browser → internal services: the browser calls a same-origin route which server-side uses the binding. Browsers never call internal hostnames directly. (A browser cannot use a service binding; a direct browser→internal-hostname call is a public-surface dependency by definition.)
  • Fail closed: an internal call whose binding is unavailable errors loudly; it never falls back to the public edge. The "binding-first with public-HTTP fallback" pattern is forbidden.
  • The account boundary is the outer trust unit; within it, internal caller identity must be non-forgeable. A binding establishes that a request arrived over a trusted channel, but not which in-account worker sent it — so a bare header any in-account worker can set is not sufficient for privileged internal identity. Eliminating the in-account inside-job vector — binding identity to the private entrypoint a caller arrives through (distinct WorkerEntrypoints per caller class), or another non-forgeable per-caller mechanism — is required, not optional. A forgeable internal claim (the bare X-RTP-Internal-Source header trusted on its face) is non-compliant. (This resolves the internal-api pure-B vs bind-by-entrypoint fork — opened in ADR-030 — toward non-forgeable identity; the mechanism is validated against internal-api's actual structure at the route-removal Gate 1, but pure-B is rejected. The invariant is mandated now; only the mechanism is chosen later.)
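
A minimal sketch of the within-boundary, browser-route, and fail-closed bullets above, written in TypeScript. The binding name INTERNAL_API, the routes /api/reports and /reports, the placeholder hostname internal.invalid, and the BindingUnavailableError class are all illustrative assumptions, not names taken from the RTOpacks codebase. The binding is modelled as a plain interface so the sketch runs outside the Workers runtime; session checks are elided.

```typescript
// Minimal shape of a service binding: only fetch() is needed for this sketch.
interface ServiceBinding {
  fetch(request: Request): Promise<Response>;
}

// Hypothetical environment. INTERNAL_API is an assumed binding name.
interface Env {
  INTERNAL_API?: ServiceBinding;
}

class BindingUnavailableError extends Error {}

// Call the internal service over the binding only. There is deliberately no
// public-HTTP fallback: a missing binding is a loud error (fail closed).
async function callInternal(env: Env, path: string, init?: RequestInit): Promise<Response> {
  const binding = env.INTERNAL_API;
  if (!binding) {
    throw new BindingUnavailableError(
      "INTERNAL_API binding is not configured; refusing to fall back to the public edge",
    );
  }
  // The hostname is a placeholder; the request is dispatched via the binding,
  // not resolved through public DNS.
  return binding.fetch(new Request(`https://internal.invalid${path}`, init));
}

// Same-origin route: the browser calls /api/reports on the public worker,
// which forwards server-side over the binding. The browser never sees an
// internal hostname.
async function handleBrowserRequest(request: Request, env: Env): Promise<Response> {
  const url = new URL(request.url);
  if (url.pathname === "/api/reports") {
    return callInternal(env, "/reports");
  }
  return new Response("Not found", { status: 404 });
}
```

The design choice worth noting is that callInternal has no URL for the public edge anywhere in it, so a fallback cannot be reintroduced by accident; the only way to reach the internal service is through the binding.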

What's not in scope (unchanged from the companion rule): end-user-facing product surfaces, which are public by design — this governs service-to-service calls, not user endpoints; and the rare, documented case of deliberately routing internal traffic over the edge for an edge-only feature.
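
The non-forgeable internal identity requirement in the Decision can be illustrated with a small model of the bind-by-entrypoint idea. This is a sketch under stated assumptions, not the mechanism chosen for internal-api (the Decision leaves that to the route-removal Gate 1). The caller classes, the delete-tenant action, and the class names are hypothetical. In a real Worker each class would presumably extend WorkerEntrypoint from "cloudflare:workers" and be targeted by a caller's binding; plain classes are used here so the sketch runs anywhere.

```typescript
// Hypothetical caller classes known to the internal service.
type CallerClass = "workspace-admin" | "billing-worker";

interface InternalRequest {
  action: string;
  headers: Record<string, string>;
}

// Shared core. Caller identity is a parameter supplied by the entrypoint;
// it is never read from the request, so headers cannot influence it.
function handleInternal(caller: CallerClass, req: InternalRequest): string {
  if (req.action === "delete-tenant" && caller !== "workspace-admin") {
    return "403 forbidden";
  }
  return `200 ok (${caller}: ${req.action})`;
}

// One entrypoint per caller class. Which entrypoint a caller can reach is
// fixed by how its binding is wired, not by anything the caller sends.
class WorkspaceAdminEntrypoint {
  handle(req: InternalRequest): string {
    return handleInternal("workspace-admin", req);
  }
}

class BillingEntrypoint {
  handle(req: InternalRequest): string {
    return handleInternal("billing-worker", req);
  }
}
```

In this model a billing caller that sets X-RTP-Internal-Source to "workspace-admin" gains nothing, because the header is ignored and the identity comes from the entrypoint it arrived through.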

Consequences.

  • New service paths are designed channel-separated from the start — the ground-up posture. New external integrations carry cryptographic auth and an EXT-API reference doc (EXT-API RULE) as a precondition of shipping.
  • Existing violations are dispositioned by a fleet channel-sweep (the companion rule's audit lens). The internal-api isolation programme is the first sweep; it has already surfaced the browser-direct-to-internal pattern as a general shape (workspace admin pages), which the sweep now checks for elsewhere.
  • Cost, stated honestly: every browser→internal path gains a same-origin route layer; every internal call gives up the public-HTTP fallback; and — under the non-forgeable-internal-identity requirement above — every existing internal caller migrates from the bare header to the per-caller mechanism. That includes the Pattern-B callers already moved onto the binding (INTERNAL-API-PATTERN-B-TO-BINDING-01), which currently still carry the header and are therefore A-pending. The size of that migration is established by audit against internal-api's actual structure before it is committed — it is not assumed small. This cost is paid at design time, deliberately, instead of as incident response later. That trade is the entire point — the alternative is paying it under exploitation, with customer data at risk.
  • A security dividend falls out of the browser pattern: once browser callers go via same-origin routes, the session cookie can become httpOnly (no longer read by client JS), closing the JS-readable-credential gap.
  • This is a default, not a cage — the not-in-scope carve-outs remain, and deviations are allowed when documented with a reason. The non-forgeable-internal-identity requirement, however, is a mandate within the boundary, not a default to be deviated from casually.
  • Reversibility: as a forward-binding default it is meant to be sticky; abandoning it would reintroduce the recurring flaw class it exists to retire.
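
For the cryptographic auth that new external integrations carry, one possible shape is an HMAC-signed payload with a timestamp, verified server-side using the Web Crypto API. This is an assumption for illustration only: the ADR requires a signed token or mutual auth but does not prescribe HMAC, and the function names, the "timestamp.body" signing format, and the 300-second skew window are all hypothetical.

```typescript
const encoder = new TextEncoder();

// Import a shared secret as an HMAC-SHA-256 key (non-extractable).
async function importHmacKey(secret: string): Promise<CryptoKey> {
  return crypto.subtle.importKey(
    "raw",
    encoder.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign", "verify"],
  );
}

function toHex(buf: ArrayBuffer): string {
  return Array.from(new Uint8Array(buf), (b) => b.toString(16).padStart(2, "0")).join("");
}

// Returns null for malformed hex so the verifier can reject it outright.
function fromHex(hex: string) {
  if (hex.length % 2 !== 0 || /[^0-9a-f]/i.test(hex)) return null;
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// Sender side: sign "<timestamp>.<body>" and return the signature as hex.
async function signPayload(secret: string, timestamp: number, body: string): Promise<string> {
  const key = await importHmacKey(secret);
  const sig = await crypto.subtle.sign("HMAC", key, encoder.encode(`${timestamp}.${body}`));
  return toHex(sig);
}

// Receiver side: reject stale timestamps and malformed signatures, then
// verify the signature over the same "<timestamp>.<body>" string.
async function verifyPayload(
  secret: string,
  timestamp: number,
  body: string,
  signatureHex: string,
  nowSeconds: number,
  maxSkewSeconds = 300,
): Promise<boolean> {
  if (Math.abs(nowSeconds - timestamp) > maxSkewSeconds) return false;
  const sig = fromHex(signatureHex);
  if (!sig) return false;
  const key = await importHmacKey(secret);
  return crypto.subtle.verify("HMAC", key, sig, encoder.encode(`${timestamp}.${body}`));
}
```

Any failure path returns false rather than accepting the request, which matches the fail-closed posture of the Decision; the timestamp check is a simple replay bound, not a full nonce scheme.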

Companion discipline. The CHANNEL SEPARATION RULE operationalises this commitment: every brief-draft and every audit checks channel compliance, and the fleet channel-sweep enumerates and dispositions existing violations. The rule is how this ADR is enforced on every piece of work.

See ADR-030 (first application — internal-api isolation; this ADR resolves the inside-job fork ADR-030 opened); the CHANNEL SEPARATION RULE (companion enforcement discipline); the HARD SEPARATION RULE (sibling — separates data domains, where this separates communication channels); and ADR-027 (environment parity / illuminated-fence — bindings are wired in dev too, so dev and prod use the same channel and no public fallback is ever needed).