Source of Truth Connectivity

Terse reference for every external endpoint we depend on. Read this before debugging any "the API isn't responding" problem — we keep rediscovering the same gotchas and they cost hours each time.

Format: one row per endpoint, symptom → fix, no narration.

Rule 0: these endpoints are the source of truth

We cannot regenerate TGA, TEQSA, CRICOS, VOCSTATS, or YourCareer data, so how we connect to those sources is P1 infrastructure. Every gotcha below cost us real hours to rediscover; update this doc the moment you learn something new.


Endpoint gotcha matrix

| Endpoint | Symptom | Cause | Fix |
|---|---|---|---|
| `training.gov.au/api/*` | `curl` hangs indefinitely at the HTTP layer on macOS. No response, no timeout. | Unknown. The TLS handshake via `openssl s_client` succeeds; browser and Node `fetch` work; only curl's HTTP layer hangs. | Never debug TGA with curl. Use Node `fetch` (for example `node --input-type=module -e 'await fetch(...)'`, since top-level `await` needs module mode) or the browser. Confirmed broken in sandbox + host macOS. |
| `training.gov.au/api/training/{code}/releases/{n}/unitgrid` | You don't know it exists. | The TGA swagger spec at `/swagger/` lists 8 separate Level-1 JSON files; the `Training - v1/swagger.json` spec includes `GetUnitGrid`, returning `UnitGridUsage[]` with `isEssential: true\|false` per unit. This is the authoritative core/elective split, with zero parsing needed. Only discovered in the STUDIO-PACKAGING-PARSE-01 swagger audit. | Before writing any LLM parser against TGA content, check `/swagger/*/swagger.json` for a structured endpoint first. See tga-unitgrid-endpoint.md for the canonical pattern. |
| `training.gov.au/api/*` (worker context) | Silent empty results in workspace routes that look like "no matches found". | Global `fetch('https://internal-api.rtopacks.com.au/...')` gets 302'd by CF Access and returns empty. | Use service bindings: `env.INTERNAL_API.fetch(...)`. Never use global `fetch` for worker-to-worker calls inside the org. |
| data.gov.au (TEQSA, CRICOS) | Worker hangs, then gets killed by the CF CPU limit mid-request. | data.gov.au responses routinely take 10–25 seconds; default `fetch()` has no timeout, so the worker blocks. | `fetch(url, { signal: AbortSignal.timeout(25000) })`. 25s minimum, else the Worker dies. See ops/teqsa-api-reference.md:52. |
| data.gov.au (any `/run` handler) | Diagnostic rows stuck in `pending` state. | `ctx.waitUntil(fetch(...))` lets the response return before the fetch resolves, so the handler's own write never lands. | Never `ctx.waitUntil(fetch(...))` in `/run` handlers. Await the fetch before returning. Standing rule OBS-INTEGRITY-02. |
| YourCareer (yourcareer.gov.au) | No known gotchas yet; add here when discovered. | | |
| VOCSTATS | No known connectivity gotchas; schema details in data/vocstats-architecture.md. | | |
| Anthropic Haiku API | 429 rate-limit errors at 3x+ parallel. | Token-per-minute limit, not request-per-minute. Large inputs (100KB `packaging_rules` HTML) burn through the budget fast even at low concurrency. | 2x parallel with 2s stagger + multi-pass convergence. Each pass commits individually; failures auto-retry on the next run. See packaging-pipeline.md "Stage 3 rate limit lessons" for the full matrix. |
| Stripe API | API surface actively refactoring quarter-to-quarter; code written against old shapes silently drifts. | Stripe deprecates without breaking. | Pin the API version in client config; re-verify every session where billing code is touched. |
| QuickBooks API | Same as Stripe: vendor API instability. | QB is refactoring too. | Verify shape on every visit; don't trust stale memory. |
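
The worker-context and `/run` rows above combine into one handler shape: call the internal API through the service binding, await it, and write the final status before returning. The following is a minimal sketch under stated assumptions, not our actual handler. `handleRun`, `RunStatus`, `saveStatus`, and the local `ServiceBinding` interface are hypothetical names introduced for illustration; only `env.INTERNAL_API.fetch(...)` comes from the matrix.

```typescript
// Structural stand-in for a Cloudflare service binding. In a real Worker the
// binding type is `Fetcher`; this local interface keeps the sketch standalone.
interface ServiceBinding {
  fetch(url: string): Promise<{ ok: boolean; status: number }>;
}

interface Env {
  INTERNAL_API: ServiceBinding;
}

type RunStatus = "pending" | "ok" | "failed";

// Hypothetical /run handler body. `saveStatus` stands in for whatever writes
// the diagnostic row; `internalUrl` is supplied by the caller.
async function handleRun(
  env: Env,
  internalUrl: string,
  saveStatus: (status: RunStatus) => Promise<void>,
): Promise<RunStatus> {
  await saveStatus("pending");

  let status: RunStatus;
  try {
    // Service binding, not global fetch: a global fetch to the internal
    // hostname is 302'd by CF Access and comes back empty.
    const res = await env.INTERNAL_API.fetch(internalUrl);
    status = res.ok ? "ok" : "failed";
  } catch {
    status = "failed";
  }

  // Awaited before returning, so the row cannot be left in "pending".
  // With ctx.waitUntil(fetch(...)) the response would return before the
  // fetch resolves and this write would never land.
  await saveStatus(status);
  return status;
}
```

Passing the binding and the writer in as parameters keeps the ordering testable with fakes, without a deployed Worker.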

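For the unitgrid row, the core/elective split reduces to a filter on `isEssential`. A minimal sketch: only `isEssential` is confirmed by the swagger spec as described above; the `unitCode` field name, the `https://` prefix, and both function names are assumptions made for illustration, so check `Training - v1/swagger.json` for the real `UnitGridUsage` shape.

```typescript
// Only `isEssential` is confirmed by the matrix; `unitCode` is a hypothetical
// field name used here so the example has something to return.
interface UnitGridUsage {
  unitCode: string;
  isEssential: boolean;
}

// Build the unitgrid URL from the path shape in the matrix. The https://
// prefix is an assumption.
function unitGridUrl(code: string, release: number): string {
  return `https://training.gov.au/api/training/${encodeURIComponent(code)}/releases/${release}/unitgrid`;
}

// Core units are the ones flagged isEssential; everything else is elective.
function splitCoreElective(units: UnitGridUsage[]): {
  core: string[];
  elective: string[];
} {
  return {
    core: units.filter((u) => u.isEssential).map((u) => u.unitCode),
    elective: units.filter((u) => !u.isEssential).map((u) => u.unitCode),
  };
}
```
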
Cloudflare D1 write-layer gotchas

Separate class of problem: once you have the data from an external API, writing it back to D1 has its own set of traps.

| Symptom | Cause | Fix |
|---|---|---|
| `wrangler d1 execute --remote --file bulk.sql` fails with `Network connection lost` every time. | Wrangler's `--file` path uses D1's async polling API, broken in 4.78.0 and 4.83.0 for any non-trivial payload. Even a 1-line `SELECT` fails in some cases. | Never use `--file` for bulk writes. Hit the CF D1 HTTP API directly: `POST /accounts/{id}/d1/database/{db}/query`. Synchronous, no polling, no wrangler overhead. |
| Multi-statement payloads (`UPDATE...; UPDATE...; UPDATE...;`) fail with `SQL code did not contain a statement` or a silent `Network connection lost`. | Double trailing `;;` from `.join(';')` on already-terminated statements; also, D1 multi-statement support is unreliable at size. | Strip trailing semicolons with `.replace(/;+\s*$/, "")`. Send one statement per request with `Promise.allSettled`, concurrency ~10. |
| `UPDATE qualifications SET packaging_rules = '...' WHERE ...` fails with `SQLITE_TOOBIG` once the inline literal exceeds ~100KB. | SQLite's max statement length. The escaped string + SQL overhead blows past the limit. | Use bound parameters: `sql: "UPDATE ... SET packaging_rules = ?1 WHERE qual_code = ?2", params: [content, code]`. Confirmed working at 187KB. |
| Even with bound params, combining every column into one `UPDATE` still hits `SQLITE_TOOBIG` when multiple large text fields collide. | Parameter values are counted against the statement size too. | Split logically: one `UPDATE` for small scalar fields, a second `UPDATE` for each oversized text field individually. |
| Can't tell which rows were actually updated after a partial rerun. | Content-shape heuristics (looking for `<p>` tags, etc.) produce thousands of false positives, because real content has HTML. | Use `substr(enriched_at,1,10)` vs today's date. The rows whose `enriched_at` is stale are the stragglers: exact, not heuristic. |
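
The first four rows above add up to one write path: strip trailing semicolons, use bound parameters, split oversized text columns into their own statements, and send one statement per request in batches of about 10. The sketch below assumes the public Cloudflare API base URL and bearer-token auth. `buildQualUpdates`, `sendAll`, and `postQuery` are hypothetical helper names, and `postQuery` checks only the HTTP status, which may not be sufficient for every failure mode.

```typescript
// One request body for the D1 HTTP query endpoint: a single statement plus
// its bound parameters.
type D1Query = { sql: string; params: string[] };

// Strip trailing semicolons so joining statements can never produce ";;".
function stripTrailingSemicolons(sql: string): string {
  return sql.replace(/;+\s*$/, "");
}

// One UPDATE for the small scalar columns, then one UPDATE per oversized text
// column. Column names are interpolated into the SQL, so they must come from
// trusted code, never from user input.
function buildQualUpdates(
  qualCode: string,
  smallFields: Record<string, string>,
  largeFields: Record<string, string>,
): D1Query[] {
  const queries: D1Query[] = [];
  const smallCols = Object.keys(smallFields);
  if (smallCols.length > 0) {
    const sets = smallCols.map((col, i) => `${col} = ?${i + 1}`).join(", ");
    queries.push({
      sql: `UPDATE qualifications SET ${sets} WHERE qual_code = ?${smallCols.length + 1}`,
      params: [...smallCols.map((col) => smallFields[col]), qualCode],
    });
  }
  for (const col of Object.keys(largeFields)) {
    queries.push({
      sql: `UPDATE qualifications SET ${col} = ?1 WHERE qual_code = ?2`,
      params: [largeFields[col], qualCode],
    });
  }
  return queries;
}

// Send one statement per request, `batchSize` at a time, and return the
// queries that failed so the caller can retry them.
async function sendAll(
  queries: D1Query[],
  send: (q: D1Query) => Promise<void>,
  batchSize = 10,
): Promise<D1Query[]> {
  const failed: D1Query[] = [];
  for (let i = 0; i < queries.length; i += batchSize) {
    const batch = queries.slice(i, i + batchSize);
    const results = await Promise.allSettled(batch.map((q) => send(q)));
    results.forEach((r, j) => {
      if (r.status === "rejected") failed.push(batch[j]);
    });
  }
  return failed;
}

// POST one statement to the D1 HTTP API. All identifiers are caller-supplied.
async function postQuery(
  accountId: string,
  databaseId: string,
  apiToken: string,
  q: D1Query,
): Promise<void> {
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/d1/database/${databaseId}/query`;
  const res = await fetch(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
    body: JSON.stringify({ sql: stripTrailingSemicolons(q.sql), params: q.params }),
  });
  if (!res.ok) throw new Error(`D1 query failed: HTTP ${res.status}`);
}
```

A caller would wire these together as `sendAll(queries, (q) => postQuery(accountId, databaseId, apiToken, q))` and rerun with the returned failures.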

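The `enriched_at` straggler check from the last row can be expressed as a single bound-parameter query. This sketch assumes `enriched_at` is stored as ISO-8601 text in UTC, so "today" is the UTC calendar day. Treating `NULL` as a straggler is also an assumption. `isoDay` and `stragglerQuery` are hypothetical names.

```typescript
// UTC calendar day (YYYY-MM-DD) for a Date. Assumes enriched_at is written
// in UTC; adjust if the column stores local time.
function isoDay(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// Rows whose enriched_at day is not `today` were missed by the current run.
// Rows that were never enriched (NULL) are included by assumption.
function stragglerQuery(today: string): { sql: string; params: string[] } {
  return {
    sql:
      "SELECT qual_code FROM qualifications " +
      "WHERE enriched_at IS NULL OR substr(enriched_at, 1, 10) <> ?1",
    params: [today],
  };
}
```
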
Sandbox vs host vs browser

Different environments hit the same endpoints differently. When something hangs, check whether it hangs in the sandbox, on the host, and in the browser before guessing.

| Environment | TGA API | data.gov.au | Notes |
|---|---|---|---|
| Claude Code sandbox (Bash tool) | curl hangs; Node fetch works | OK with timeout | Node is the reliable path |
| Host macOS terminal | curl hangs; Node fetch works; `openssl s_client` handshakes | OK with timeout | Same as sandbox: it's curl, not the sandbox |
| Browser | Works | Works | Use as ground-truth fallback for "is the endpoint even up?" |
| Cloudflare Worker (`fetch` global) | Works | Works with 25s+ timeout | The only environment that matters for prod, but hardest to debug interactively |
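
Since Node `fetch` is the reliable path in both the sandbox and the host terminal, a small probe with an explicit timeout can replace curl when checking whether an endpoint is up. This is a sketch assuming Node 18 or later (global `fetch` and `AbortSignal.timeout`). `probe`, `ProbeResult`, and the injectable `fetchImpl` parameter are hypothetical names.

```typescript
// Minimal structural type so a fake fetch can be injected for testing.
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number }>;

type ProbeResult = {
  url: string;
  outcome: "ok" | "http-error" | "timeout" | "network-error";
  status?: number;
  ms: number; // elapsed wall-clock time
};

// Fetch `url` with a hard timeout and classify what happened. The 25s
// default matches the data.gov.au minimum from the matrix.
async function probe(
  url: string,
  timeoutMs = 25_000,
  fetchImpl: FetchLike = fetch,
): Promise<ProbeResult> {
  const started = Date.now();
  try {
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    return {
      url,
      outcome: res.ok ? "ok" : "http-error",
      status: res.status,
      ms: Date.now() - started,
    };
  } catch (err) {
    // A fetch aborted by AbortSignal.timeout rejects with a "TimeoutError".
    const timedOut = err instanceof Error && err.name === "TimeoutError";
    return {
      url,
      outcome: timedOut ? "timeout" : "network-error",
      ms: Date.now() - started,
    };
  }
}
```

Separating `timeout` from `network-error` tells you whether the endpoint is slow or unreachable, which is the distinction the table above is meant to help with.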

Script paths after monorepo moves

Not connectivity per se, but in the same class of "script that used to work now doesn't": every tool script in tools/ has an RTOPACKS_SITE / WRANGLER_CWD / similar cwd constant. These break silently whenever we reorg the monorepo. When a legacy tool script fails on its first wrangler call, check that the cwd path resolves before anything else.

Known stale-after-reorg paths (fixed): tools/tga-enrich-quals.mjs was pointing at repo/site which hasn't existed since session 58; repointed to workers/internal-api.
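
One way to make a stale cwd constant fail loudly instead of silently is to check it before the first wrangler call. A minimal sketch using Node's `fs` and `path` modules; `assertCwdExists` is a hypothetical helper, not something that exists in tools/ today.

```typescript
import { existsSync, statSync } from "node:fs";
import { resolve } from "node:path";

// Resolve a cwd constant and throw a descriptive error if it does not point
// at an existing directory. Returns the absolute path on success.
function assertCwdExists(label: string, cwd: string): string {
  const abs = resolve(cwd);
  if (!existsSync(abs) || !statSync(abs).isDirectory()) {
    throw new Error(
      `${label} points at ${abs}, which is not a directory. Did the monorepo move?`,
    );
  }
  return abs;
}
```

A script would call it once at startup, for example `const cwd = assertCwdExists("WRANGLER_CWD", WRANGLER_CWD)`, and pass the result to whatever spawns wrangler.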


When to update this doc

  • You hit a new connectivity failure and burn more than 15 minutes diagnosing it.
  • You discover an environment quirk (new sandbox behavior, new vendor deprecation).
  • You confirm an existing row in the matrix is no longer true (cross it out, don't delete — the history matters).

The whole point of this file is that it's the first thing to read when debugging "why isn't this endpoint responding" — if you had to reason it out from scratch, this doc has failed and you should fix it.

See also:

  • ops/tga-api-field-inventory.md — TGA API field-level inventory + full TGA-SYNC-FIX-01 lessons (superset of the TGA row above)
  • ops/teqsa-api-reference.md — TEQSA endpoint schema and fetch timeout rationale
  • infrastructure/tga-ingest.md — TGA ingest worker architecture (on-demand queue consumer)
  • data-sources.md — authoritative dataset registry