Source of Truth Connectivity

Terse reference for every external endpoint we depend on. Read this before debugging any "the API isn't responding" problem — we keep rediscovering the same gotchas and they cost hours each time.

Format: one row per endpoint, symptom → fix, no narration.

Rule 0: these endpoints are the source of truth

We cannot regenerate TGA, TEQSA, CRICOS, VOCSTATS, or YourCareer data, so how we connect to them is P1 infrastructure. Every gotcha below cost us real hours to rediscover; update this doc the moment you learn something new.

Endpoint gotcha matrix

| Endpoint | Symptom | Cause | Fix |
| --- | --- | --- | --- |
| `training.gov.au/api/*` | `curl` hangs indefinitely at the HTTP layer on macOS. No response, no timeout. | Unknown. TLS handshake via `openssl s_client` succeeds; browser and Node `fetch` work; only `curl`'s HTTP layer hangs. | Never debug TGA with `curl`. Use `node --input-type=module -e 'await fetch(...)'` (top-level `await` needs ESM) or the browser. Confirmed broken in sandbox + host macOS. |
| `training.gov.au/api/training/{code}/releases/{n}/unitgrid` | You don't know it exists. | The TGA swagger spec at `/swagger/` lists 8 separate Level-1 JSON files. The `Training - v1/swagger.json` spec includes `GetUnitGrid`, which returns `UnitGridUsage[]` with `isEssential: true\|false` per unit. This is the authoritative core/elective split, with zero parsing needed. Only discovered in the STUDIO-PACKAGING-PARSE-01 swagger audit. | **Before writing any LLM parser against TGA content, check `/swagger/*/swagger.json` for a structured endpoint first.** See `tga-unitgrid-endpoint.md` for the canonical pattern. |
| `training.gov.au/api/*` (worker context) | Silent empty results in workspace routes that look like "no matches found". | Global `fetch('https://internal-api.rtopacks.com.au/...')` gets 302'd by CF Access and returns empty. | Use service bindings: `env.INTERNAL_API.fetch(...)`. Never use global `fetch` for worker-to-worker calls inside the org. |
| `data.gov.au` (TEQSA, CRICOS) | Worker hangs, then gets killed by the CF CPU limit mid-request. | `data.gov.au` responses routinely take 10–25 seconds; default `fetch()` has no timeout, so the worker blocks. | `fetch(url, { signal: AbortSignal.timeout(25000) })`. 25s minimum, else the Worker dies. See `ops/teqsa-api-reference.md:52`. |
| `data.gov.au` (any /run handler) | Diagnostic rows stuck in pending state. | `ctx.waitUntil(fetch(...))` lets the response return before the fetch resolves, so the handler's own write never lands. | Never `ctx.waitUntil(fetch(...))` in /run handlers. Await the fetch before returning. Standing rule OBS-INTEGRITY-02. |
| YourCareer (`yourcareer.gov.au`) | No known gotchas yet. Add here when discovered. | | |
| VOCSTATS | No known connectivity gotchas. Schema details in `data/vocstats-architecture.md`. | | |
| Anthropic Haiku API | 429 rate-limit errors at 3x+ parallel. | Token-per-minute limit, not request-per-minute. Large inputs (100KB `packaging_rules` HTML) burn through the budget fast even at low concurrency. | 2x parallel with 2s stagger + multi-pass convergence. Each pass commits individually; failures auto-retry on next run. See `packaging-pipeline.md` "Stage 3 rate limit lessons" for the full matrix. |
| Stripe API | API surface is actively refactored quarter-to-quarter; code written against old shapes silently drifts. | Stripe deprecates without breaking. | Pin the API version in client config; re-verify every session where billing code is touched. |
| QuickBooks API | Same as Stripe: vendor API instability. | QB is refactoring too. | Verify shape on every visit; don't trust stale memory. |
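
The two `data.gov.au` rows above come down to two habits: always pass a timeout signal, and await the fetch before the handler returns. A minimal TypeScript sketch of both follows. The helper names `timeoutInit` and `fetchWithTimeout` and the injectable `fetchImpl` parameter are hypothetical (they are not taken from our codebase); `fetchImpl` exists only so the helper can be exercised without network access.

```typescript
// Hypothetical helpers for illustration; not existing functions in our repo.
const DATA_GOV_TIMEOUT_MS = 25_000; // the 25s floor from the matrix above

// The minimal slice of fetch that this sketch relies on.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build the init object in one place so every call site gets a timeout.
function timeoutInit(ms: number = DATA_GOV_TIMEOUT_MS): { signal: AbortSignal } {
  return { signal: AbortSignal.timeout(ms) };
}

// Await the fetch here instead of handing it to ctx.waitUntil, so the
// caller's own write runs only after the upstream response has arrived.
async function fetchWithTimeout(
  url: string,
  ms: number = DATA_GOV_TIMEOUT_MS,
  fetchImpl: FetchLike = fetch,
): Promise<unknown> {
  const res = await fetchImpl(url, timeoutInit(ms));
  if (!res.ok) {
    throw new Error(`HTTP ${res.status} from ${url}`);
  }
  return res.json();
}
```

In a /run handler the call would be `const data = await fetchWithTimeout(url);` followed by the diagnostic write, with both completing before the handler returns.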

Cloudflare D1 write-layer gotchas

Separate class of problem: once you have the data from an external API, writing it back to D1 has its own set of traps.

| Symptom | Cause | Fix |
| --- | --- | --- |
| `wrangler d1 execute --remote --file bulk.sql` fails with `Network connection lost` every time. | Wrangler's `--file` path uses D1's async polling API, broken in 4.78.0 and 4.83.0 for any non-trivial payload. Even a 1-line SELECT fails in some cases. | Never use `--file` for bulk writes. Hit the CF D1 HTTP API directly: `POST /accounts/{id}/d1/database/{db}/query`. Synchronous, no polling, no wrangler overhead. |
| Multi-statement payloads (`UPDATE...; UPDATE...; UPDATE...;`) fail with `SQL code did not contain a statement` or a silent `Network connection lost`. | Double trailing `;;` from `.join(';')` on already-terminated statements; also, D1 multi-statement support is unreliable at size. | Strip trailing semicolons with `.replace(/;+\s*$/, "")`. Send **one statement per request** with `Promise.allSettled`, concurrency ~10. |
| `UPDATE qualifications SET packaging_rules = '...' WHERE ...` fails with `SQLITE_TOOBIG` once the inline literal exceeds ~100KB. | SQLite's max statement length. The escaped string + SQL overhead blows past the limit. | Use bound parameters: `sql: "UPDATE ... SET packaging_rules = ?1 WHERE qual_code = ?2", params: [content, code]`. Confirmed working at 187KB. |
| Even with bound params, combining every column into one UPDATE still hits `SQLITE_TOOBIG` when multiple large text fields collide. | Parameter values are counted against the statement size too. | Split logically: one UPDATE for small scalar fields, then a separate UPDATE for each oversized text field. |
| Can't tell which rows were actually updated after a partial rerun. | Content-shape heuristics (looking for `<p>` tags, etc.) produce thousands of false positives, because real content has HTML. | Compare `substr(enriched_at,1,10)` against today's date. The rows whose `enriched_at` is stale are the stragglers: exact, not heuristic. |
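
The write-path fixes above (bound parameters, one UPDATE per oversized field, no trailing semicolons, one statement per request at concurrency ~10) can be sketched together in TypeScript. The function names here are hypothetical. The base URL and bearer-token header are the standard Cloudflare API v4 conventions, assumed rather than copied from our tooling, and column names are interpolated into the SQL, so they must come from a trusted hard-coded list.

```typescript
// Assumption: standard Cloudflare API v4 base URL with bearer-token auth.
const CF_API = "https://api.cloudflare.com/client/v4";

interface D1Statement {
  sql: string;
  params: (string | number | null)[];
}

// Drop trailing semicolons and whitespace so nothing downstream can
// turn the statement into an empty ";;" fragment.
function stripTrailingSemicolons(sql: string): string {
  return sql.replace(/;+\s*$/, "");
}

// One UPDATE per oversized text column, each with bound parameters.
// Column names must be trusted identifiers, never user input.
function buildUpdates(
  qualCode: string,
  largeFields: Record<string, string>,
): D1Statement[] {
  return Object.entries(largeFields).map(([column, content]) => ({
    sql: stripTrailingSemicolons(
      `UPDATE qualifications SET ${column} = ?1 WHERE qual_code = ?2;`,
    ),
    params: [content, qualCode],
  }));
}

// Send one statement per request, at most `limit` in flight at a time.
async function runAll(
  statements: D1Statement[],
  accountId: string,
  databaseId: string,
  apiToken: string,
  limit: number = 10,
): Promise<PromiseSettledResult<Response>[]> {
  const url = `${CF_API}/accounts/${accountId}/d1/database/${databaseId}/query`;
  const results: PromiseSettledResult<Response>[] = [];
  for (let i = 0; i < statements.length; i += limit) {
    const batch = statements.slice(i, i + limit).map((stmt) =>
      fetch(url, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${apiToken}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(stmt),
      }),
    );
    results.push(...(await Promise.allSettled(batch)));
  }
  return results;
}
```

A fulfilled entry only means the HTTP round trip completed. Callers should still check `res.ok` and the response body before treating a row as written, and rerun anything that failed.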

Sandbox vs host vs browser

Different environments hit the same endpoints differently. When something hangs, check whether it hangs in all three before guessing.

| Environment | TGA API | data.gov.au | Notes |
| --- | --- | --- | --- |
| Claude Code sandbox (Bash tool) | `curl` hangs; Node `fetch` works | OK with timeout | Node is the reliable path |
| Host macOS terminal | `curl` hangs; Node `fetch` works; `openssl s_client` handshakes | OK with timeout | Same as sandbox: it's `curl`, not the sandbox |
| Browser | Works | Works | Use as ground-truth fallback for "is the endpoint even up?" |
| Cloudflare Worker (`fetch` global) | Works | Works with 25s+ timeout | The only environment that matters for prod, but hardest to debug interactively |
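
For the "does it hang everywhere?" check, a small Node-side probe can separate a hang from a slow response or an outright network error. This is a rough sketch: `probe` and `ProbeResult` are hypothetical names, the 30s default is an arbitrary choice above the 25s floor, and the timeout detection assumes the runtime rejects with an error named `TimeoutError` when an `AbortSignal.timeout` signal fires.

```typescript
// Hypothetical diagnostic helper; not part of our tooling.
type ProbeResult =
  | { kind: "response"; status: number; elapsedMs: number }
  | { kind: "timeout"; elapsedMs: number }
  | { kind: "error"; message: string; elapsedMs: number };

type ProbeFetch = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ status: number }>;

async function probe(
  url: string,
  timeoutMs: number = 30_000,
  fetchImpl: ProbeFetch = fetch,
): Promise<ProbeResult> {
  const started = Date.now();
  try {
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    return { kind: "response", status: res.status, elapsedMs: Date.now() - started };
  } catch (err) {
    const elapsedMs = Date.now() - started;
    // Assumption: a fired timeout signal surfaces as an error named "TimeoutError".
    const name = (err as { name?: string } | null)?.name;
    if (name === "TimeoutError") {
      return { kind: "timeout", elapsedMs };
    }
    return { kind: "error", message: String(err), elapsedMs };
  }
}
```

Running the same probe from the sandbox and from the host terminal, then loading the URL in a browser, gives three comparable data points before anyone starts guessing.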

Script paths after monorepo moves

Not connectivity per se, but in the same class of "script that used to work now doesn't": every tool script in `tools/` has a `RTOPACKS_SITE`, `WRANGLER_CWD`, or similar cwd constant. These break silently whenever we reorg the monorepo. When a legacy tool script fails on its first wrangler call, check that the cwd path resolves before anything else.

Known stale-after-reorg paths (fixed): `tools/tga-enrich-quals.mjs` was pointing at `repo/site`, which hasn't existed since session 58; repointed to `workers/internal-api`.
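
One way to make a stale cwd constant fail loudly instead of silently is to validate it at the top of each tool script. A minimal sketch, assuming a hypothetical helper named `assertCwdExists` (it does not exist in `tools/` today) and a relative path that is resolved against the directory the script is run from:

```typescript
import { existsSync, statSync } from "node:fs";
import { resolve } from "node:path";

// Hypothetical guard: throw a readable error if a cwd constant points at
// a directory that no longer exists, before any wrangler call is made.
function assertCwdExists(label: string, cwd: string): string {
  const abs = resolve(cwd);
  if (!existsSync(abs) || !statSync(abs).isDirectory()) {
    throw new Error(`${label} does not resolve to a directory: ${abs}`);
  }
  return abs;
}
```

A script would then declare its constant through the guard, for example `const WRANGLER_CWD = assertCwdExists("WRANGLER_CWD", "workers/internal-api");`, so a reorg breaks it on line one with the offending path in the message.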

When to update this doc

You hit a new connectivity failure and burn more than 15 minutes diagnosing it.
You discover an environment quirk (new sandbox behavior, new vendor deprecation).
You confirm an existing row in the matrix is no longer true (cross it out, don't delete — the history matters).

The whole point of this file is that it's the first thing to read when debugging "why isn't this endpoint responding" — if you had to reason it out from scratch, this doc has failed and you should fix it.