TGA API Field Inventory
Audit ID: TGA-AUDIT-01
Audit date: 2026-04-15
Last updated: 2026-04-16 (post TGA-SYNC-FIX-01 + TGA-VERSION-SCHEMA-01)
Audited qualification: BSB40920 — Certificate IV in Project Management Practice
Auditor: Claude (Studio session 60)
Status: Reference document. Update this file whenever the TGA API shape, the rtopacks-db schema, or the ingest/enrich code paths change.
2026-04-16 update — what shipped + one retraction
Read this before trusting anything below. The original TGA-AUDIT-01 recommendations (items 1–15 in the "Required changes" section) have been partially executed, and one of them was wrong.
CRITICAL ENDPOINT DISCOVERY (2026-04-16 afternoon): TGA publishes the authoritative core/elective split as a structured JSON endpoint we had not been using:
Returns an array of UnitGridUsage objects, one per unit in the qualification's grid, with isEssential: true (core) or isEssential: false (elective). No LLM parsing of packaging_rules prose is needed for core/elective classification. See tga-unitgrid-endpoint.md for the full contract, field mapping, and ingestion pattern.
The endpoint was found by pulling https://training.gov.au/swagger/Training - v1/swagger.json during STUDIO-PACKAGING-PARSE-01 execution. TGA's swagger UI at /swagger/ points at 8 separate spec files (Content - v1, Export - v1, Feedback - v1, Metadata - v1, Organisation - v1, Report - v1, Search - v1, Training - v1) — worth grepping them before writing any TGA integration.
What the unitgrid endpoint does NOT give you: Group A/B/C sub-structure, selection rules, open corpus fallback. Those still require the LLM parser (proven and ready in tools/parse-packaging-rules.mjs, canary passed).
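A minimal sketch of consuming the isEssential split, assuming each UnitGridUsage object carries the unit code under a key called unitCode. That key name is hypothetical: only isEssential is confirmed in this document, and tga-unitgrid-endpoint.md holds the real field mapping.

```typescript
// Hypothetical shape: only `isEssential` is confirmed by this document.
// `unitCode` is an assumed field name used for illustration.
interface UnitGridUsage {
  unitCode: string;
  isEssential: boolean;
}

interface CoreElectiveSplit {
  core: string[];
  elective: string[];
}

// Partition a unit grid into core (isEssential: true) and elective unit codes.
function splitCoreElective(grid: UnitGridUsage[]): CoreElectiveSplit {
  const split: CoreElectiveSplit = { core: [], elective: [] };
  for (const usage of grid) {
    (usage.isEssential ? split.core : split.elective).push(usage.unitCode);
  }
  return split;
}

// Three units taken from the BSB40920 packaging rules quoted later in this file.
const sample: UnitGridUsage[] = [
  { unitCode: "BSBPMG420", isEssential: true },
  { unitCode: "BSBPMG423", isEssential: false },
  { unitCode: "BSBCRT411", isEssential: false },
];
const split = splitCoreElective(sample);
console.log(split.core, split.elective);
```

This covers classification only. Group membership and selection rules still have to come from the packaging rules prose.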
Shipped:
- TGA-SYNC-FIX-01 (2026-04-15) — items 1, 2, 3. Fixed the two taxonomy typos in scripts/workers/tga-sync/src/index.ts (taxonomy.industry → industrySectors, taxonomy.occupation → occupations). Re-ran tools/tga-enrich-quals.mjs --force across the full 8007-qual corpus to overwrite wrong packaging_rules (was release-notes HTML, now actual packaging rules from content bundle 0116) and backfill field_of_education + anzsco_codes. 5202 ok, 2805 partial (superseded/deleted with incomplete TGA data), 0 failed.
- TGA-VERSION-SCHEMA-01 (2026-04-15/16) — version pinning columns landed. Actual scope (see retraction below):
  - qualifications.release_id TEXT — release UUID from Level 1 releases[current].id
  - qualifications.content_bundle_id TEXT — content bundle UUID from Level 2 contentBundles[0].id (current release only)
  - qualifications.teach_out_date TEXT — reserved, null for now (no TGA source field found)
  - qualifications.release_number + tga_latest_release refreshed (both pre-existed; tga_latest_release is the canonical release date under a legacy column name — not renamed to avoid D1 table recreation)
  - studio_sessions.qualification_release TEXT + qualification_release_date TEXT — pinned on session creation, immutable
  - Populated via new tools/tga-enrich-quals.mjs --version-fields flag (Level 1 + Level 2 only, no bundle fetch). Verified end-to-end: BSB40920 → release 1, 2020-10-19 via browser session creation, pinning lands on studio_sessions row.
Retraction — releases[].currency does NOT mean qual-level status.
TL;DR rows 5–6 and Required-changes items 4–5 below proposed capturing releases[].currency and releases[].currencyChangeDate as qual-level status / teach-out fields. That was wrong. Empirically verified against BSB40515 (a known Superseded qual):
usageRecommendationLabel: "Superseded" ← qual-level status
releases: [
{num:"3", currency:"current", date:"2018-09-27"},
{num:"2", currency:"replaced", date:"2016-01-14"},
{num:"1", currency:"replaced", date:"2015-03-25"},
]
release.currency only distinguishes the latest release in this qual's history ("current") from earlier releases of the same qual ("replaced"). It's always "current" for the release we pick (by construction — we use .find(r => r.currency === "current")). At the qual-row level it's meaningless, and currencyChangeDate for the current release equals releaseDate — duplicate.
Qual-level status lives in usageRecommendationLabel and was already being written to the pre-existing qualifications.status column (Current / Superseded / Deleted). The intent was already satisfied by an existing column.
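To make the distinction concrete, here is a small sketch that reads qual-level status from usageRecommendationLabel and uses currency only to pick the release. The function and type names are illustrative, not the ones in the enrich script; field names follow the Level 1 inventory (releaseNumber, releaseDate), and the data is the BSB40515 example above.

```typescript
interface Release {
  releaseNumber: string;
  releaseDate: string;
  currency: string; // "current" and "replaced" are the values observed above
}

interface TrainingDetail {
  code: string;
  usageRecommendationLabel: string; // qual-level status lives here
  releases: Release[];
}

interface PinnedVersion {
  qualCode: string;
  status: string;
  releaseNumber: string | null;
  releaseDate: string | null;
}

// Status comes from the qual; currency only selects which release to pin.
function pinVersion(detail: TrainingDetail): PinnedVersion {
  const current = detail.releases.find((r) => r.currency === "current");
  return {
    qualCode: detail.code,
    status: detail.usageRecommendationLabel,
    releaseNumber: current ? current.releaseNumber : null,
    releaseDate: current ? current.releaseDate : null,
  };
}

const bsb40515: TrainingDetail = {
  code: "BSB40515",
  usageRecommendationLabel: "Superseded",
  releases: [
    { releaseNumber: "3", currency: "current", releaseDate: "2018-09-27" },
    { releaseNumber: "2", currency: "replaced", releaseDate: "2016-01-14" },
    { releaseNumber: "1", currency: "replaced", releaseDate: "2015-03-25" },
  ],
};
const pinned = pinVersion(bsb40515);
console.log(pinned);
```

The qual is Superseded while its pinned release still reports currency "current", which is exactly why currency cannot serve as a qual-level status.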
Compensating action: currency and currency_change_date columns were added in 2026-04-15-tga-version-schema-01-qualifications.sql and then dropped the same day in 2026-04-16-tga-version-schema-01a-drop-currency.sql. Do not re-add them based on the TL;DR table or items 4–5 below without first re-reading this section.
Do not trust:
- TL;DR row 5 ("Status in three places") — yes, three places, but only usageRecommendationLabel is qual-level.
- TL;DR row 6 ("Teach-out date") — currencyChangeDate on the current release equals releaseDate; it's not a teach-out signal. May carry teach-out semantics on non-current releases, unconfirmed. Irrelevant for qual-row pinning either way.
- Required-changes items 4, 5 — the columns they describe are useless.
- Required-changes item 6 — we went a different way (new columns on qualifications, left tga_training_releases alone).
Update (same day, a few hours later) — TGA-MAPPING-DIRECTION-AUDIT also resolved: code is the successor, mapsToCode is the predecessor. Audit caught two additional bugs in tools/tga-enrich-quals.mjs: (1) qualifications.supersedes had been storing m.code (often the qual itself) instead of applying the directional filter; (2) qualifications.superseded_by was NULL for every row in the corpus because buildSupersededByMap() walked a field that doesn't exist on search API responses. Both fixed via deriveSupersession() + full-corpus --supersession-fields rerun. See the "Direction of mappingInformation — RESOLVED" section later in this doc for the full write-up.
Why this exists
The qualifications.packaging_rules column in rtopacks-db is supposed to contain the actual TGA packaging rules prose for each qualification — the "3 from Group A, 3 from Group A+B, open corpus fallback" text that the Studio canvas needs to construct real elective groups instead of the flat "Open electives — choose N" placeholder. Investigation triggered by STUDIO-CLUSTER-EVIDENCE / STUDIO-PACKAGING-PARSE design work revealed that the column actually contains a release modification history HTML table, not the packaging rules at all. The raw rules are in the API; we just weren't capturing them correctly. This document is the canonical record of what the TGA API actually returns, what we currently store, and what's misrouted or missing.
This file should be the first stop for any future brief that touches TGA data — packaging parser, version pinning, transition alerts, unit supersession, the Studio canvas header version pill, or the cross-org transition alert system.
TL;DR — answers to the eight TGA-AUDIT-01 questions
| # | Question | Answer |
|---|---|---|
| 1 | Is the actual packaging rules prose in the response? | Yes. Three calls deep — content bundle endpoint, item with contentTypeCode === "0116". |
| 2 | Are per-group unit tables in the response? | Yes, embedded in the same field as the prose. HTML-formatted but with predictable section headings ("Core units", "Group A — …", "Group B — …"). Not strict structured data; LLM-parseable. The XML asset adds nothing. |
| 3 | Is release number in the response? | Yes. Level 1: releases[].releaseNumber. Level 2: releaseNumber at root. |
| 4 | Is release date in the response? | Yes. Level 1: releases[].releaseDate. Level 2: releaseDate at root. Currently captured. |
| 5 | Is status (current / teach-out / superseded) in the response? | Yes, three places. Level 1: releases[].currency ("current" / etc.), usageRecommendation (string), usageRecommendationLabel (display form). Level 2: currency at root. |
| 6 | Is teach-out date in the response? | Effectively yes — releases[].currencyChangeDate at Level 1 / currencyChangeDate at Level 2 fires when the release transitions out of "current" status. Not labelled "teach out" but functionally identical. Need to confirm with a non-current qual that this field shifts to the supersession date for teach-out releases. |
| 7 | Is superseded-by (qual code) in the response? | Yes. Level 1: mappingInformation[] array with {code, mapsToCode, mapsToTitle, mapsToId, isEquivalent, date}. Direction needs verification against an unambiguous case (BSB40920's mapping points at BSB41515 with isEquivalent=true; need to confirm whether mapsToCode is the predecessor or successor). |
| 8 | Are unit-level supersession relationships in the response? | Not at the qual level. The qualification_units join table has superseded_by columns in our DB, suggesting per-unit supersession is fetched via the unit detail endpoint (/api/training/{unitCode} for each unit code). Not investigated in this audit. |
API endpoints used
The TGA public API (https://training.gov.au/api/...) is open — no auth, no API key. Every call below works with a bare fetch(url) in Node. curl from macOS hangs at the HTTP layer even though TLS handshake succeeds (confirmed reproducibly from both Tim's terminal and the Claude Code sandbox); this is curl-specific, not a network or geoblock issue. The browser also works fine. The tools/tga-enrich-quals.mjs script that successfully populated the corpus on 2026-03-03 used bare Node fetch(url).
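A hedged sketch of the bare-fetch pattern follows. The helper name, the injected fetchImpl parameter, and the timeout are illustrative additions, not taken from tools/tga-enrich-quals.mjs. In Node 18+ the global fetch can be passed as fetchImpl, with whichever TGA endpoint URL is being called.

```typescript
// Minimal structural type so a stub can stand in for fetch in tests.
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Fetch a TGA URL with no auth headers and return the parsed JSON body.
// The timeout is an assumption added for this sketch so a stalled request
// fails instead of hanging.
async function fetchTgaJson(
  url: string,
  fetchImpl: FetchLike,
  timeoutMs = 15000,
): Promise<unknown> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    if (!res.ok) {
      throw new Error(`TGA request failed: ${res.status} ${url}`);
    }
    return await res.json();
  } finally {
    clearTimeout(timer);
  }
}
```

Injecting the fetch implementation keeps the helper testable offline; production callers would simply pass the runtime's own fetch.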
There is also an undocumented Swagger spec at /swagger/v1/swagger.json (not yet captured for this audit — pending follow-up).
Level 1 — Training detail
Returns top-level qual metadata, releases array, parent training package, mapping (supersedes / superseded by) information, taxonomy (industry sectors + occupations), training package developer.
Does not contain: packaging rules prose, unit lists, content bundle ids, asset URLs.
Level 2 — Release detail
Where {releaseId} comes from releases[i].id in the Level 1 response (the UUID, not the release number — note that the existing tga-enrich-quals.mjs script uses the release number in the URL, which works because TGA accepts both forms, but the UUID is the canonical id).
Returns the structured packaging counts (packagingInformation: { core, elective, measure }), the content bundle ids needed for Level 3, asset URLs (XML / PDF / DOCX downloads of the complete document), external links (Companion Volume Implementation Guide), and the same currency / currencyChangeDate fields scoped to this specific release.
Does not contain: the actual packaging rules prose. Only the bundle id pointing at where the prose lives.
Level 3 — Content bundle
Where {bundleId} comes from contentBundles[0].id in the Level 2 response.
Returns an array of content items, each labelled with a contentTypeCode and a contentType string. This is where the actual rules live. For BSB40920 the bundle contains four items:
| contentTypeCode | contentType | title | length | What it is |
|---|---|---|---|---|
| 0012 | ModificationHistory | Modification history | 561 chars | Release-history HTML table |
| 0001 | Description | Qualification description | 786 chars | Job-role narrative blurb |
| 0110 | EntryRequirements | Entry requirements | 27 chars | "Nil" |
| 0116 | PackagingRules | Packaging rules | 6491 chars | The actual rules prose, with per-group unit tables |
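Selecting the rules item from a bundle can be sketched as below. Only contentTypeCode, contentType and title are taken from this table; the name of the property that holds the HTML body is not recorded here, so the helper returns the whole item rather than guessing at it. The function name is illustrative.

```typescript
interface ContentItem {
  contentTypeCode: string;
  contentType: string;
  title: string;
}

const PACKAGING_RULES_TYPE_CODE = "0116";

// Return the packaging rules item, or undefined if the bundle has none.
function findPackagingRulesItem<T extends ContentItem>(items: T[]): T | undefined {
  return items.find((item) => item.contentTypeCode === PACKAGING_RULES_TYPE_CODE);
}

// The four BSB40920 items from the table above, metadata only.
const bsb40920Bundle: ContentItem[] = [
  { contentTypeCode: "0012", contentType: "ModificationHistory", title: "Modification history" },
  { contentTypeCode: "0001", contentType: "Description", title: "Qualification description" },
  { contentTypeCode: "0110", contentType: "EntryRequirements", title: "Entry requirements" },
  { contentTypeCode: "0116", contentType: "PackagingRules", title: "Packaging rules" },
];
const rulesItem = findPackagingRulesItem(bsb40920Bundle);
console.log(rulesItem?.contentType); // prints "PackagingRules"
```

Matching on the code rather than on array position matters here: the first item in the BSB40920 bundle is the modification history, which is the content that ended up in packaging_rules.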
Optional asset — Complete training package XML
For BSB40920 release 1: https://training.gov.au/assets/BSB/BSB40920_R1.xml (107,742 bytes). Also available as PDF (_R1.pdf, 205 KB) and DOCX (_R1.docx, 1.2 MB).
The XML is an AuthorIT CMS export wrapper, not a clean qualification schema. Inspected — contains only ~11 unique BSB unit codes (vs the 27 we expect across core + electives), confirming it embeds an HTML payload (likely the same 0116 content) inside one of its CMS object fields. Adds no usable structure beyond what the content bundle provides. Skip the XML for ingest. Use the content bundle.
Full BSB40920 Level 1 response — top-level field inventory
Every key in the Level 1 response, what it contains, and how the current ingest pipeline handles it.
| Field | Type | BSB40920 value | Stored where? | Status |
|---|---|---|---|---|
| code | string | "BSB40920" | tga_training_components.code, qualifications.qual_code | ✓ correct |
| id | string (UUID) | "b77ac6e2-…" | NOT STORED | ❌ missing — TGA's internal qual UUID, useful as a stable foreign key |
| developmentStandard | string | "streamline" | NOT STORED | ❌ missing — distinguishes streamlined quals from legacy format |
| type | string | "qualification" | tga_training_components.component_type (mapped) | ✓ correct |
| title | string | "Certificate IV in Project Management Practice" | tga_training_components.title, qualifications.title | ✓ correct |
| usageRecommendation | string | "current" | tga_training_components.status (via usageRecommendationLabel), qualifications.usage_recommendation | ✓ correct |
| usageRecommendationLabel | string | "Current" | tga_training_components.status | ✓ correct |
| parent | object | {code:"BSB", id:"7e2b…", title:"Business Services Training Package"} | tga_training_components.training_package_code (code), training_package_title (title) | ⚠ partial — parent.id (TP UUID) is dropped |
| releases | array of objects | 1 release | tga_training_components.release_count, latest_release_number, latest_release_date; release rows in tga_training_releases | ⚠ partial — see below |
| releases[].id | string (UUID) | "75b8289d-…" | tga_training_releases.content_bundle_id (mislabelled — actually the release id, not the bundle id) | ⚠ misrouted |
| releases[].releaseNumber | string | "1" | tga_training_releases.release_number | ✓ correct |
| releases[].releaseDate | string (ISO date) | "2020-10-19" | tga_training_releases.release_date | ✓ correct |
| releases[].currency | string | "current" | tga_training_releases.is_current (boolean derivation) | ⚠ lossy — non-current values (superseded, etc.) collapse to 0 |
| releases[].currencyChangeDate | string (ISO date) | "2020-10-19" | NOT STORED | ❌ missing — this is the closest thing to a teach-out / status-change date |
| releases[].links | array | [] | NOT STORED | (low value) |
| mappingInformation | array of objects | 1 mapping | tga_training_components.superseded_by_code, superseded_by_title | ⚠ direction unverified (see below) |
| mappingInformation[].code | string | "BSB40920" (current qual self-reference) | NOT STORED | (redundant) |
| mappingInformation[].mapsToCode | string | "BSB41515" | tga_training_components.superseded_by_code (extracted via filter m.mapsToCode !== code) | ⚠ unverified direction |
| mappingInformation[].mapsToTitle | string | "Certificate IV in Project Management Practice" | tga_training_components.superseded_by_title | ⚠ unverified direction |
| mappingInformation[].mapsToId | string (UUID) | "2801418e-…" | NOT STORED | ❌ missing |
| mappingInformation[].isEquivalent | boolean | true | NOT STORED | ❌ missing — flags equivalent vs partial mappings, important for teach-out logic |
| mappingInformation[].date | string (ISO date) | "2020-10-18" | NOT STORED | ❌ missing — when the supersession was registered |
| taxonomy.industrySectors | array of objects | 7 sectors | NOT STORED CORRECTLY | ❌ misroute (TYPO BUG) — see below |
| taxonomy.industrySectors[].industrySector | string | "Public Administration" (etc.) | should be in field_of_education | ❌ tga-sync reads taxonomy.industry, which doesn't exist; gets null |
| taxonomy.industrySectors[].industrySectorId | int | 297 (etc.) | NOT STORED | ❌ missing |
| taxonomy.industrySectors[].description | string | rich descriptions | NOT STORED | ❌ missing |
| taxonomy.occupations | array of objects | 5 occupations | NOT STORED CORRECTLY | ❌ misroute (TYPO BUG) — see below |
| taxonomy.occupations[].occupation | string | "Project administrator/coordinator" (etc.) | should be JSON-encoded into anzsco_codes | ❌ tga-sync reads taxonomy.occupation, which doesn't exist; gets empty array |
| taxonomy.occupations[].occupationId | int | 3146 (etc.) | NOT STORED | ❌ missing |
| taxonomy.occupations[].description | string | rich descriptions | NOT STORED | ❌ missing |
| trainingPackageDeveloper | object | {name, organisationId, webAddresses} | NOT STORED | ❌ missing — useful for org transition alerts (e.g. "Future Skills Organisation released a new version of BSB") |
Confirmed bugs in tga-sync field mapping (Level 1)
Bug A — taxonomy.industry should be taxonomy.industrySectors (scripts/workers/tga-sync/src/index.ts line ~963):
const ind = typeof taxonomy === "object" ? taxonomy.industry || [] : [];
const foe = ind.length > 0 && ind[0]?.name ? ind[0].name : null;
The API field is industrySectors (plural, with the Sectors suffix). The script reads taxonomy.industry, which doesn't exist on the response object, so ind is always [] and foe is always null. Result: every qual in tga_training_components.field_of_education is null even though all the data is in the API. Also the per-item field is industrySector not name, so even if the array lookup were fixed the .name access would still fail.
Bug B — taxonomy.occupation should be taxonomy.occupations (same file, line ~959):
const occ = typeof taxonomy === "object" ? taxonomy.occupation || [] : [];
const anzsco = occ.length > 0 ? JSON.stringify(occ.map((t: any) => t?.code).filter(Boolean)) : null;
Same shape of bug. The API field is occupations (plural) and the per-item field is occupation, not code. Result: every qual in tga_training_components.anzsco_codes is null. Note: the field is named anzsco_codes but the API never returned ANZSCO codes — it returns TGA's own occupationId integers (3146, 3148, etc.). The column name is misleading even if the bug were fixed.
Both bugs have been silently wrong since tga-sync was first deployed. The data is recoverable by re-running tga-sync after the fix lands; no schema migration needed for field_of_education, but anzsco_codes should probably be renamed to tga_occupation_ids if we're going to capture them properly.
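For illustration, a corrected extraction for both bugs might look like the sketch below. It is not the code that shipped in TGA-SYNC-FIX-01; the function and result names are hypothetical, and only the API field names (industrySectors, industrySector, occupations, occupation) come from the inventory above.

```typescript
interface IndustrySector {
  industrySector: string;
  industrySectorId: number;
}

interface Occupation {
  occupation: string;
  occupationId: number;
}

interface Taxonomy {
  industrySectors?: IndustrySector[];
  occupations?: Occupation[];
}

interface TaxonomyColumns {
  fieldOfEducation: string | null; // first industry sector name
  anzscoCodes: string | null; // JSON-encoded occupation names, despite the column name
}

// Read the plural array names and the per-item fields the API actually returns.
function mapTaxonomy(taxonomy: Taxonomy | null | undefined): TaxonomyColumns {
  const sectors = taxonomy?.industrySectors ?? [];
  const occupations = taxonomy?.occupations ?? [];
  const firstSector = sectors[0];
  return {
    fieldOfEducation: firstSector ? firstSector.industrySector : null,
    anzscoCodes:
      occupations.length > 0
        ? JSON.stringify(occupations.map((o) => o.occupation))
        : null,
  };
}

// Values taken from the BSB40920 inventory rows above.
const columns = mapTaxonomy({
  industrySectors: [{ industrySector: "Public Administration", industrySectorId: 297 }],
  occupations: [{ occupation: "Project administrator/coordinator", occupationId: 3146 }],
});
console.log(columns);
```

Typing the response shape is what would have caught both typos at compile time; the original code reached through untyped objects, so the wrong keys silently produced empty values.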
Direction of mappingInformation — RESOLVED 2026-04-16
TGA-MAPPING-DIRECTION-AUDIT ran against BSB41513 → BSB41515 → BSB40920 and returned an unambiguous answer. code is the successor, mapsToCode is the predecessor. Verified by fetching BSB41515's Level 1 detail, which contains TWO entries:
- {code: BSB40920, mapsToCode: BSB41515, date: 2020-10-18} — BSB41515 appears as mapsToCode, meaning BSB40920 is its successor.
- {code: BSB41515, mapsToCode: BSB41513, date: 2015-03-24} — BSB41515 appears as code, meaning BSB41513 is its predecessor.
For a qual with code X, the correct derivation is:
- supersedes (what I replaced) = mappingInformation entries where m.code === X, take {mapsToCode, mapsToTitle, mapsToId}.
- superseded_by (what replaced me) = entries where m.mapsToCode === X, take {code, title, id}.
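The two rules above can be sketched as a pair of filters. This is an illustration of the directional logic only, not the deriveSupersession() implementation in the enrich script, and it keeps just the qual codes.

```typescript
interface MappingEntry {
  code: string; // successor
  mapsToCode: string; // predecessor
  date?: string;
}

interface Supersession {
  supersedes: string[];
  supersededBy: string[];
}

// Apply the directional filter for the qual identified by qualCode.
function deriveSupersessionSketch(
  qualCode: string,
  mappings: MappingEntry[],
): Supersession {
  return {
    // Entries where this qual is the successor name what it replaced.
    supersedes: mappings
      .filter((m) => m.code === qualCode)
      .map((m) => m.mapsToCode),
    // Entries where this qual is the predecessor name what replaced it.
    supersededBy: mappings
      .filter((m) => m.mapsToCode === qualCode)
      .map((m) => m.code),
  };
}

// The two mappingInformation entries observed on BSB41515.
const bsb41515Mappings: MappingEntry[] = [
  { code: "BSB40920", mapsToCode: "BSB41515", date: "2020-10-18" },
  { code: "BSB41515", mapsToCode: "BSB41513", date: "2015-03-24" },
];
const bsb41515 = deriveSupersessionSketch("BSB41515", bsb41515Mappings);
console.log(bsb41515); // supersedes ["BSB41513"], supersededBy ["BSB40920"]
```

Filtering on both sides of the mapping is what prevents the self-reference: an entry whose code equals the qual contributes only its mapsToCode, never the qual itself.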
The search API exposes the same information pre-resolved as top-level supersedes and supersededBy arrays. Critically, the mappingInformation field does NOT exist on search API responses — which is why the original buildSupersededByMap() in tools/tga-enrich-quals.mjs silently produced an empty map, and why qualifications.superseded_by was NULL for the entire corpus until this audit. The search API path has been deleted; the enrich script now derives both directions from Level 1 mappingInformation directly.
Bugs caught by this audit and fixed:
- qualifications.supersedes had been storing the wrong field (m.code, which is often the qual itself in single-mapping responses, producing self-references like BSB40920.supersedes = [BSB40920]). For superseded quals with two mapping entries, both entries were written, giving garbage like BSB41515.supersedes = [BSB40920, BSB41515] when the correct answer is [BSB41513].
- qualifications.superseded_by was NULL for every row in the corpus. The buildSupersededByMap() function walked a field that doesn't exist on search responses.
- Both fixed by deriveSupersession() in the enrich script + a full-corpus rerun via --supersession-fields. Canaries verified: BSB40920 → supersedes BSB41515, BSB41515 → supersedes BSB41513 + superseded_by BSB40920, BSB41513 → supersedes BSB41507 + superseded_by BSB41515.
Direction of mappingInformation — original (unverified) reasoning, kept for history
For BSB40920 the mapping object is:
{
"code": "BSB40920",
"mapsToCode": "BSB41515",
"mapsToTitle": "Certificate IV in Project Management Practice",
"isEquivalent": true,
"date": "2020-10-18"
}
Both qual codes share the same title, both are Certificate IV in Project Management Practice, and the date (2020-10-18) is the day before BSB40920's release date (2020-10-19). The naming convention strongly suggests BSB40920 (year 20) is newer than BSB41515 (year 15) — five years younger — meaning BSB40920 supersedes BSB41515. Under that reading, mapsToCode is the predecessor, not the successor.
The current tga-sync code stores mapsToCode in superseded_by_code. If mapsToCode is the predecessor, that field is mis-named — it should be supersedes_code. The opposite reading would make tga-sync correct but feels wrong given the dates and the naming convention.
This MUST be verified by inspecting a qualification with an unambiguous supersession trail before the canvas-side version pill goes live. Recommended verification: pick a known-superseded qual (e.g. BSB41515 itself, since it should now be the older one) and look at its Level 1 mapping. If BSB41515's mappingInformation points at BSB40920, that proves the direction is "this qual was replaced by THAT qual" and our current superseded_by_code interpretation is correct. If BSB41515 has no mapping or maps in a different direction, we have to reverse the column semantics.
Full BSB40920 Level 2 (release detail) inventory
| Field | Type | BSB40920 R1 value | Stored where? | Status |
|---|---|---|---|---|
| id | string (UUID) | "75b8289d-…" | tga_training_releases.content_bundle_id (mislabelled) | ⚠ misrouted (this is the RELEASE id, not the content bundle id) |
| releaseNumber | string | "1" | tga_training_releases.release_number | ✓ correct |
| releaseDate | string (ISO date) | "2020-10-19" | tga_training_releases.release_date | ✓ correct |
| currency | string | "current" | NOT STORED | ❌ missing at the release level (we only store the boolean is_current) |
| currencyChangeDate | string (ISO date) | "2020-10-19" | NOT STORED | ❌ missing — teach-out / status-transition date |
| packagingInformation.core | int | 3 | qualifications.core_units_count (via tga-enrich-quals) | ✓ correct (when enriched) |
| packagingInformation.elective | int | 6 | qualifications.elective_units_count (via tga-enrich-quals) | ✓ correct (when enriched) |
| packagingInformation.measure | string | "units" | NOT STORED | (low value — assume "units" until proven otherwise) |
| contentBundles | array of objects | 1 bundle | NOT STORED at the bundle id level; only the bundle's typeCode metadata flows through | ❌ missing — the bundle id is what we need to fetch the actual rules |
| contentBundles[].id | string (UUID) | "83cb4d85-…" | NOT STORED | ❌ the gap — required for Level 3 |
| contentBundles[].typeCode | string | "0000" (Default) | NOT STORED | (low value) |
| contentBundles[].typeName | string | "Default" | NOT STORED | (low value) |
| contentBundles[].links | array | [{rel:"self", href:…}] | NOT STORED | (low value, derivable from id) |
| assets | array of objects | 3 assets (XML, PDF, DOCX) | NOT STORED | ❌ missing — the XML/PDF/DOCX URLs are direct download links to the full training package documents, useful for compliance evidence and operator inspection |
| assets[].name | string | "BSB40920_R1.xml" (etc.) | NOT STORED | |
| assets[].url | string | full URL | NOT STORED | |
| assets[].size | int | 107753 (etc.) | NOT STORED | |
| assets[].lastPublishedDate | string (ISO datetime) | "2020-10-21T22:14:44…" | NOT STORED | |
| assets[].type | string | "completeDocument" | NOT STORED | |
| assets[].isAvailable | boolean | true | NOT STORED | |
| externalLinks | array of objects | 1 link (VETNet Companion Volume Implementation Guide) | NOT STORED | ❌ missing — the Companion Volume is the official delivery guidance document, useful for the AuditView and for trainer-facing reference |
| externalLinks[].title | string | "Companion Volume Implementation Guide is found on VETNet" | NOT STORED | |
| externalLinks[].url | string | "https://vetnet.gov.au/Pages/TrainingDocs.aspx?q=…" | NOT STORED | |
| externalLinks[].relatesTo | string | "completeDocument" | NOT STORED | |
| links | array | [{rel:"content-overview", href:…}] | NOT STORED | (low value, derivable) |
| specializations | array | [] (BSB40920 has none) | NOT STORED | ❌ missing — populated for quals with specialisation streams (e.g. Diploma of Nursing); needed for the Studio canvas to surface specialisation choices to the user |
Confirmed bug in tga-sync (Level 2 — release id misroute)
The tga_training_releases.content_bundle_id column actually stores releases[].id (the release UUID), not the content bundle id from the Level 2 response. Re-reading the schema migrations and tga-sync upsert code:
// tga-sync, line ~1014 — upserts releases from Level 1 detail
.bind(
code,
rel.releaseNumber || "",
rel.releaseDate || null,
rel.title || rel.releaseTitle || null,
rel.currency === "current" ? 1 : 0,
rel.id || null, // ← stored as content_bundle_id
now(),
)
rel.id here is the release UUID from Level 1, not the content bundle id. The column is mislabelled. The actual content bundle id is only available after a Level 2 fetch, which tga-sync doesn't perform. The fix is either to rename the column to release_id (correct semantics) or to add a Level 2 fetch step that populates a new content_bundle_id column.
Content bundle (Level 3) — the misroute that triggered this audit
tga-enrich-quals.mjs correctly maps contentTypeCode === "0116" → qualifications.packaging_rules in its current source (line 251–253). Yet qualifications.packaging_rules for BSB40920 contains the 561-character 0012 ModificationHistory content, not the 6491-character 0116 PackagingRules content. The historical write that landed in the column does not match what the current code would produce against the current API.
Possible explanations (none verified in this audit):
- An earlier version of tga-enrich-quals.mjs had a different typeCode mapping
- A separate enrichment script populated packaging_rules before tga-enrich-quals existed
- The TGA API renumbered its contentTypeCode taxonomy between the original ingest and now
- A bulk-load SQL script wrote release notes to packaging_rules directly
The pragmatic fix is to re-run tga-enrich-quals.mjs with the current code. It will pull 0116 from the bundle and overwrite the column with the correct content. This needs to happen against the entire qualifications table (~all enriched rows), not just BSB40920 — every other qual in the corpus is likely affected by the same historical bug.
What the real packaging rules content looks like
The 0116 content for BSB40920 is HTML, but with predictable structure that an LLM (or even a careful regex/cheerio parser) can extract reliably. Stripped of HTML tags, the content reads:
Total number of units = 9
3 core units plus 6 elective units, of which:
- 3 elective units must be selected from Group A
- for the remaining 3 elective units: up to 3 units may be selected from Groups A and B; if not listed, up to 3 units may be selected from a Certificate IV or higher from this or any other currently endorsed Training Package qualification or accredited course.
Elective units must be relevant to the work environment and the qualification, maintain the integrity of the AQF alignment and contribute to a valid, industry-supported vocational outcome.
Chosen elective units must not include BSBPMG430 Undertake project work.
Core units
- BSBPMG420 Apply project scope management techniques
- BSBPMG421 Apply project time management techniques
- BSBPMG422 Apply project quality management techniques
Elective units
Group A — Project Management
- BSBPMG423 Apply project cost management techniques
- BSBPMG424 Apply project human resources management approaches
- BSBPMG425 Apply project information management and communications techniques
- BSBPMG426 Apply project risk management techniques
- BSBPMG427 Apply project procurement procedures
- BSBPMG428 Apply project life cycle management processes
- BSBPMG429 Apply project stakeholder engagement techniques
Group B — Transferable Skills
- BSBCRT411 Apply critical thinking to work practices
- BSBLDR413 Lead effective workplace relationships
- BSBLEG522 Apply legal principles in contract law matters
- BSBOPS401 Coordinate business resources
- BSBPEF401 Manage personal health and wellbeing
- BSBPEF402 Develop personal work priorities
- BSBSUS411 Implement and monitor environmentally sustainable work practices
- BSBTEC403 Apply digital solutions to work processes
- BSBTEC404 Use digital technologies to collaborate in a work environment
- BSBWHS411 Implement and monitor WHS policies, procedures and programs
- BSBXCS401 Maintain security of digital devices
- CPPDSM4047 Implement and monitor procurement process
- MSMENV472 Implement and monitor environmentally sustainable work practices
- PSPETH002 Uphold and support the values and principles of public service
- PSPGEN043 Apply government processes
- PSPPCY004 Support policy implementation
- TLIE4006 Collect, analyse and present workplace data and information
7 + 17 = 24 elective units, which exactly matches the elective_units_count = 24 we already have in qualifications. Three core + 24 elective = 27 distinct units, and the rule "choose 3 + 3 + 3" = 9 required. All counts reconcile.
This content is sufficient to drive a structured packaging rules parser. It contains:
- The rule structure (core count, elective count, group choice rules, conditional fallbacks)
- The complete unit list per group (with codes AND titles)
- Exclusion rules ("must not include BSBPMG430")
- Cross-package open-corpus fallback constraints ("Cert IV or higher")
- General quality criteria ("relevant to the work environment", AQF alignment)
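As a rough sketch of the regex route mentioned above (not the LLM parser in tools/parse-packaging-rules.mjs), unit codes can be bucketed by section heading once the HTML tags are stripped. The assumptions are loud ones: the unit list starts at a "Core units" heading, group headings look like "Group A", and unit codes are 3 to 7 capitals followed by 3 or 4 digits. All three hold for BSB40920 but are untested elsewhere, and codes with a trailing letter would be missed. The sample text is abbreviated.

```typescript
interface UnitGroup {
  heading: string;
  unitCodes: string[];
}

// Assumed code shape; fits every unit code in the BSB40920 rules.
const UNIT_CODE = /\b[A-Z]{3,7}\d{3,4}\b/g;
// The capture group makes split() keep the headings in its output.
const SECTION_HEADING = /(Core units|Group [A-Z])\b/;

function extractUnitGroups(strippedRules: string): UnitGroup[] {
  // Skip the rule prose, which mentions groups and excluded units.
  const start = strippedRules.indexOf("Core units");
  if (start === -1) return [];
  const parts = strippedRules.slice(start).split(SECTION_HEADING);
  const groups: UnitGroup[] = [];
  // parts alternates: preamble, heading, body, heading, body, ...
  for (let i = 1; i < parts.length; i += 2) {
    const heading = parts[i] ?? "";
    const body = parts[i + 1] ?? "";
    groups.push({ heading, unitCodes: body.match(UNIT_CODE) ?? [] });
  }
  return groups;
}

const sampleRules = [
  "Total number of units = 9",
  "3 core units plus 6 elective units, of which 3 must be selected from Group A",
  "Chosen elective units must not include BSBPMG430 Undertake project work.",
  "Core units",
  "BSBPMG420 Apply project scope management techniques",
  "BSBPMG421 Apply project time management techniques",
  "BSBPMG422 Apply project quality management techniques",
  "Elective units",
  "Group A \u2014 Project Management",
  "BSBPMG423 Apply project cost management techniques",
  "BSBPMG426 Apply project risk management techniques",
  "Group B \u2014 Transferable Skills",
  "BSBCRT411 Apply critical thinking to work practices",
  "CPPDSM4047 Implement and monitor procurement process",
  "TLIE4006 Collect, analyse and present workplace data and information",
].join("\n");
const unitGroups = extractUnitGroups(sampleRules);
console.log(unitGroups);
```

A sketch like this recovers only group membership. Selection counts, exclusions such as BSBPMG430, and the open-corpus fallback live in the prose before the unit lists and need a richer parser.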
Required tga-sync and ingest changes
Categorised by severity. All changes are tracked here as recommendations; the actual fixes will land in follow-up briefs (TGA-SYNC-FIX-01, STUDIO-PACKAGING-PARSE-01).
Critical — historical data wrong
1. Re-run tga-enrich-quals.mjs against the entire qualifications corpus to overwrite the wrong packaging_rules content. The current script source is correct; the historical data is not. Do this BEFORE building any packaging parser, or every parsed result will be wrong because it's parsing modification history instead of rules. Risk: the script writes SQL to a file (SQL_OUTPUT), it doesn't directly mutate the DB; the SQL still has to be executed against rtopacks-db. Use a small batch first (10 quals) to verify the new content lands correctly before running the full corpus.
Critical — silently-broken field mappings
2. Fix taxonomy.industry → taxonomy.industrySectors in scripts/workers/tga-sync/src/index.ts (~line 963). Also fix the per-item .name access to read .industrySector. Re-run tga-sync to backfill field_of_education for every existing row.
3. Fix taxonomy.occupation → taxonomy.occupations in the same file (~line 959). Per-item field is occupation, not code. The column is misleadingly named anzsco_codes but the API never returned ANZSCO codes; consider renaming to tga_occupation_ids in a follow-up migration. Re-run tga-sync to backfill.
Important — version pinning data missing¶
4. Capture releases[].currencyChangeDate in a new column on tga_training_releases (e.g. currency_change_date). This is the closest TGA gives us to a teach-out / status-transition date and is essential for the Studio canvas version pill, transition alerts, and audit-pinning logic.
5. Capture releases[].currency as a string, not just the derived boolean is_current. Rename or add currency_status so we can distinguish current / superseded / expired etc. The boolean is_current is lossy.
6. Rename tga_training_releases.content_bundle_id to release_id: it currently stores the release UUID, not a content bundle id (which only comes from a Level 2 fetch we don't perform). Add a separate content_bundle_id column if/when we add the Level 2 fetch.
7. Capture mappingInformation[].mapsToId, isEquivalent, and date on tga_training_components. These are needed to render rich supersession context in the Studio canvas (e.g. "BSB41515 was replaced by BSB40920 on 2020-10-18 (equivalent transfer)").
8. Verify the direction of mapsToCode before relying on superseded_by_code. If wrong, swap the column semantics. Pick BSB41515 (the apparent predecessor), inspect its Level 1 mapping, and confirm.
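To make the supersession fields concrete, the sketch below renders one mappingInformation entry as a sentence. It assumes the direction recorded as resolved in the follow-up list further down (code is the successor, mapsToCode is the predecessor) and assumes date arrives as an ISO-8601 string. The function name and the sample entry are hypothetical.

```javascript
// Hypothetical helper: renders one mappingInformation entry as a sentence.
// Assumes `code` is the successor and `mapsToCode` the predecessor.
function describeSupersession(code, mapping) {
  const day = String(mapping.date ?? "").slice(0, 10); // assumes an ISO-8601 string
  const kind = mapping.isEquivalent ? "equivalent transfer" : "not equivalent";
  return `${mapping.mapsToCode} was replaced by ${code} on ${day} (${kind})`;
}

// Illustrative entry; the values mirror the example sentence above.
const sampleMapping = { mapsToCode: "BSB41515", isEquivalent: true, date: "2020-10-18T00:00:00" };
console.log(describeSupersession("BSB40920", sampleMapping));
```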
Useful — currently dropped data¶
9. Capture id (qual UUID), parent.id (TP UUID), and developmentStandard on tga_training_components.
10. Capture trainingPackageDeveloper.name and organisationId on tga_training_components (or a new tga_training_package_developers table). Useful for org-level transition alerts.
11. Add a Level 2 fetch step to tga-sync so we capture packagingInformation, assets[], externalLinks[], and contentBundles[].id. Currently tga-sync never does the Level 2 call; only tga-enrich-quals.mjs does, and only as a one-shot tool, not as part of the scheduled sync. The Level 2 fetch should populate a new tga_training_releases_detail table or extend the existing tga_training_releases row.
12. Capture assets[] URLs: the XML / PDF / DOCX direct download links for the complete training package document. Useful for operator inspection and for ASQA-style compliance evidence ("here is the exact document this session was built against on this date").
13. Capture externalLinks[]: the Companion Volume Implementation Guide URL is the operator-facing reference document. Useful for the Studio AuditView and for trainer onboarding.
14. Capture specializations[] at the release level. BSB40920 has none, but quals like Diploma of Nursing have multi-stream specialisations that the Studio canvas needs to surface to the user.
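A Level 2 capture step could start from something like the following sketch, which picks the fields named above out of a release response. The function name, the output column names, and the decision to store assets[], externalLinks[] and specializations[] as opaque JSON strings are all assumptions; the audit does not fix the shape of those arrays.

```javascript
// Hypothetical helper: picks the Level 2 fields listed above out of a release
// response. The element shapes of assets[] and externalLinks[] are not
// assumed, so they are kept as opaque JSON strings.
function toReleaseDetailRow(qualCode, releaseId, release) {
  const bundles = Array.isArray(release?.contentBundles) ? release.contentBundles : [];
  return {
    qual_code: qualCode,
    release_id: releaseId,
    packaging_information: release?.packagingInformation ?? null,
    content_bundle_ids: JSON.stringify(bundles.map((b) => b.id).filter(Boolean)),
    assets_json: JSON.stringify(release?.assets ?? []),
    external_links_json: JSON.stringify(release?.externalLinks ?? []),
    specializations_json: JSON.stringify(release?.specializations ?? []),
  };
}
```

Keeping the arrays as JSON defers the schema decision (detail table versus extended release row) until the real payloads have been inspected across more than one qualification.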
Lower priority — taxonomy enrichment¶
15. Capture taxonomy.industrySectors[].industrySectorId, description and taxonomy.occupations[].occupationId, description for richer search and filtering. Probably belongs in dedicated tga_industry_sectors and tga_occupations reference tables joined to qualifications via a junction table.
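One possible shape for the reference-plus-junction layout, sketched for industry sectors only (occupations would follow the same pattern with occupationId). The function name and the output column names are hypothetical; only industrySectorId, industrySector and description come from the audit.

```javascript
// Hypothetical helper: turns one qualification's taxonomy into reference-table
// rows plus junction rows. Table and column names are illustrative.
function toSectorRows(qualCode, taxonomy) {
  const byId = new Map();
  for (const s of taxonomy?.industrySectors ?? []) {
    if (s.industrySectorId == null) continue; // skip entries without an id
    byId.set(s.industrySectorId, {
      industry_sector_id: s.industrySectorId,
      name: s.industrySector ?? null,
      description: s.description ?? null,
    });
  }
  const sectors = [...byId.values()];
  const junction = sectors.map((s) => ({
    qual_code: qualCode,
    industry_sector_id: s.industry_sector_id,
  }));
  return { sectors, junction };
}
```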
Architectural implications for version pinning¶
The brief notes that qualification versioning is a day-one concern. This audit confirms the data we need is in the API; it's just not being captured. The minimum viable schema markers for version pinning are:
- qualifications.qualification_release: the release number / id this row reflects. Compound key (qual_code, release) for anything version-sensitive. Currently we have release_number on the per-release table but qualifications is single-row-per-code.
- studio_sessions.qualification_release: immutable once set. The release the session was built against. Pinned to whatever was current at session creation time. Needed for ASQA pinning.
- tga_training_releases.currency_change_date: the teach-out / supersession date for that release.
- tga_training_releases.currency_status: string current / superseded / expired, not just a boolean.
- tga_training_components.superseded_by_code / superseded_by_id: direction verified.
tga-sync must NEVER overwrite historical release rows. When TGA introduces a new release for an existing qual, the new row is added but the old release row stays as-is, with its currency_status flipped from current → superseded and currency_change_date populated. That's the data shape that lets the Studio canvas tell the user "this session was built against Release 1 on 2020-10-19; Release 2 dropped on 2025-08-15 and is now current — your enrolments started before that date are still valid against Release 1."
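The append-only rule in the paragraph above can be sketched as a planning function that only ever emits inserts and status updates. This is an illustration under assumptions, not tga-sync's implementation: the function name, the op labels, the releaseNumber field on the incoming release, and the lowercase currency values are all hypothetical.

```javascript
// Hypothetical planner: compares stored release rows with the releases TGA
// currently reports and returns the writes needed. It never emits a delete
// or a whole-row replace, so historical rows survive every sync.
function planReleaseWrites(qualCode, existingRows, incomingReleases) {
  const known = new Map(existingRows.map((r) => [r.release_number, r]));
  const ops = [];
  for (const rel of incomingReleases) {
    const row = known.get(rel.releaseNumber);
    if (!row) {
      // A release we have not seen before: add a new row.
      ops.push({
        op: "insert",
        qual_code: qualCode,
        release_number: rel.releaseNumber,
        currency_status: rel.currency,
        currency_change_date: rel.currencyChangeDate ?? null,
      });
    } else if (row.currency_status !== rel.currency) {
      // Known release whose status moved: touch only the two status columns.
      ops.push({
        op: "update_status",
        qual_code: qualCode,
        release_number: rel.releaseNumber,
        currency_status: rel.currency,
        currency_change_date: rel.currencyChangeDate ?? row.currency_change_date ?? null,
      });
    }
  }
  return ops;
}
```

Running the planner twice against the same TGA response returns an empty list the second time, which makes the sync safe to retry.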
Unit-level supersession (Q8) is a separate concern — needs investigation against the unit detail endpoint (/api/training/{unitCode}), out of scope for this audit.
Follow-up work (next briefs)¶
In dependency order. Shipped items crossed out — see the "2026-04-16 update" section at the top of this doc for what actually landed and what was corrected.
- ~~TGA-SYNC-FIX-01~~ ✅ shipped 2026-04-15: typos fixed in scripts/workers/tga-sync/src/index.ts, full corpus re-enriched via tga-enrich-quals.mjs --force, field_of_education and anzsco_codes backfilled.
- ~~TGA-ENRICH-RERUN-01~~ ✅ folded into TGA-SYNC-FIX-01 as a single corpus rerun. packaging_rules now contains the correct 0116 content for all 5202 enriched quals (2805 superseded/deleted unchanged where TGA returned no bundle).
- ~~TGA-VERSION-SCHEMA-01~~ ✅ shipped 2026-04-15/16, but not with the scope described in items 4, 5, 6 above. See the retraction in the "2026-04-16 update" section. Actual landed columns: qualifications.release_id, content_bundle_id, teach_out_date; studio_sessions.qualification_release + qualification_release_date. The currency / currency_change_date columns were added then dropped the same day.
- STUDIO-PACKAGING-PARSE-01: now unblocked. Reads the corrected qualifications.packaging_rules (0116 content), parses via LLM with tool use, writes structured JSON to a new qualification_packaging_rules table. Brief drafted in session 60; data foundation is ready.
- ~~TGA-MAPPING-DIRECTION-AUDIT~~ ✅ resolved 2026-04-16: code is the successor, mapsToCode is the predecessor. Two script bugs found and fixed (supersedes stored the wrong field; superseded_by was NULL corpus-wide because buildSupersededByMap walked a field that doesn't exist on search API responses). Full corpus rewritten via --supersession-fields. See the "Direction of mappingInformation — RESOLVED" section above for the full write-up. STUDIO-VERSION-PILL-01 is now unblocked for surfacing supersession.
- TGA-UNIT-SUPERSESSION-AUDIT: equivalent of this audit but for the unit detail endpoint. Needed for the unit-level supersession story.
- STUDIO-VERSION-PILL-01: now unblocked. Studio canvas header chip comparing studio_sessions.qualification_release to qualifications.release_number. Cheap, high-value.
- TGA-TRANSITION-ALERTS-01: when TGA drops a new release for a qual on an org's active scope, surface a notification. Depends on the version data being captured correctly.
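The comparison behind STUDIO-VERSION-PILL-01 is small enough to sketch. The function name and the three state labels are hypothetical; the two inputs stand for studio_sessions.qualification_release and qualifications.release_number.

```javascript
// Hypothetical helper: decides what the version pill should show.
// Values are compared as strings so 1 and "1" are treated as the same release.
function versionPillState(sessionRelease, currentRelease) {
  if (sessionRelease == null || currentRelease == null) return "unknown";
  return String(sessionRelease) === String(currentRelease) ? "current" : "outdated";
}
```

A session pinned to release 1 while the qualification row reports release 2 yields "outdated"; a missing value on either side yields "unknown" rather than a false "current".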
Investigation artefacts¶
The raw responses captured for this audit are saved locally at:
- /tmp/tga-BSB40920-detail.json (Level 1, 6190 bytes)
- /tmp/tga-BSB40920-release-1.json (Level 2, 1510 bytes)
- /tmp/tga-BSB40920-bundle.json (Level 3, 9460 bytes)
- /tmp/tga-BSB40920_R1.xml (XML asset, 107742 bytes)
These are temporary files on the dev machine, not committed to the repo. Re-fetch with the Node one-liner in the "Investigation method" section below if needed.
Investigation method¶
node -e '
(async () => {
const code = "BSB40920";
const detail = await fetch(`https://training.gov.au/api/training/${code}`).then(r => r.json());
const releaseId = detail.releases?.[0]?.id;
const release = await fetch(`https://training.gov.au/api/training/${code}/releases/${releaseId}`).then(r => r.json());
const bundleId = release.contentBundles?.[0]?.id;
const bundle = await fetch(`https://training.gov.au/api/content/bundle/${bundleId}`).then(r => r.json());
console.log(JSON.stringify({ detail, release, bundle }, null, 2));
})();
'
curl from macOS does not work for these endpoints (TLS handshake passes via openssl s_client but HTTP-layer hangs). Node fetch works reliably with no headers. The browser also works. Use Node fetch for any future TGA inspection.
Cross-cutting connectivity gotchas live in ops/source-of-truth-connectivity.md. That doc is the canonical first stop for any "why isn't this endpoint responding" debugging; the section below is the TGA-specific deep dive behind the summary row there.
D1 bulk write lessons (TGA-SYNC-FIX-01, 2026-04-15)¶
Recorded after a full-corpus rerun of tools/tga-enrich-quals.mjs burned ~90 minutes hitting a cascade of write-layer failures. Every item below is a concrete gotcha — read this before any future TGA enrichment or bulk D1 UPDATE job.
1. curl to training.gov.au hangs on macOS¶
TLS handshake succeeds via openssl s_client, but curl (and any tool shelling out to it) hangs at the HTTP layer with no response and no timeout. Node's global fetch works reliably with no special headers. Never diagnose TGA connectivity with curl — jump straight to node -e 'await fetch(...)'.
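Because the failure mode is a hang rather than an error, a timeout is worth adding to any diagnostic fetch. The sketch below is one way to do that with Node's built-in fetch and AbortSignal.timeout; the function name, the 15-second default and the injectable fetchImpl parameter (there only so the helper can be exercised without network access) are assumptions.

```javascript
// Hypothetical helper: JSON GET with a hard timeout, so a hung connection
// surfaces as an error instead of blocking forever.
async function fetchJsonWithTimeout(url, timeoutMs = 15000, fetchImpl = fetch) {
  const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}
```

Usage from a script or an async REPL would look like `await fetchJsonWithTimeout("https://training.gov.au/api/training/BSB40920")`.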
2. wrangler d1 execute --remote --file is broken for real-size batches¶
Both wrangler@4.78.0 and wrangler@4.83.0 fail with Network connection lost during the async pollUntilComplete path the --file flag uses. A trivial one-line SELECT also fails in some cases. The issue is not payload size — it's the wrangler→D1 polling API. Do not use --file for bulk UPDATEs.
Use the CF D1 HTTP API directly (POST /accounts/{id}/d1/database/{db}/query) via Node fetch. Synchronous, no polling, no wrangler shell-out. The enrich script already does this — see executeOne / executeBatch in tools/tga-enrich-quals.mjs.
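For orientation, a request to that endpoint can be assembled as in the sketch below. This is not the script's executeOne: the function name and argument order are hypothetical, and the base URL, bearer-token header and { sql, params } body follow Cloudflare's public REST API as I understand it, so check them against the current D1 API reference before relying on them.

```javascript
// Hypothetical helper: builds the URL and fetch options for one D1 query.
function buildD1Request(accountId, databaseId, apiToken, sql, params = []) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/d1/database/${databaseId}/query`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      // One statement per request; values travel as bound parameters.
      body: JSON.stringify({ sql, params }),
    },
  };
}
```

The result plugs straight into fetch: `const { url, init } = buildD1Request(...)` followed by `await fetch(url, init)`.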
3. Multi-statement payloads fail silently on real content¶
Joining N UPDATE statements with ; and POSTing as one sql string fails with confusing errors (SQL code did not contain a statement, Network connection lost) once the batch grows beyond trivial size, even though single statements work fine. Two causes combined:
- toUpdateSql() already terminates each statement with a semicolon, so the join added a second one (;;). Strip trailing semicolons with .replace(/;+\s*$/, "") before sending.
- Even after that, multi-statement batches with 100+ real-content UPDATEs are unreliable.
Send one statement per request with bounded concurrency. The script uses D1_CONCURRENCY = 10 via Promise.allSettled. Slightly more HTTP overhead, but reliable and error-attributable per-statement.
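The pattern described above (one statement per request, at most N in flight, per-statement outcomes) can be sketched as follows. This is a simplified stand-in for the script's executeBatch, not a copy of it: the function names are hypothetical and the waves run strictly one after another rather than as a sliding window.

```javascript
// Strip trailing semicolons so each request carries exactly one statement.
function stripTrailingSemicolons(sql) {
  return sql.replace(/;+\s*$/, "");
}

// Run `worker` over `items` in waves of at most `concurrency`, collecting a
// per-item outcome instead of failing the whole run on the first error.
async function runBounded(items, worker, concurrency = 10) {
  const results = [];
  for (let i = 0; i < items.length; i += concurrency) {
    const wave = items.slice(i, i + concurrency);
    // The async wrapper turns synchronous throws into rejections as well.
    const settled = await Promise.allSettled(wave.map(async (item) => worker(item)));
    settled.forEach((outcome, j) => results.push({ item: wave[j], ...outcome }));
  }
  return results;
}
```

Each result carries the original item alongside status and value or reason, so a failed UPDATE can be traced back to its qual code and retried on its own.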
4. SQLITE_TOOBIG on inline large-string UPDATEs¶
A single UPDATE with an inline packaging_rules = '...' literal fails with SQLITE_TOOBIG when the escaped string exceeds roughly 100KB. This affects ~10 Current quals today (large training packages: MEM, MEA, RII, UEP, MSA, AMP) and 48 Superseded/Deleted quals. Symptoms:
- Statement containing packaging_rules inline → fails above ~100KB.
- Same update sent with packaging_rules as a bound parameter (?1) → works fine at 187KB+.
For any column that might hold >50KB of text, use parameter binding:
await d1Query(
`UPDATE qualifications SET packaging_rules = ?1 WHERE qual_code = ?2`,
[packagingRulesString, qualCode]
);
The inline-literal path is fine for every other column (descriptions, titles, taxonomy JSON). packaging_rules is the outlier — some training packages ship 100KB+ of HTML-formatted unit tables.
5. Split oversized UPDATEs into logical halves¶
Even with bound parameters, combining every field into one UPDATE pushes the SQL text past the limit when description is also large. Split pattern:
- UPDATE all small scalar fields (title, units, usage, description, taxonomy, etc.) in one statement.
- UPDATE packaging_rules alone in a second statement, via bound parameter.
This is what tools/fix-10-stragglers.mjs-style one-off jobs should do for any qual that hit SQLITE_TOOBIG in the main rerun. For future enrich script runs, consider building this split into toUpdateSql by default.
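A minimal sketch of the split, assuming every value is sent as a bound parameter. The function name and the smallFields argument are hypothetical; qualifications, qual_code and packaging_rules are the names used in this document. Column names are interpolated into the SQL text, so the sketch rejects anything that is not a plain identifier.

```javascript
// Hypothetical helper: returns up to two parameterised statements for one
// qual. `smallFields` maps column name -> value for the small scalar fields.
function toSplitUpdates(qualCode, smallFields, packagingRules) {
  const columns = Object.keys(smallFields);
  for (const col of columns) {
    // Column names end up in the SQL text, so only plain identifiers pass.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(col)) throw new Error(`unsafe column name: ${col}`);
  }
  const setClause = columns.map((col, i) => `${col} = ?${i + 1}`).join(", ");
  const small = {
    sql: `UPDATE qualifications SET ${setClause} WHERE qual_code = ?${columns.length + 1}`,
    params: [...columns.map((col) => smallFields[col]), qualCode],
  };
  const large = {
    sql: "UPDATE qualifications SET packaging_rules = ?1 WHERE qual_code = ?2",
    params: [packagingRules, qualCode],
  };
  return columns.length > 0 ? [small, large] : [large];
}
```

Each returned object has the same (sql, params) shape as the d1Query call shown earlier, so the two statements can be sent as separate requests.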
6. Distinguishing updated vs not-updated rows after a partial rerun¶
The cleanest discriminator is enriched_at. After a rerun dated YYYY-MM-DD, any row whose substr(enriched_at,1,10) is older than that date was not successfully written in this pass. Do not try to detect stragglers via content-shape heuristics (HTML tags in packaging_rules, etc.) — real content legitimately contains <p>, <table>, and friends, so you will get thousands of false positives.
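The enriched_at check can be expressed as a query plus an equivalent in-memory predicate, sketched below. The function names are hypothetical, enriched_at is assumed to be an ISO-8601 string (which the substr comparison above already implies), and treating a NULL enriched_at as a straggler is an added assumption.

```javascript
// Hypothetical helper: builds the straggler query for a rerun date.
function stragglerQuery(rerunDate) {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(rerunDate)) throw new Error("rerunDate must be YYYY-MM-DD");
  return {
    sql: "SELECT qual_code FROM qualifications WHERE enriched_at IS NULL OR substr(enriched_at, 1, 10) < ?1",
    params: [rerunDate],
  };
}

// Same rule applied to a row already in memory. ISO dates compare correctly
// as plain strings, so no Date parsing is needed.
function isStraggler(row, rerunDate) {
  return !row.enriched_at || row.enriched_at.slice(0, 10) < rerunDate;
}
```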
7. Stale script paths after monorepo moves¶
RTOPACKS_SITE / WRANGLER_CWD in tools/tga-enrich-quals.mjs was pointing at repo/site which hadn't existed since the session-58 monorepo reorg. Repointed to workers/internal-api. If a legacy tool script starts failing on cwd/wrangler calls, first check whether the path it resolves still exists.
Final rerun outcome (2026-04-15)¶
- 8007 quals total
- 5202 ok (full enrichment)
- 2805 partial (superseded/deleted with incomplete TGA data — expected)
- 0 fetch-level failures
- 58 initial D1 write failures (SQLITE_TOOBIG on oversized packaging_rules)
- 10 Current stragglers → fixed via /tmp/fix-10-stragglers.mjs using split UPDATEs with bound-parameter packaging_rules
- 48 Superseded/Deleted stragglers → accepted as-is; their historical release-notes packaging_rules remain in place. Not worth rebuilding the enrichment mechanism for superseded qualifications.
End of TGA-AUDIT-01 inventory. Update this file when the API shape changes, when ingest bugs are fixed, or when new fields become relevant to the platform.