Skip to content

Archived Brief

This brief has been completed and is retained as a build record.

B-ENRICH-03 — ANZSCO Bulk Ingest (Qual→Occupation Mapping)

SURFACE: rtopacks-db D1 — no public-facing changes, data layer only
CLOUDFLARE ACCOUNT: e5a9830215a8d88961dc6c80a8c7442a (UCCA — do not touch UCCO)
DO NOT TOUCH: All Workers, public surfaces, engine-db, UCCO account
DB: rtopacks-db D1 334ac8fb-9850-48c0-9da0-b56c55640e98
BRIEF DRIP RULE: Deploy and confirm before requesting next brief.


Why this is the highest priority enrichment item

Current state: 8,007 qualifications, 0 ANZSCO codes (PSP50922 manually patched as test, 8,006 NULL).

Three data streams are fully ingested and wired but can only resolve for quals with an ANZSCO code: - OSL ratings — 9,454 rows, 1,930 distinct ANZSCO codes (shortage occupation flags) - IVI vacancies — 37,908 rows, 351 distinct ANZSCO codes (live vacancy counts + sparklines) - Employment projections — 358 rows (projections to 2035)

Fixing ANZSCO coverage is a single script that immediately activates three data streams across ~4,000+ quals (the subset with current TGA ANZSCO assignments).


DB schema

Table: qualifications
Existing columns: anzsco_code TEXT (currently NULL for 8,006 rows), anzsco_title TEXT (currently NULL)
No schema changes needed — columns already exist.


Data source

Primary source: TGA SOAP web service — the qual→ANZSCO mapping is stored on training.gov.au and served via their free web service. Public sandbox credentials require no registration:

Sandbox endpoints:

TrainingComponentService: https://ws.sandbox.training.gov.au/Deewr.Tga.Webservices/TrainingComponentService.svc
ClassificationService:    https://ws.sandbox.training.gov.au/Deewr.Tga.Webservices/ClassificationService.svc
OrganisationService:      https://ws.sandbox.training.gov.au/Deewr.Tga.Webservices/OrganisationService.svc

Sandbox credentials (public, read-only):

Username: WebService.Read   (case sensitive)
Password: Asdf098           (case sensitive)

WSDLs and SDK available at: https://ws.sandbox.training.gov.au/webservices.html

Production: Once sandbox is tested, email TGA at tgahelp@education.gov.au with Organisation Name, Contact Person, and a generic email address — they provision a production read-only account. No approval process implied, straightforward.

Note on API future: TGA has flagged new RESTful APIs are planned via the National Training Register Enhancement Project. Build against SOAP now — it works today. Note the migration path for when REST lands.

First step — check existing data before hitting the API: The original TGA corpus ingest may have already fetched qualification detail objects that contain the ANZSCO field but simply didn't write it to the DB. Check existing ingest scripts and any cached response data on disk first. If it's already there, this is a re-parse job not an API job.

NCVER ANZSCO lookup file (for title enrichment, not the qual→code mapping):

https://www.ncver.edu.au/rto-hub/statistical-standard-software/anzsco-identifier-2022-release-occupation
Use this to validate/enrich anzsco_title values once codes are populated from TGA.


The ingest script

Write scripts/ingest-anzsco.mjs — local script, one-time run.

Logic: 1. Query rtopacks-db for all qual_code values where anzsco_code IS NULL (should be ~8,006 rows) 2. For each qual code, fetch the qualification detail from TGA API — extract anzsco_code and anzsco_title fields 3. Batch UPDATE qualifications SET anzsco_code = ?, anzsco_title = ? WHERE qual_code = ? 4. Log: total processed, total matched (got an ANZSCO code), total unmatched (TGA returned null/none), total errors 5. On completion: report coverage — how many of 8,007 now have an ANZSCO code

Rate limiting: TGA API is not fast. Batch in groups of 50, add 200ms delay between batches. Expected runtime: ~30 minutes for full corpus. Run overnight or let it run.

ANZSCO version tracking: TGA currently assigns ANZSCO 2022 codes. Store the version so we can migrate when OSCA 2024 replaces ANZSCO. Add a column if needed:

ALTER TABLE qualifications ADD COLUMN anzsco_version TEXT DEFAULT 'ANZSCO_2022';
Update this column during ingest. When OSCA transition happens, we add osca_code and osca_title columns alongside — no destructive migration.


Expected outcomes

After successful ingest, on PSP50922 and any qual with an ANZSCO match: - Employment tab shows IVI vacancy count + 12-month sparkline - Employment tab shows +X% to 2030, +Y% to 2035 projections - OSL shortage badge fires correctly where applicable

Coverage expectation: NCVER assigns ANZSCO to current training package quals. Superseded/legacy quals may have NULL or no assignment — that's expected and fine. Target ~4,000–5,000 quals getting codes (current quals only).


Ops stub

No new Worker deployed — this is a local script. No ops stub required.
Do log the ingest run result as a comment in ops.rtopacks.com.au under Data Enrichment section — timestamp, rows updated, coverage %.


Deploy checklist

  • Script written: scripts/ingest-anzsco.mjs
  • Tested against 10 quals in dry-run mode (log what would be written, don't write)
  • Full run completed — log total matched/unmatched
  • Verify on PSP50922: SELECT anzsco_code, anzsco_title FROM qualifications WHERE qual_code = 'PSP50922' — should still show 2643 / Interpreters and Translators (manually patched, don't overwrite unless TGA returns same or better)
  • Verify spot-check 5 other quals have codes populated
  • Report coverage: X of 8,007 quals now have ANZSCO codes
  • anzsco_version column added and populated
  • Ops note logged