
RTOpacks Data Sources

This is the authoritative reference for every data source that feeds rtopacks-db. It covers what the data is, where it comes from, how it was ingested, which tables it populates, current row counts, and what the update cadence and method should be.

Last audited: 29 March 2026
Database: rtopacks-db (D1: 334ac8fb-9850-48c0-9da0-b56c55640e98)
Cloudflare account: e5a9830215a8d88961dc6c80a8c7442a
Provenance table: data_sources (all sources should be registered here)


Quick Reference

# Source Tables Rows Registered Method Endpoint / URL Cron Frequency
1 National Training Register (TGA) — adapter qualifications, units, qualification_units, rtos, rto_scope + 7 more ~490,000 REST API training.gov.au/api tga-sync · 0 16 * * SAT Weekly
1a TGA REST API — Organisation deep data tga_organisations, tga_org_legal_names, tga_org_contacts, tga_org_addresses, tga_org_classifications, tga_org_trading_names, tga_org_registration_managers, tga_org_regulatory_decisions, tga_org_restrictions, tga_org_registration_history 530,000+ REST API training.gov.au/api/organisation/{code}/* tga-sync · 0 16 * * SAT Weekly
1b TGA REST API — Training component deep data tga_training_components, tga_training_releases, tga_training_classifications, tga_training_taxonomy_occupations, tga_training_taxonomy_sectors, tga_training_prerequisites, tga_unit_grid_usage, tga_release_unit_grids, tga_tp_release_components, tga_api_metadata, tga_nrt_classification_values, tga_rto_classification_values 1,000,000+ REST API training.gov.au/api/training/{code}/* tga-sync · 0 16 * * SAT Weekly
1c TGA REST API — Delivery notification history tga_delivery_notification_history 793,000+ REST API training.gov.au/api/organisation/{code}/deliverynotificationhistory/{trainingcode} tga-sync · 0 16 * * SAT Weekly
2 yourcareer.gov.au — Search API qualifications (enrichment columns) 7,054 quals REST API api.yourcareer.gov.au/api/Courses/search Manual · Annual
3 yourcareer.gov.au — Pathways API qual_career_pathways, qualification_specialisations 54,640 + 967 REST API api.yourcareer.gov.au/api/Courses/{code}/PathwaysIndustry Manual · Annual
4 CRICOS cricos_providers, cricos_courses, cricos_institutions, cricos_locations, cricos_course_locations 79,117 CSV download data.gov.au dataset e5ae7059-bfa8-4fa4-a5c0-c13cf3520193 (4 stable resource IDs) cricos-sync · 0 18 1 * * Monthly · 1st
5 NCVER VOCSTATS vocstats_* (8 tables) ~9,200 ⚠️ Manual (web query tool) ncver.edu.au/vocstats Manual · Annual
6 JSA VNDA — Graduate Outcomes recon_vnda, recon_vnda_aqf, recon_vnda_foe 513 ⚠️ REST API jobsandskills.gov.au/api/v1/opensearch/vnda/_search Manual · When JSA updates
7 JSA VNDA — Export vnda_atlas 785 ⚠️ DEPRECATED
8 JSA Occupation Shortage List (OSL) osl_ratings 9,454 ⚠️ Excel download jobsandskills.gov.au/data/occupation-shortage-list Brief #25 Worker Annual
9 JSA Internet Vacancy Index (IVI) ivi_vacancies 37,908 ⚠️ Excel download jobsandskills.gov.au/system/files/YYYY-MM/Internet Vacancies...xls Brief #25 Worker · 0 2 1 * * Monthly
10 JSA Employment Projections emp_projections 358 ⚠️ Excel download jobsandskills.gov.au/data/employment-projections Brief #25 Worker Annual
11 JSA GLMD — Regional Labour Market glmd_regional 96 ⚠️ REST API (JSON file) jobsandskills.gov.au/system/files/datasets/glmd (YYYY-MM).json Brief #25 Worker · 0 2 1 * * Monthly
12 JSA Training API — Per-qual enrolments jsa_qual_training (new) 0 — Brief #25 ⚠️ REST API jobsandskills.gov.au/api/v1/opensearch/training/_search Brief #25 Worker · 0 2 1 7 * Annual · July
13 NSW Smart and Skilled state_funding (NSW) 1,164 Excel download nsw.gov.au/education-and-training/vocational/nsw-skills-list Manual · Annual
14 QLD Career Start / QSTL state_funding (QLD) 459 PDF parse dtet.qld.gov.au/training/providers/funded/subsidised-training-list Manual · Annual
15 VIC Free TAFE / Skills First state_funding (VIC) 7 ⚠️ incomplete ⚠️ Excel download vic.gov.au/free-tafe-courses Manual · Annual
16 SA WorkReady state_funding (SA) 720 PDF parse providers.skills.sa.gov.au/subsidised-training-list Manual · Annual
17 WA Jobs and Skills WA state_funding (WA) 175 HTML scrape (Puppeteer) jobsandskills.wa.gov.au/course-list Manual · Annual
18 TAS Skills Tasmania state_funding (TAS) 420 Excel download skills.tas.gov.au/providers/rto/courses_approved_and_funded_in_tasmania Manual · Annual
19 NT Fee-Free TAFE / CDU state_funding (NT) 95 HTML scrape cdu.edu.au/courses?type=vet Manual · Annual
20 ACT Skills state_funding (ACT) 0 — not yet ingested
21 ABR Reference Codes abr_codes 167 ⚠️ One-off ABR Rarely
22 Unit Stats (derived) units (computed columns) 15,200 Derived Computed from DB After TGA ingest

Method legend:
- REST API: direct programmatic API call, JSON response
- CSV download: direct CSV file download from a stable URL
- Excel download: .xls/.xlsx file download, requires a spreadsheet parser
- PDF parse: PDF text extraction via pdftotext + regex
- HTML scrape: rendered HTML scrape (Puppeteer or requests)
- Manual: requires a human to log in or navigate a web tool
- Derived: computed from data already in rtopacks-db
- DEPRECATED: do not use

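Cron schedule notation (expressions are evaluated in UTC, as Cloudflare Cron Triggers run on UTC; AEST is UTC+10):
- 0 16 * * SAT → Sunday 2am AEST (weekly)
- 0 18 1 * * → 2nd of month, 4am AEST (18:00 UTC on the 1st; monthly)
- 0 2 1 * * → 1st of month, 12pm AEST (monthly)
- 0 2 1 7 * → 1 July annually, 12pm AEST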
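Cloudflare Cron Triggers evaluate expressions in UTC, so each AEST translation above can be checked mechanically. The sketch below is a minimal illustration, assuming AEST as a fixed UTC+10 offset (daylight saving is ignored) and using April 2026 dates chosen only as examples; the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# AEST as a fixed UTC+10 offset; daylight saving (AEDT, UTC+11) is deliberately ignored
AEST = timezone(timedelta(hours=10), "AEST")


def utc_fire_to_aest(year: int, month: int, day: int, hour: int, minute: int = 0) -> datetime:
    """Convert a cron firing instant, given in UTC, to AEST wall-clock time."""
    fired_utc = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return fired_utc.astimezone(AEST)


# "0 18 1 * *" fires at 18:00 UTC on the 1st (here: 1 April 2026)
cricos_run = utc_fire_to_aest(2026, 4, 1, 18)
# "0 16 * * SAT" fires at 16:00 UTC on a Saturday (4 April 2026 is a Saturday)
tga_run = utc_fire_to_aest(2026, 4, 4, 16)

print(cricos_run.isoformat())  # 2026-04-02T04:00:00+10:00
print(tga_run.isoformat())     # 2026-04-05T02:00:00+10:00
```

The practical consequence is that any job scheduled at 14:00 UTC or later lands on the following calendar day in AEST.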

Sources marked ⚠️ will be registered in data_sources as part of Brief #25.


1. National Training Register (TGA)

data_sources key: tga_corpus
Authority: Department of Employment and Workplace Relations
URL: https://training.gov.au
Licence: Creative Commons Attribution 4.0
Auth: None required
Last ingested: Not recorded in data_sources — TGA ingest predates the provenance registry

What it is

The authoritative Australian register of nationally recognised vocational education and training (VET). Contains every qualification, unit of competency, skill set, and accredited course across all statuses (current, superseded, expired), plus every registered RTO and their approved delivery scope. This is the foundational dataset: everything else in the database enriches on top of it.

How it was ingested

TGA adapter ingest via the UCCA engine's adapters/tga/ module. The engine processes TGA's public data endpoints, applies the TGA adapter, and populates the corpus tables in rtopacks-db. This is a world-level ingest — the AU-VET world's foundational data.

Tables populated

Table Rows Content
qualifications 8,007 All qualifications — code, title, AQF level, training package, packaging rules, supersession chain (supersedes, superseded_by), status, entry requirements, description
units 75,189 All units of competency — code, title, description, elements, performance criteria, knowledge evidence, performance evidence, assessment conditions, foundation skills, training package, status
qualification_units 244,874 Junction table — which units belong to which qualifications, core vs elective flags
rtos 12,515 All registered training organisations — code, legal name, ABN, status, type
rto_scope 150,000 RTO delivery scope — which RTOs are approved to deliver which qualifications, in which states
rto_addresses 1,064 RTO physical and postal addresses
rto_contacts 2,176 RTO contact details (email, phone, fax)
rto_legal_names 139 RTO legal name history
rto_trading_names 137 RTO trading names
rto_registrations 267 RTO registration periods, regulators (ASQA / state)
rto_web_addresses ~137 RTO website URLs
rto_classifications ~267 RTO type classifications

Update cadence and method

TGA updates continuously as training packages are endorsed, superseded, and as RTOs register/deregister. Re-run the TGA adapter ingest quarterly, or when a major training package update is announced by a relevant Industry Reference Committee (IRC). After each TGA ingest, also re-run the unit_stats aggregate (source #22).


1a. TGA REST API — Organisation Deep Data

data_sources key: tga_rest_organisations
Authority: Department of Employment and Workplace Relations
API Base URL: https://training.gov.au/api
Swagger: https://training.gov.au/swagger/Organisation%20-%20v1/swagger.json
Auth: None required (unauthenticated public REST API; undocumented, backs the training.gov.au SPA)
Ingested: 28–29 March 2026 | DATA-11a, DATA-10b
Scripts: scripts/ingest/ingest_tga_organisations.py, scripts/ingest/ingest_tga_org_deep.py

What it is

The TGA REST API is an undocumented, unauthenticated REST API discovered by reverse-engineering the training.gov.au SPA in Session 12 (28 March 2026). It exposes the full NTR dataset via clean RESTful endpoints with no authentication, no observed rate limiting, and no CORS restriction. The Organisation group of endpoints exposes historical and structured data about every current and former registered training organisation, far richer than what the TGA adapter ingest (#1) provides.

Key discovery: the API runs behind the SPA and has never been publicly documented or indexed. It is hosted on DEWR's own AS9509 infrastructure in Sydney. No rate limiting was observed across 500,000+ API calls during the Session 12 overnight run. The full Swagger spec is available at https://training.gov.au/swagger/Organisation%20-%20v1/swagger.json.

Ingest strategy

Two-pass ingest:
- DATA-10b: base organisation snapshot per org code (registration status, current address, lat/lng, CEO contacts, regulatory decisions, restrictions, registration history)
- DATA-11a: deep historical pass per org code (legal name history with ABN arrays, trading name history, address history of all types, full contacts, RTO type classification, CRICOS code history, registration manager history, scope summary counts)

Tables populated

Table Rows Content
tga_organisations 12,500 Base snapshot — registration status, address, lat/lng, CEO, is_restricted, is_cricos, delivery_states
tga_org_registration_history 28,830 Registration period history — start, end, expiry, renewal decision
tga_org_regulatory_decisions 1,415 ASQA enforcement decisions — decision type, level, legislation, effective/expiry dates
tga_org_restrictions 804 Active and historical scope/registration restrictions
tga_org_legal_names 17,316 Legal name history with ABN array and ACN per name period — phoenix pattern key
tga_org_trading_names 13,239 Trading name history with date ranges
tga_org_addresses 63,159 Full address history typed by role (postal/principal/headOffice/deliveryLocation)
tga_org_contacts 193,121 Full contact records — CEO, public enquiries, registration enquiries, all roles
tga_org_classifications 13,093 RTO type classifications (private/TAFE/enterprise/community/school/university)
tga_org_registration_managers 16,850 ASQA vs state regulator history with legal authority and exerciser
tga_org_cricos_codes 0 CRICOS code history — endpoint returned empty for all orgs (TGA does not hold CRICOS code history; use cricos_* tables from source #4)

Intelligence views built on this data

View Rows Purpose
vw_rto_intelligence 12,515 Flagship RTO view — combines registration, scope, regulatory history, restriction status
vw_rto_risk_profile 937 RTOs with regulatory decisions or active restrictions — the moat view

Strategic significance

  • 193,121 contact records — first time CEO/contact data is structured and queryable across all 12,500 orgs
  • 17,316 legal name records with ABN arrays — enables CEO phoenix pattern analysis: directors who closed one RTO and opened another under different names
  • 937 RTOs with regulatory history — ASQA enforcement actions structured outside government for the first time
  • 420 currently active RTOs have regulatory decisions against them — queryable, filterable, commercially differentiating

Update cadence

Weekly via tga-sync Worker (cron 0 16 * * SAT, Sunday 2am AEST). The sync Worker runs delta updates against the TGA API using tga_sync_cursor. Full re-run if API schema changes.
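The cursor mechanics can be sketched as follows. This is a minimal illustration, not the tga-sync Worker's actual code: it assumes tga_sync_cursor stores the latest modification timestamp already processed, and the record shape (code, modified), the helper names, and the org codes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeltaBatch:
    changed_codes: list[str]
    new_cursor: str


def select_delta(records: list[dict], cursor: str) -> DeltaBatch:
    """Pick records modified strictly after the cursor and compute the advanced cursor.

    Assumes each record has a 'code' and a 'modified' ISO-8601 UTC string in one
    consistent format, so string comparison matches chronological order.
    """
    changed = [r for r in records if r["modified"] > cursor]
    # Advance to the newest change seen; if nothing changed, keep the old cursor
    new_cursor = max((r["modified"] for r in changed), default=cursor)
    return DeltaBatch([r["code"] for r in changed], new_cursor)


records = [
    {"code": "90001", "modified": "2026-03-20T00:00:00Z"},  # hypothetical org codes
    {"code": "90002", "modified": "2026-03-30T09:15:00Z"},
]
batch = select_delta(records, cursor="2026-03-28T16:00:00Z")
print(batch.changed_codes, batch.new_cursor)  # ['90002'] 2026-03-30T09:15:00Z
```

Persisting new_cursor only after the batch has been written would mean a failed run simply repeats the same delta on the next weekly trigger.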


1b. TGA REST API — Training Component Deep Data

data_sources key: tga_rest_training
Swagger: https://training.gov.au/swagger/Training%20-%20v1/swagger.json
Ingested: 28–29 March 2026 | DATA-10a, DATA-11b, DATA-11c, DATA-11e, DATA-11f
Scripts: scripts/ingest/ingest_tga_training.py, scripts/ingest/ingest_tga_training_taxonomy.py, scripts/ingest/ingest_tga_metadata.py, scripts/ingest/ingest_tga_training_structure.py, scripts/ingest/ingest_tga_release_grids.py

What it is

The Training group of TGA REST API endpoints exposes every training component (qualifications, units, skill sets, accredited courses, training packages) with full release history, taxonomy (occupation and industry sector assignments), classification (ANZSCO, ASCED, qualification level), prerequisite structure, unit grid composition per release, and training package membership history.

Tables populated

Table Rows Content
tga_training_components 84,728 All training components — code, title, type, status, release count, supersession chain
tga_training_releases 99,531 Release history per component — release number, date, currency
tga_training_classifications 76,993+ ANZSCO, ASCED4, ASCED6, qualification level classifications per component
tga_training_taxonomy_occupations 6,954 Occupation taxonomy assignments per component
tga_training_taxonomy_sectors 4,106 Industry sector taxonomy assignments per component
tga_training_prerequisites 1,437 Structured prerequisite unit relationships
tga_unit_grid_usage 280,878 Per unit: which quals/skill sets include it, with core/elective flag
tga_release_unit_grids 224,178 Per qual/skill set release: which units were included at that version
tga_tp_release_components 95,510 Training package release composition — all components per TP per release
tga_api_metadata 1 TGA API version + data sync timestamp (freshness signal for sync Worker)
tga_nrt_classification_schemes 6 NRT classification scheme reference (ANZSCO, ASCED, etc.)
tga_nrt_classification_values 3,569 Classification value lookup table
tga_rto_classification_schemes 1 RTO classification scheme reference
tga_rto_classification_values 16 RTO type lookup table (private/TAFE/enterprise/community/school/university)

Intelligence view built on this data

View Rows Purpose
vw_qual_intelligence 8,007 Qualification intelligence — supersession, release data, ANZSCO, FOE, unit count, RTO delivery count

Update cadence

Weekly via tga-sync Worker. Classification and taxonomy data change only when training packages are updated by ISCs; this is infrequent, but the changes still need to be tracked.


1c. TGA REST API — Delivery Notification History

data_sources key: tga_rest_delivery_history
Endpoint: GET /api/organisation/{code}/deliverynotificationhistory/{trainingcode}
Ingested: 28–29 March 2026 | DATA-11d (still running at time of writing, ~56% complete)
Script: scripts/ingest/ingest_tga_delivery_history.py

What it is

For each (RTO, training component) pair where scope exists, this endpoint returns the granular notification event history — when the RTO formally notified TGA of adding or removing a qualification from their scope, in which state, on what date. This is more granular than rto_scope_changes (which records scope start/end) — it captures the actual notification event.

Table Rows (at session close) Content
tga_delivery_notification_history 793,000+ (growing) Org × training code × notification date × state × type (Add/Remove)

Update cadence

Weekly via tga-sync Worker — incremental updates only for new notification events.


2. yourcareer.gov.au — Qualification Search API

data_sources key: yourcareer_search
Authority: Jobs and Skills Australia / Department of Employment and Workplace Relations
API URL: https://api.yourcareer.gov.au/api/Courses/search
Auth: None required
Last ingested: 2026-03-21 | Records: 7,054
Ingest script: B-ENRICH-YOURCAREER-01

What it is

The yourcareer.gov.au platform (the successor to myfuture.edu.au) exposes a public unauthenticated API returning financial and employment metadata per qualification — student fees, VSL (VET Student Loans) eligibility, employment outcome percentages, typical duration, and which RTOs deliver each course.

How it was ingested

Paginated GET requests to the search endpoint with pageSize=100, iterating until all qualifications were returned. No auth required. Results were written as enrichment columns directly onto the qualifications table.
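The pagination loop can be sketched as below. Only pageSize=100 comes from the original ingest; the fetch_page callable, 1-based page numbering, and the "stop on a short page" rule are assumptions, since the endpoint's exact parameter names and response shape are not recorded here. The fetcher is injected so the loop can be exercised without network access.

```python
from typing import Callable, Iterator

PAGE_SIZE = 100  # matches the pageSize used in the original ingest


def iterate_all(fetch_page: Callable[[int, int], list[dict]]) -> Iterator[dict]:
    """Yield every result by requesting pages until a short (or empty) page comes back.

    fetch_page(page_number, page_size) is a hypothetical wrapper around the
    yourcareer search endpoint that returns that page's list of results.
    """
    page = 1
    while True:
        batch = fetch_page(page, PAGE_SIZE)
        yield from batch
        if len(batch) < PAGE_SIZE:  # fewer than a full page means no more data
            break
        page += 1


# Usage with a stand-in fetcher over 250 fake records (no network access)
fake_items = [{"code": f"Q{i:04d}"} for i in range(250)]


def fake_fetch(page: int, size: int) -> list[dict]:
    start = (page - 1) * size
    return fake_items[start:start + size]


all_items = list(iterate_all(fake_fetch))
print(len(all_items))  # 250
```

If the real response includes a total count, comparing against it is a sturdier stop condition than the short-page rule.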

Fields written to qualifications

Column Description
yc_fee_min / yc_fee_max Student fee range ($)
yc_vsl_eligible VET Student Loan eligible flag (1/0)
yc_has_subsidies Government subsidy available flag
yc_apprenticeship_states States where apprenticeship pathway is available
yc_is_apprenticeship Is an apprenticeship pathway flag
yc_offered_online Online delivery available flag
yc_employed_pct Employment % reported after completion
yc_rto_codes Comma-separated list of delivering RTO codes
yc_industry_codes Industry classification codes
yc_occupation_codes ANZSCO occupation codes
yc_duration Typical course duration
yc_has_rto_offerings Whether any RTOs are currently delivering
yc_superseded_by Supersession reference from yourcareer
yc_date_modified Last modification date in yourcareer
yc_enriched_at Timestamp of enrichment write

Derived fields (computed by B-MODEL-01 from this data):

Column Description
market_score 0–100 market viability score per qual (BSB50420 tops at 99)
vsl_value_ratio VSL loan efficiency signal (loan amount vs income uplift)
vsl_total_cost Total VSL loan cost for the qualification

Update cadence and method

Annual. Fee and VSL eligibility data changes with each government budget cycle (typically May/June). Re-run the yourcareer ingest script against the same paginated API. Upsert on qual_code.
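Because D1 is SQLite-based, the upsert on qual_code can be illustrated with Python's sqlite3 module and SQLite's ON CONFLICT ... DO UPDATE clause. This sketch assumes qual_code carries a PRIMARY KEY or UNIQUE constraint; only three of the yc_* columns are shown, and the code XYZ50420 and the fee values are invented for illustration.

```python
import sqlite3


def upsert_yourcareer(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Upsert a subset of yourcareer enrichment columns, keyed on qual_code."""
    conn.executemany(
        """
        INSERT INTO qualifications (qual_code, yc_fee_min, yc_fee_max, yc_vsl_eligible, yc_enriched_at)
        VALUES (:qual_code, :yc_fee_min, :yc_fee_max, :yc_vsl_eligible, CURRENT_TIMESTAMP)
        ON CONFLICT(qual_code) DO UPDATE SET
            yc_fee_min      = excluded.yc_fee_min,
            yc_fee_max      = excluded.yc_fee_max,
            yc_vsl_eligible = excluded.yc_vsl_eligible,
            yc_enriched_at  = excluded.yc_enriched_at
        """,
        rows,
    )
    conn.commit()


# Usage against an in-memory stand-in for the qualifications table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE qualifications (qual_code TEXT PRIMARY KEY, title TEXT, "
    "yc_fee_min REAL, yc_fee_max REAL, yc_vsl_eligible INTEGER, yc_enriched_at TEXT)"
)
conn.execute("INSERT INTO qualifications (qual_code, title) VALUES ('XYZ50420', 'Example Diploma')")
upsert_yourcareer(conn, [{"qual_code": "XYZ50420", "yc_fee_min": 1000.0, "yc_fee_max": 9000.0, "yc_vsl_eligible": 1}])
row = conn.execute("SELECT title, yc_fee_max, yc_vsl_eligible FROM qualifications").fetchone()
print(row)  # ('Example Diploma', 9000.0, 1)
```

The DO UPDATE branch touches only the enrichment columns, so TGA-sourced columns such as title survive a re-run.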


3. yourcareer.gov.au — Career Pathways API

data_sources key: yourcareer_pathways
API URL: https://api.yourcareer.gov.au/api/Courses/{code}/PathwaysIndustry
Auth: None required
Last ingested: 2026-03-21 | Records: 54,639

What it is

Per-qualification career pathway data from yourcareer.gov.au — job title progressions and career ladders by industry stream across AQF levels. This powers the Career Ladder View in RTOpacks.

How it was ingested

One API call per qualification code, using each qual_code from the qualifications table as the {code} parameter. Results written to separate tables.

Tables populated

Table Rows Content
qual_career_pathways 54,640 Job title progression per qualification per industry stream — entry, mid, senior roles
qualification_specialisations 967 Specialisation streams available within each qualification

Update cadence and method

Annual, same cycle as yourcareer_search. Re-run the pathways ingest per-qual after yourcareer_search completes. Upsert on qual_code + pathway identifier.


4. CRICOS — Commonwealth Register of Institutions and Courses for Overseas Students

data_sources key: cricos
Authority: Department of Education
Source URL: https://cricos.education.gov.au
Legacy ZIP download URL (retired manual method): https://data.gov.au/data/dataset/e5ae7059-bfa8-4fa4-a5c0-c13cf3520193/resource/63fd9610-5bea-438c-bac7-29289d38cfbb/download/cricos-providers-courses-locations.zip
Licence: Creative Commons Attribution 2.5 Australia
Snapshot date: 2 March 2026

What it is

The official register of all Australian institutions and courses approved for international (overseas) student enrolment. Contains CRICOS provider codes, course codes, international tuition fees, course durations, and delivery locations. Used to identify which qualifications attract international students and at what fee benchmarks.

How it is ingested (updated 29 March 2026)

Fully automated via the cricos-sync Cloudflare Worker (DATA-12a, commit 802891c). Worker deployed at workers/cricos-sync/.

Uses four permanent CSV resources on data.gov.au (dataset e5ae7059-bfa8-4fa4-a5c0-c13cf3520193):

Resource Stable Resource ID
CRICOS Locations 45d29535-1360-4486-8242-3850e61b5524
CRICOS Courses 48cacf69-2082-415e-9595-f17d0c3a4af0
CRICOS Institutions 7f6941f3-5327-4db7-b556-5f16d77f63c1
CRICOS Course Locations 4cd2de02-8ba3-4eb2-bac2-fe272cae3f5f

Worker logic: checks last_modified via CKAN API for each resource, compares against cricos_sync_log, downloads and ingests only if data.gov.au has published a new export. Uses INSERT OR REPLACE — idempotent. UNIQUE constraints on natural keys in cricos_locations and cricos_course_locations prevent duplicates on re-run.
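The change-detection step can be sketched against CKAN's resource_show action. This is an illustration, not the cricos-sync Worker's code: the action URL is assumed from the /data/ prefix of the data.gov.au dataset URLs, the metadata_modified fallback is an assumption, and both timestamps are assumed to be naive ISO-8601 strings in the same zone. The helper names are hypothetical.

```python
import json
from datetime import datetime
from typing import Optional

# Assumed CKAN action endpoint, inferred from the /data/ prefix of the data.gov.au dataset URLs
CKAN_ACTION = "https://data.gov.au/data/api/3/action/resource_show?id={resource_id}"


def resource_show_url(resource_id: str) -> str:
    return CKAN_ACTION.format(resource_id=resource_id)


def needs_ingest(resource_show_body: str, last_synced: Optional[str]) -> bool:
    """Decide whether data.gov.au has published a newer export than the last one ingested.

    last_synced is the timestamp recorded for this resource in cricos_sync_log
    (None if the resource has never been synced).
    """
    payload = json.loads(resource_show_body)
    if not payload.get("success"):
        raise RuntimeError("CKAN resource_show call failed")
    result = payload["result"]
    # Fall back to metadata_modified when last_modified is empty
    published = result.get("last_modified") or result.get("metadata_modified")
    if published is None or last_synced is None:
        return True  # nothing reliable to compare, so ingest (INSERT OR REPLACE is idempotent)
    return datetime.fromisoformat(published) > datetime.fromisoformat(last_synced)
```

Erring towards "ingest" when a timestamp is missing is safe here only because the ingest itself is idempotent.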

Sync tracking: cricos_sync_log table (created in DATA-12a; 4 rows from the first automated run).

Previous method (retired): Manual ZIP download from data.gov.au. No longer required.

Tables populated

Table Rows Content
cricos_providers 1,542 Providers registered with CRICOS — provider code, name, status
cricos_institutions 1,554 Institution details — ABN, type, trading names
cricos_courses 26,172 Courses approved for international delivery — course code, tuition fee, duration, CRICOS course ID
cricos_locations 3,907 Physical delivery locations per provider
cricos_course_locations 46,482 Course-location junction — which courses are delivered at which locations

Note: cricos_locations and cricos_course_locations now have UNIQUE constraints on their natural keys to prevent duplicates on re-run.
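Why INSERT OR REPLACE plus a UNIQUE constraint makes a re-run idempotent can be shown with an in-memory SQLite table. The column set and natural key below are hypothetical stand-ins for cricos_course_locations, and the CRICOS codes and location names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical natural key: a course-location row is identified by course + location + provider
conn.execute(
    """
    CREATE TABLE cricos_course_locations (
        id INTEGER PRIMARY KEY,
        cricos_course_code TEXT NOT NULL,
        location_name TEXT NOT NULL,
        provider_code TEXT NOT NULL,
        UNIQUE (cricos_course_code, location_name, provider_code)
    )
    """
)


def ingest(rows: list[tuple[str, str, str]]) -> None:
    # REPLACE deletes any row that conflicts on the UNIQUE key, then inserts the new one,
    # so re-running the same export never adds duplicates.
    conn.executemany(
        "INSERT OR REPLACE INTO cricos_course_locations "
        "(cricos_course_code, location_name, provider_code) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


export = [("012345A", "Sydney Campus", "00001A"), ("012345A", "Melbourne Campus", "00001A")]
ingest(export)
ingest(export)  # second run of the same export
count = conn.execute("SELECT COUNT(*) FROM cricos_course_locations").fetchone()[0]
ids = [r[0] for r in conn.execute("SELECT id FROM cricos_course_locations ORDER BY id")]
print(count)  # 2
print(ids)
```

One side effect worth knowing: because REPLACE deletes and re-inserts, a surrogate id changes on every run, so other tables should reference the natural key rather than the rowid.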

Enrichment written to qualifications: cricos_provider_count and cricos_avg_tuition_aud (number of CRICOS-registered providers and average international tuition fee for each qual code).
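The two enrichment figures amount to a GROUP BY over CRICOS courses keyed on the VET qualification code. This sketch assumes hypothetical column names (vet_national_code, provider_code, tuition_fee) on cricos_courses; the real schema may name or join these differently, and the sample rows are invented.

```python
import sqlite3


def cricos_enrichment(conn: sqlite3.Connection) -> dict[str, tuple[int, float]]:
    """Per qual code: number of distinct CRICOS providers and average tuition (AUD)."""
    rows = conn.execute(
        """
        SELECT vet_national_code,
               COUNT(DISTINCT provider_code) AS cricos_provider_count,
               ROUND(AVG(tuition_fee), 2)    AS cricos_avg_tuition_aud
        FROM cricos_courses
        WHERE vet_national_code IS NOT NULL AND vet_national_code <> ''
        GROUP BY vet_national_code
        """
    ).fetchall()
    return {code: (count, avg) for code, count, avg in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cricos_courses (vet_national_code TEXT, provider_code TEXT, tuition_fee REAL)")
conn.executemany(
    "INSERT INTO cricos_courses VALUES (?, ?, ?)",
    [
        ("XYZ50420", "P001", 20000), ("XYZ50420", "P001", 22000),
        ("XYZ50420", "P002", 24000), (None, "P003", 30000),  # non-VET course, ignored
    ],
)
print(cricos_enrichment(conn))  # {'XYZ50420': (2, 22000.0)}
```

Averaging over course rows weights a provider with several course records more heavily; averaging per provider first is the alternative if each provider should count once.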

Update cadence and method (updated 29 March 2026)

Automated monthly · cricos-sync Worker · cron 0 18 1 * * (18:00 UTC on the 1st of each month, i.e. 4am AEST on the 2nd). No human action required.

Current row counts (29 March 2026)

Table Rows
cricos_institutions 1,554
cricos_courses 26,172
cricos_locations 3,907
cricos_course_locations 46,482

5. NCVER VOCSTATS — Training Activity Data

data_sources key: ncver_vocstats ⚠️ Not yet registered
Authority: National Centre for Vocational Education Research (NCVER)
URL: https://www.ncver.edu.au/research-and-statistics/vocstats
Auth: Free NCVER registration required
Ingested: 2026-03-20 — manual session

What it is

NCVER's VOCSTATS tool provides access to the Total VET Activity (TVA) database, the national mandatory collection of all VET enrolments and completions reported by RTOs under AVETMISS. It also includes the VET Student Outcomes Survey (SOS) and the Apprentices and Trainees collection. These are aggregate tables only: data is broken down by Field of Education (FOE) or AQF level, not by individual qualification. Per-qual volumes will come from the JSA Training API (source #12), which is not yet ingested.

Source database keys:
- TVA enrolments/completions: tva_prg_1524_ext_nvetr_rel24
- SOS outcomes: SOS_tva_1625_ext_rel25

How it was ingested

Manual session using the VOCSTATS web query tool at ncver.edu.au. Queries were built with selected dimensions, downloaded as .xlsx + .json pairs. All files were wide format (years as column headers, rows 1–8 are metadata — skip on ingest). Alex ingested using the xlsx npm package. Original files are timestamped by download time.
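The wide-to-long reshape can be sketched in Python, starting from rows already read out of a sheet by any xlsx reader (the original ingest used the xlsx npm package instead). The sketch assumes row 9 is the header, with a configurable number of leading dimension columns followed by one column per year; the function name, output keys, and sample rows are hypothetical.

```python
METADATA_ROWS = 8  # rows 1-8 of each VOCSTATS export are metadata


def melt_vocstats(rows: list[list], id_cols: int = 1) -> list[dict]:
    """Convert a wide VOCSTATS sheet (years as columns) into long records.

    rows: the whole sheet as a list of rows, metadata included.
    id_cols: number of leading dimension columns before the year columns (assumed).
    """
    header, *data = rows[METADATA_ROWS:]
    dims, years = header[:id_cols], header[id_cols:]
    records = []
    for row in data:
        if not row or row[0] in (None, ""):
            continue  # skip blank spacer rows
        keys = dict(zip(dims, row[:id_cols]))
        for year, value in zip(years, row[id_cols:]):
            if value in (None, ""):
                continue  # empty cell: no figure for that year
            records.append({**keys, "year": int(year), "value": value})
    return records


rows = [["metadata"]] * 8 + [
    ["FOE", "2023", "2024"],
    ["Engineering", 100, 110],
    ["", None, None],
    ["Health", 200, None],
]
records = melt_vocstats(rows)
print(len(records))  # 3
```

Trailing footnote rows, if an export has them, fall out naturally here when their year cells are empty; rows with text in the year columns would need an explicit filter.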

Tables populated

Table Rows Dimensions Source collection File timestamp
vocstats_enrolments_foe 130 FOE × Year (2015–2024) TVA enrolments table_2026-03-20_18-20-45.xlsx
vocstats_completions_foe 130 FOE × Year (2015–2024) TVA completions table_2026-03-20_18-27-37.xlsx
vocstats_enrolments_apprentice 1,459 Apprentice/trainee × FOE × Year TVA enrolments table_2026-03-20_18-55-58.xlsx
vocstats_enrolments_funding 5,791 Funding source × FOE × Year TVA enrolments table_2026-03-20_19-00-10.xlsx
vocstats_enrolments_international 1,160 International/domestic × 4-digit FOE × Year TVA enrolments table_2026-03-20_18-36-44.xlsx + table_2026-03-20_19-08-22.xlsx
vocstats_enrolments_vis 190 VET in Schools: provider type × school status × training type × year left school × Year TVA enrolments table_2026-03-20_18-51-43.xlsx
vocstats_outcomes 129 Employed/further study × FOE × Year (2016–2025) SOS outcomes table_2026-03-20_19-12-14.xlsx
vocstats_outcomes_aqf 90 Employed/further study × AQF level × Year (2016–2025) SOS outcomes table_2026-03-20_17-44-29.xlsx

Update cadence and method

Annual. NCVER releases TVA data around June each year (covering the previous calendar year). SOS outcomes are also released annually.

Currently manual — Tim re-runs the VOCSTATS query session and downloads new files, Alex re-ingests. No automated path exists yet. Per-qual enrolment/completion volumes (a separate gap) are addressed by Brief #25 via the JSA API (source #12), which is automatable.


6. JSA VNDA — VET National Data Asset (Graduate Outcomes)

data_sources key: jsa_vnda_scrape ⚠️ Not yet registered
Authority: Jobs and Skills Australia, ABS, NCVER (integrated dataset)
Source URL: https://www.jobsandskills.gov.au/data/vocational-education-training/vet-national-data-asset
API URL: https://www.jobsandskills.gov.au/api/v1/opensearch/vnda/_search
Auth: None required
Ingested: 2026-03-19 (Python scraper vnda_scraper.py, then vnda_ingest.py)

What it is

VNDA is an integrated data asset linking NCVER VET activity records with ATO tax data, Department of Social Services income support records, and Department of Education data. It provides the most accurate picture of what happens to VET graduates after completion — employment rates, income uplift, income support exit rates, and further study progression. Currently covers the top ~500 courses by completion volume, with outcomes for the FY2019-20 completion cohort reported in FY2020-21.

Critical limitations:
- Covers only the top ~500 courses: 494 of 1,167 current quals (42%)
- Data is frozen at FY2020-21 as of March 2026; no update has been published since
- The series_period field must always be displayed alongside any VNDA-derived figure in the UI

How it was ingested

Direct scrape of the JSA VNDA OpenSearch API (vnda/_search index). The Python script vnda_scraper.py submitted a single POST query for FY2020-21 with size: 10000, which returned all 494 records in one request. The records were ingested to D1 via vnda_ingest.py. A companion scrape also pulled aggregate views (by AQF level, by FOE), which populate the _aqf and _foe sibling tables.
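A single-request pull works because size: 10000 sits at OpenSearch's default index.max_result_window; larger result sets would need search_after or scroll. The sketch below builds such a body and checks that a response was not truncated. The term filter on series_period is a hypothetical field name, not the confirmed VNDA index mapping, and the helper names are illustrative.

```python
import json


def build_vnda_query(period: str, size: int = 10_000) -> str:
    """Build the JSON body for a single POST to .../opensearch/vnda/_search."""
    if size > 10_000:
        # from + size beyond index.max_result_window (10,000 by default) is rejected
        raise ValueError("size exceeds the default max_result_window")
    body = {
        "size": size,
        "query": {"term": {"series_period": period}},  # hypothetical field name
    }
    return json.dumps(body)


def all_hits_returned(response: dict) -> bool:
    """True when the response holds every matching document."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):
        if total.get("relation", "eq") != "eq":
            return False  # total is only a lower bound, so completeness can't be confirmed
        total = total["value"]
    return len(hits["hits"]) >= total
```

Checking hits.total against the returned count is a cheap guard against silently ingesting a partial result if the index ever grows past the window.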

Tables populated

Table Rows Content Status
recon_vnda 494 Per-qual outcomes + student characteristics JSON CANONICAL — use this
recon_vnda_aqf 7 Outcomes aggregated by AQF level Canonical
recon_vnda_foe 12 Outcomes aggregated by Field of Education Canonical
recon_vnda_state 0 State-level outcomes — empty, not yet populated

Fields per qualification in recon_vnda

Field Description
qual_id Qualification code
employed_pct Employment rate post-completion (e.g. 0.76 = 76%)
employment_change_pct Change in employment rate after completing vs before
median_income Median annual income post-completion ($)
median_income_change Income uplift after completion ($)
income_support_exit_rate % who exited income support after completing
higher_vet_progression % who progressed to a higher-level VET qualification
any_vet_progression % who did any further VET study
student_characteristics JSON: pctFemale, pctDisability, pctFirstNations, pctApprenticeTrainees, pctNoYr12NoCert3, medianCompletionAgeYrs, medianCompletionTimeDays, pctRemote, pctRegional, pctMajorCity
series_period Financial year of data (currently FY2020-21)
vnda_source_code Actual qual code that returned data — may differ from qual_id if supersession walk used (Brief #25)
vnda_match_type Match method: direct, supersedes, superseded_by, sibling, none (Brief #25)

Update cadence and method

When JSA publishes a new VNDA report (no fixed schedule — FY2020-21 has been the only release since 2022). Brief #25 jsa-ingest Worker includes a November cron that detects and ingests new data when it arrives. Brief #25 also adds a supersession walk to push coverage from 42% to ~70%+.
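The supersession walk that fills vnda_source_code and vnda_match_type could look roughly like this. The match-type labels come from the field table above; the lookup order (direct, then supersedes, then superseded_by, then sibling), the one-to-one supersession maps, and all qual codes are assumptions for illustration, not Brief #25's specification.

```python
from typing import Optional


def _walk(start: str, links: dict[str, str], vnda_codes: set[str]) -> Optional[str]:
    """Follow a one-step mapping (code -> predecessor or successor) until a VNDA code is hit."""
    seen = {start}
    code = links.get(start)
    while code is not None and code not in seen:  # 'seen' guards against cycles
        if code in vnda_codes:
            return code
        seen.add(code)
        code = links.get(code)
    return None


def match_vnda(
    qual_code: str,
    vnda_codes: set[str],
    supersedes: dict[str, str],
    superseded_by: dict[str, str],
    siblings: Optional[dict[str, list[str]]] = None,
) -> tuple[Optional[str], str]:
    """Return (vnda_source_code, vnda_match_type) for one qualification."""
    if qual_code in vnda_codes:
        return qual_code, "direct"
    hit = _walk(qual_code, supersedes, vnda_codes)  # older versions
    if hit:
        return hit, "supersedes"
    hit = _walk(qual_code, superseded_by, vnda_codes)  # newer versions
    if hit:
        return hit, "superseded_by"
    for sib in (siblings or {}).get(qual_code, []):
        if sib in vnda_codes:
            return sib, "sibling"
    return None, "none"


print(match_vnda("XYZ30120", {"XYZ30115"}, {"XYZ30120": "XYZ30115"}, {}))
```

Real supersession can be many-to-one, so a production walk would need to choose among multiple predecessors; this sketch keeps a single link per code.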


7. JSA VNDA — Export (DEPRECATED)

data_sources key: jsa_vnda_export ⚠️ Not yet registered
Status: DEPRECATED — do not use
Ingested: 2026-03-19 | Rows: 785

Why it is deprecated

A three-way comparison between vnda_atlas, recon_vnda, and the live JSA API for BSB30120 shows vnda_atlas values are incorrect:

Field vnda_atlas recon_vnda JSA API
Employment rate 0.69 ❌ 0.76 ✓ 0.76
Median income $36,300 ❌ $38,700 ✓ $38,700
Income support exit 0.26 ❌ 0.28 ✓ 0.28

The source of vnda_atlas is unknown — likely a Power BI export or an earlier Atlas data download that used different rounding or a different data vintage. The table is retained for audit purposes only and will be flagged as deprecated by Brief #25. Do not reference vnda_atlas in any new code.


8. JSA Occupation Shortage List (OSL)

data_sources key: jsa_osl ⚠️ Not yet registered
Authority: Jobs and Skills Australia
URL: https://www.jobsandskills.gov.au/data/occupation-shortage-list
Auth: None required
Ingested: 2026-03-19 | Rows: 9,454

What it is

Annual assessment of shortage status for occupations across Australia, by state/territory and nationally. Formerly known as the Skills Priority List (SPL). Ratings are based on data modelling, statistical analysis, stakeholder consultation, and employer surveys. Updated annually by JSA.

Shortage ratings:

Code Meaning
S Shortage — employers have considerable difficulty filling vacancies nationally
R Regional shortage only
M Metro shortage only
NS No Shortage — no significant evidence of difficulty filling vacancies

Tables populated

Table Rows Content
osl_ratings 9,454 ANZSCO code × ANZSCO version × state × year — shortage rating, skill level, alt titles, shortage driver

Columns available: anzsco_code, anzsco_ver, occ_title, skill_level, alt_titles, year, rnat (national), rnsw, rvic, rqld, rsa, rwa, rtas, rnt, ract (state ratings), driver

Columns to be added by Brief #25:
- shortage_driver: one of long_training_gap / short_training_gap / suitability_gap / retention_gap / uncertain
- is_clean_energy_critical: boolean, 1 for the 38 JSA Clean Energy Capacity Study critical occupations

Update cadence and method

Annual. The Brief #25 jsa-ingest Worker's monthly lightweight run checks for OSL updates (re-fetching is cheap, even though the data only changes annually).
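The document does not say how the monthly run decides that the OSL has actually changed. One cheap approach, offered here as an assumption, is to hash the parsed cell values and compare against the digest stored from the previous run. Hashing cells rather than raw file bytes matters because an .xlsx is a ZIP archive whose bytes can differ even when the data does not. Function names and the sample cells are hypothetical.

```python
import hashlib
import json
from typing import Optional


def rows_digest(rows: list[list]) -> str:
    """SHA-256 over the parsed cell values, not the raw .xlsx bytes."""
    canonical = json.dumps(rows, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def osl_changed(rows: list[list], previous_digest: Optional[str]) -> tuple[bool, str]:
    digest = rows_digest(rows)
    return digest != previous_digest, digest


sheet = [["anzsco_code", "rnat"], ["000000", "S"]]  # illustrative cells, not real OSL data
changed, digest = osl_changed(sheet, None)  # first run: nothing stored yet
changed_again, _ = osl_changed(sheet, digest)  # same data on the next monthly run
print(changed, changed_again)  # True False
```

Only when the digest changes would the Worker need to run the full parse-and-upsert into osl_ratings.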


9. JSA Internet Vacancy Index (IVI)

data_sources key: jsa_ivi ⚠️ Not yet registered
Authority: Jobs and Skills Australia
URL: https://www.jobsandskills.gov.au/data/internet-vacancy-index
Auth: None required
Ingested: 2026-03-19 | Rows: 37,908

What it is

Monthly count of newly lodged online job advertisements by ANZSCO occupation, state, and region. Sourced from SEEK, CareerOne, and Workforce Australia job boards. Released on the third Wednesday of each month. Provides a demand signal for occupations — useful as a leading indicator alongside OSL shortage ratings.

Important caveats:
- Counts new ads posted during the reference month, not total open vacancies
- Biased toward higher-skilled positions (lower-skilled roles tend to use informal recruitment)
- SA4-level data is experimental
- IVI data is based on place of work, whereas NERO employment data is based on place of residence, so the two are not strictly comparable

Tables populated

Table Rows Content
ivi_vacancies 37,908 ANZSCO code × month × state — vacancy ad counts

Update cadence and method

Monthly. Brief #25 jsa-ingest Worker — monthly cron 0 2 1 * *. The IVI download file URL includes the release month in its filename. The Worker must either scrape the IVI download page for the current URL or construct it from the known URL pattern:

https://www.jobsandskills.gov.au/system/files/YYYY-MM/Internet%20Vacancies%2C%20ANZSCO2%20Occupations%2C%20States%20and%20Territories%20-%20{Month}%20{Year}.xls
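Constructing the URL from that pattern is mostly a percent-encoding exercise: urllib.parse.quote turns spaces into %20 and commas into %2C while leaving hyphens and dots alone, which reproduces the encoded filename exactly. How the YYYY-MM folder relates to the {Month} {Year} in the filename is not documented here, so this sketch takes both as inputs; the function name and example dates are hypothetical.

```python
from urllib.parse import quote

IVI_BASE = "https://www.jobsandskills.gov.au/system/files"


def ivi_url(folder: str, month_name: str, year: int) -> str:
    """Build an IVI download URL from the documented filename pattern.

    folder: the YYYY-MM upload folder; month_name/year: the {Month} {Year} in the filename.
    """
    filename = (
        "Internet Vacancies, ANZSCO2 Occupations, States and Territories - "
        f"{month_name} {year}.xls"
    )
    return f"{IVI_BASE}/{folder}/{quote(filename)}"


print(ivi_url("2026-03", "February", 2026))  # example inputs only
```

If the folder month cannot be predicted reliably, scraping the IVI download page for the link is the more robust of the two options.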

10. JSA Employment Projections

data_sources key: jsa_emp_projections ⚠️ Not yet registered
Authority: Jobs and Skills Australia / Victoria University (VUEF model)
URL: https://www.jobsandskills.gov.au/data/employment-projections
Auth: None required
Ingested: 2026-03-19 | Rows: 358

What it is

National employment projections to May 2034 by occupation and industry, produced by Victoria University using the Victoria University Employment Forecasting (VUEF) model, calibrated against Australian Treasury macroeconomic forecasts. Available at 5-year and 10-year horizons. These are forward-looking demand signals — useful for showing whether an occupation is projected to grow or decline.

Tables populated

Table | Rows | Content
emp_projections | 358 | ANZSCO code × base year × state: 5-year and 10-year projected employment change (absolute headcount and %)

Update cadence and method

Annual. JSA publishes updated projections once a year, and the Brief #25 jsa-ingest Worker checks for a new release on each monthly run.
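The monthly run could detect a new annual release without re-parsing the file by comparing HTTP validators. This minimal sketch assumes the projections are fetched from a known URL and that the server returns ETag and/or Last-Modified headers, which is not confirmed for JSA. `hasNewRelease` is hypothetical. The stored validators would need to be kept somewhere, for example alongside the data_sources entry.

```typescript
// Hypothetical change check for the monthly "is there a new release?" step.
// Assumption: the server returns ETag and/or Last-Modified (unconfirmed for JSA).
interface Validators {
  etag?: string | null;
  lastModified?: string | null;
}

function hasNewRelease(stored: Validators | null, current: Validators): boolean {
  if (!stored) return true; // never ingested: treat as new
  if (stored.etag && current.etag) return stored.etag !== current.etag;
  if (stored.lastModified && current.lastModified) {
    return stored.lastModified !== current.lastModified;
  }
  // No validator present on both sides: re-ingest rather than risk missing a release.
  return true;
}

// In a Worker, `current` might come from a HEAD request, e.g.:
//   const res = await fetch(url, { method: "HEAD" });
//   const current = { etag: res.headers.get("etag"), lastModified: res.headers.get("last-modified") };
```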


11. JSA General Labour Market Data (GLMD) — Regional

data_sources key: jsa_glmd ⚠️ Not yet registered
Authority: Jobs and Skills Australia
File URL pattern: https://www.jobsandskills.gov.au/system/files/datasets/glmd (YYYY-MM).json
Auth: None required
Ingested: 2026-03-19 | Rows: 96

What it is

Monthly static JSON file containing general labour market indicators by SA4 region and state: employment rate, unemployment rate, participation rate, population, youth unemployment, and the Regional Labour Market Indicator (RLMI — a composite performance score rating regions as Strong / Above average / Average / Below average / Poor).

Tables populated

Table | Rows | Content
glmd_regional | 96 | Region code × data date: employment rate, unemployment, participation, population, RLMI value and label

Update cadence and method

Monthly, via the Brief #25 jsa-ingest Worker's monthly cron. The file URL includes the release month, so the Worker fetches the latest file by constructing the URL from the current date.
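This minimal sketch of the URL construction assumes that the YYYY-MM in the filename is the release month. It also assumes that a run early in the month may need to fall back to the previous month's file. `glmdUrl` and `glmdCandidateUrls` are hypothetical names. The space in the filename must be percent-encoded, but the parentheses can stay as they are.

```typescript
// Hypothetical helpers for the documented GLMD file pattern:
//   https://www.jobsandskills.gov.au/system/files/datasets/glmd (YYYY-MM).json
const GLMD_BASE = "https://www.jobsandskills.gov.au/system/files/datasets/";

function glmdUrl(year: number, month: number): string {
  const mm = String(month).padStart(2, "0");
  // encodeURIComponent encodes the space as %20 and leaves "(" and ")" unchanged.
  return GLMD_BASE + encodeURIComponent(`glmd (${year}-${mm}).json`);
}

// Assumption: the current month's file may not exist yet on the 1st,
// so the previous month is returned as a fallback candidate.
function glmdCandidateUrls(now: Date): string[] {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth() + 1; // getUTCMonth is 0-based
  const prevYear = month === 1 ? year - 1 : year;
  const prevMonth = month === 1 ? 12 : month - 1;
  return [glmdUrl(year, month), glmdUrl(prevYear, prevMonth)];
}
```

The Worker would try each candidate in order and use the first one that returns 200.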


12. JSA Training API — Per-Qualification Enrolments and Completions

data_sources key: jsa_training_api ⚠️ Not yet registered
Authority: Jobs and Skills Australia (sourced from NCVER TVA)
API URL: https://www.jobsandskills.gov.au/api/v1/opensearch/training/_search
Auth: None required
Status: Not yet ingested — Brief #25 adds this

What it is

The JSA Atlas OpenSearch API's training index contains per-qualification enrolment and completion volumes sourced from NCVER Total VET Activity (TVA). VOCSTATS provides the same data only as FOE-level aggregates. The API therefore gives individual qualification-level figures, e.g. BSB30120 Certificate III in Business: 67,112 enrolments and 22,192 completions in 2024. It also includes the FOE classification (linking quals to the broader Field of Education taxonomy) and demographic segmentation (gender, Indigenous status and disability status per qual).

Confirmed live values for BSB30120 (2024):
- Enrolments: 67,112 (−10.1% YoY)
- Completions: 22,192 (−3.1% YoY)
- FOE code: 0809 (Office studies)
- Gender split: 63% Female, 36.5% Male
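This minimal sketch shows a per-qualification lookup against the training index. It assumes the endpoint accepts a standard OpenSearch POST _search body. The field name `qualification_code` is a placeholder, not a confirmed part of the JSA index mapping, and `buildQualSearch` is hypothetical. A term query matches exact values only, so it should target a keyword field rather than an analysed text field.

```typescript
// Hypothetical request builder for the JSA training index.
// Assumption: "qualification_code" is a placeholder field name; check the index mapping.
const JSA_TRAINING_SEARCH_URL =
  "https://www.jobsandskills.gov.au/api/v1/opensearch/training/_search";

interface SearchRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildQualSearch(qualCode: string, field = "qualification_code"): SearchRequest {
  // Standard OpenSearch query DSL: exact-value term query, capped result size.
  const query = {
    size: 10,
    query: { term: { [field]: qualCode } },
  };
  return {
    url: JSA_TRAINING_SEARCH_URL,
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(query),
  };
}

// Usage in a Worker:
//   const req = buildQualSearch("BSB30120");
//   const res = await fetch(req.url, req);
```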

Tables to be created by Brief #25

Table | Content
jsa_qual_training | Per-qual enrolments, completions, YoY trends, FOE code/name, AQF level (national only initially)
jsa_qual_training_segments | Demographic breakdown per qual (gender, Indigenous status, disability status) in an open-ended key/value design

⚠️ Critical display rule: Enrolments and completions must never be divided to produce a "completion rate". They count different student cohorts: completions in a given year come largely from students who enrolled in earlier years. JSA explicitly warns against this comparison.

Update cadence and method

Annual. NCVER releases TVA data around June each year. The Brief #25 jsa-ingest Worker handles it in a separate VET run on cron 0 2 1 7 * (02:00 UTC on 1 July). Rate limit: 1 request per second to the JSA API.
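Two parts of the VET run suit a sketch. The first is pacing requests to the documented 1 req/sec limit. The second is telling the annual trigger apart from the monthly one. Both cron expressions match 02:00 UTC on 1 July, so on that day the Worker receives one scheduled event per trigger. Keeping the two job lists disjoint prevents any job from running twice. The helper and job names below are hypothetical. `controller.cron` is the field a Cloudflare scheduled handler receives, holding the expression that fired.

```typescript
// Sketch 1: sequential calls spaced at least `minIntervalMs` apart
// (1000 ms for the documented 1 req/sec limit).
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function forEachThrottled<T, R>(
  items: T[],
  minIntervalMs: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  let lastStart = Number.NEGATIVE_INFINITY; // first call starts immediately
  for (const item of items) {
    const wait = lastStart + minIntervalMs - Date.now();
    if (wait > 0) await sleep(wait);
    lastStart = Date.now();
    results.push(await fn(item));
  }
  return results;
}

// Sketch 2: route each cron trigger to its own jobs (hypothetical job names).
type Job = "ivi" | "glmd" | "emp_projections" | "vet_training";

const CRON_JOBS: Record<string, Job[]> = {
  "0 2 1 * *": ["ivi", "glmd", "emp_projections"],
  "0 2 1 7 *": ["vet_training"],
};

function jobsForCron(cron: string): Job[] {
  return CRON_JOBS[cron] ?? [];
}

// In the Worker's scheduled(controller, env, ctx) handler:
//   for (const job of jobsForCron(controller.cron)) { ... }
```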


13–19. State Government Funding Lists

All state data is stored in the single state_funding table (3,040 rows total). Fields: qual_code, state, funded, free, free_apprenticeship, pathway, program_name, co_contribution, conditions, source_version, ingested_at.
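Below is a hypothetical TypeScript shape for a state_funding row, with a helper that orders values for a prepared INSERT. Column names follow the documented field list. Every type, the AUD unit on co_contribution, and the helper names are assumptions, not the actual D1 schema.

```typescript
// Hypothetical row shape; column names are documented, types are assumed.
type StateCode = "NSW" | "QLD" | "VIC" | "SA" | "WA" | "TAS" | "NT" | "ACT";

interface StateFundingRow {
  qual_code: string;
  state: StateCode;
  funded: boolean;
  free: boolean;
  free_apprenticeship: boolean;
  pathway: string | null;
  program_name: string | null;
  co_contribution: number | null; // assumed: student fee in AUD
  conditions: string | null;
  source_version: string; // e.g. "NSW Skills List v16.2 @ 01/01/2026"
  ingested_at: string; // assumed ISO 8601 timestamp
}

const STATE_FUNDING_COLUMNS = [
  "qual_code", "state", "funded", "free", "free_apprenticeship", "pathway",
  "program_name", "co_contribution", "conditions", "source_version", "ingested_at",
] as const;

// SQLite has no native boolean type, so flags are bound explicitly as 0/1.
function toBindParams(row: StateFundingRow): (string | number | null)[] {
  return STATE_FUNDING_COLUMNS.map((col) => {
    const v = row[col];
    return typeof v === "boolean" ? (v ? 1 : 0) : v;
  });
}

// INSERT with one "?" placeholder per column, in the same order as toBindParams.
const INSERT_STATE_FUNDING =
  `INSERT INTO state_funding (${STATE_FUNDING_COLUMNS.join(", ")}) ` +
  `VALUES (${STATE_FUNDING_COLUMNS.map(() => "?").join(", ")})`;
```

With D1, this might be used as env.DB.prepare(INSERT_STATE_FUNDING).bind(...toBindParams(row)), where env.DB is an assumed binding name.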

NSW — Smart and Skilled

data_sources key: ssl_nsw
URL: https://www.nsw.gov.au/education-and-training/vocational/nsw-skills-list
Method: Excel download — Skills List v16.2, xlsx parse
Rows: 1,164 | Last ingested: 2026-03-24
Version: NSW Skills List v16.2 @ 01/01/2026
Notes: Three pathways (Traineeship, Apprenticeship, General Training). 149 quals flagged as fee-free (NFF). Largest state dataset by row count.

QLD — Career Start / Queensland Subsidised Training List (QSTL)

data_sources key: qstl_qld
URL: https://dtet.qld.gov.au/training/providers/funded/subsidised-training-list
Method: PDF extract via pdftotext and regex — QSTL v4 Feb 2026
Rows: 459 | Last ingested: 2026-03-24
Notes: Includes max completion payable to RTO — useful for margin intelligence. Available from the QLD Publications portal.
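The QLD and SA PDF extracts and the NT HTML scrape all rely on pulling qualification codes out of text. This minimal sketch assumes codes follow the usual training package shape of three uppercase letters plus five digits (e.g. BSB30120). Accredited course codes in other formats would be missed. `extractQualCodes` is a hypothetical name, not the pipeline's actual regex.

```typescript
// Hypothetical extractor for qualification codes in pdftotext output or HTML.
// Assumption: codes are three uppercase letters followed by five digits.
// Unit codes such as BSBWHS311 do not fit this shape and are ignored.
function extractQualCodes(text: string): string[] {
  const re = /\b[A-Z]{3}\d{5}\b/g;
  const seen = new Set<string>(); // keeps first-seen order and drops repeats
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    seen.add(m[0]);
  }
  return Array.from(seen);
}
```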

VIC — Free TAFE / Skills First

data_sources key: funded_vic ⚠️ Not yet registered
URL: https://www.vic.gov.au/free-tafe-courses
Rows: 7 ⚠️ SIGNIFICANTLY INCOMPLETE
Last ingested: 2026-03-24
Notes: Only 7 Free TAFE rows have been ingested. The full Victorian Skills First Training Needs List (Excel download from vic.gov.au) contains hundreds of quals with subsidy rates and still needs to be ingested. This is a known gap. The source_version for current rows is VIC Free TAFE 2026.

SA — WorkReady

data_sources key: stl_sa
URL: https://providers.skills.sa.gov.au/subsidised-training-list
Method: PDF parse via pdftotext and regex — STAL + TPL components
Rows: 720 (299 WorkReady + 421 WorkReady Apprenticeship) | Last ingested: 2026-03-24
Version: SA STL v11.2
Notes: Two components — STAL (training contracts) and TPL (general training priority list). PDF parsing may break if SA changes their STL format.

WA — Jobs and Skills WA

data_sources key: funded_wa
URL: https://www.jobsandskills.wa.gov.au/course-list
Method: Puppeteer scrape of Drupal AJAX course list, then title-matched to qualifications table
Rows: 175 | Last ingested: 2026-03-24
Notes: 58 fee-free and 121 low-fee courses. 182 WA courses could not be matched, mostly skill sets that are not in our qualifications table.
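Exact matching on normalised titles is one plausible way to do the WA title match. It also shows why skill sets drop out: they have no counterpart in the qualifications table. This is a sketch, not the script that produced the 175 matches. `normaliseTitle` and `matchByTitle` are hypothetical. The quals list is assumed to contain only current releases, since superseded versions can share a title with a different code.

```typescript
// Hypothetical exact matcher on normalised titles.
interface QualRef {
  code: string;
  title: string;
}

function normaliseTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/&/g, " and ")
    .replace(/[^a-z0-9]+/g, " ") // punctuation and whitespace runs become one space
    .trim();
}

function matchByTitle(courseTitles: string[], quals: QualRef[]) {
  // Later entries overwrite earlier ones with the same normalised title.
  const index = new Map<string, string>();
  for (const q of quals) index.set(normaliseTitle(q.title), q.code);

  const matched: { title: string; code: string }[] = [];
  const unmatched: string[] = [];
  for (const title of courseTitles) {
    const code = index.get(normaliseTitle(title));
    if (code !== undefined) matched.push({ title, code });
    else unmatched.push(title);
  }
  return { matched, unmatched };
}
```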

TAS — Skills Tasmania

data_sources key: funded_tas
URL: https://www.skills.tas.gov.au/providers/rto/courses_approved_and_funded_in_tasmania
Method: Excel download — two separate files (non-apprenticeship and apprenticeship/traineeship)
Rows: 420 | Last ingested: 2026-03-24
Notes: Excel file URLs include a date in the filename, so check the TAS page for current URLs at each refresh. Previous URLs:
- Non-app: 19-RTOs-funded-to-deliver-qualifications-as-non-apprenticeship-and-traineeship-as-at-20-March-2026.xlsx
- App: 21-RTOs-funded-to-deliver-qualificatinos-as-an-apprenticeship-or-traineeship-as-at-20-March-2026.xlsx
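The filenames carry both a date and a numeric prefix, so scraping the links from the TAS page may be more robust than constructing them. This minimal sketch assumes the page links the files with ordinary href attributes ending in .xlsx. `findTasXlsxLinks` is hypothetical. It classifies files by the "non-apprenticeship" / "apprenticeship" wording seen in the previous filenames, so it does not depend on the misspelling in the apprenticeship filename.

```typescript
// Hypothetical link finder for the two TAS Excel files.
interface TasFiles {
  nonApprenticeship: string[];
  apprenticeship: string[];
}

function findTasXlsxLinks(html: string, pageUrl: string): TasFiles {
  const re = /href\s*=\s*["']([^"']+\.xlsx)["']/gi;
  const result: TasFiles = { nonApprenticeship: [], apprenticeship: [] };
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const url = new URL(m[1], pageUrl).toString(); // resolve relative hrefs
    const name = url.toLowerCase();
    // Check the more specific wording first: "non-apprenticeship" also contains "apprenticeship".
    if (name.includes("non-apprenticeship")) result.nonApprenticeship.push(url);
    else if (name.includes("apprenticeship")) result.apprenticeship.push(url);
  }
  return result;
}
```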

NT — Fee-Free TAFE / User Choice (Charles Darwin University)

data_sources key: funded_nt
URL: https://www.cdu.edu.au/courses?type=vet
Method: CDU course page scrape — qual code regex extraction from rendered HTML
Rows: 95 | Last ingested: 2026-03-24
Notes: NT data sourced from CDU VET catalogue only. CDU is the primary NT TAFE provider but not the only one — Batchelor Institute also delivers. Fee-free status not individually confirmed per qual. Marked as potentially incomplete.

ACT — ACT Skills

data_sources key: Not registered
Status: Not yet ingested
Notes: ACT Skills Fund approved training list is published but has not been ingested. Lower priority given market size. Should be added in the next state funding refresh cycle.

Update cadence and method for all state lists

Annual. Most state lists update January–March following budget cycles, and NSW and VIC also publish mid-year updates. Each state uses a different format and ingest method:

State | Format | Automation potential
NSW | Excel (stable URL) | High: URL predictable
QLD | PDF (URL changes with version) | Medium: requires PDF parser
VIC | Excel (stable URL) | High: URL predictable, but full list not yet wired
SA | PDF (URL changes with version) | Medium: requires PDF parser
WA | Drupal AJAX scrape | Low: requires Puppeteer
TAS | Excel (URL includes date) | Medium: URL pattern predictable
NT | HTML scrape (CDU) | Low: rendered HTML, may break
ACT | Not yet implemented | n/a

20. ABR Reference Codes

data_sources key: abr_reference ⚠️ Not yet registered
Authority: Australian Taxation Office / Australian Business Register
Rows: 167

What it is

Reference lookup table for ABR entity type codes (e.g. PRV = Australian Private Company, PUB = Australian Public Company) and industry classification codes used in ABN records. Used to interpret entity type fields on RTO records.

Update cadence

Infrequent — ABR classification codes change rarely. Re-ingest only when ATO publishes updated classification standards.


21. Unit Stats (Derived Aggregates)

data_sources key: unit_stats
Not an external source — computed from qualification_units and rto_scope within the database
Last computed: 2026-03-24 | Records: 15,200

What it is

Pre-computed aggregate statistics written back as columns on the units table, making unit-level queries fast without joins:

Column | Description
stat_qual_count | How many qualifications include this unit
stat_core_count | How many qualifications include this unit as a core unit
stat_elective_count | How many qualifications include this unit as an elective
stat_rto_count | How many RTOs deliver qualifications that include this unit
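The column definitions can be pinned down with an in-memory reference version, which is handy for spot-checking the batch UPDATE output on a small sample. This is not the pipeline's script, which works in SQL via temp tables. The row shapes, including an `is_core` flag on qualification_units, are assumptions. Counts are of distinct qualifications and distinct RTOs, so an RTO delivering two qualifications that share a unit is counted once.

```typescript
// Hypothetical in-memory version of the four unit stats columns.
interface QualUnit { qual_code: string; unit_code: string; is_core: boolean; }
interface ScopeRow { rto_code: string; qual_code: string; }
interface UnitStats {
  stat_qual_count: number;
  stat_core_count: number;
  stat_elective_count: number;
  stat_rto_count: number;
}

function computeUnitStats(qualUnits: QualUnit[], scope: ScopeRow[]): Map<string, UnitStats> {
  // RTOs with each qualification on scope.
  const rtosByQual = new Map<string, Set<string>>();
  for (const s of scope) {
    const set = rtosByQual.get(s.qual_code) ?? new Set<string>();
    set.add(s.rto_code);
    rtosByQual.set(s.qual_code, set);
  }

  // Distinct quals (all / core / elective) and distinct RTOs per unit.
  type Acc = { quals: Set<string>; core: Set<string>; elective: Set<string>; rtos: Set<string> };
  const acc = new Map<string, Acc>();
  for (const qu of qualUnits) {
    let a = acc.get(qu.unit_code);
    if (!a) {
      a = { quals: new Set<string>(), core: new Set<string>(), elective: new Set<string>(), rtos: new Set<string>() };
      acc.set(qu.unit_code, a);
    }
    a.quals.add(qu.qual_code);
    (qu.is_core ? a.core : a.elective).add(qu.qual_code);
    for (const rto of rtosByQual.get(qu.qual_code) ?? []) a.rtos.add(rto);
  }

  const stats = new Map<string, UnitStats>();
  for (const [unit, a] of acc) {
    stats.set(unit, {
      stat_qual_count: a.quals.size,
      stat_core_count: a.core.size,
      stat_elective_count: a.elective.size,
      stat_rto_count: a.rtos.size,
    });
  }
  return stats;
}
```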

Update method

Re-run after every TGA corpus ingest. Uses batch UPDATE via indexed temp tables. Script is part of the TGA adapter ingest pipeline.


Known Gaps and Issues

Issue | Impact | Priority
VIC Skills First full list not ingested (only 7 rows) | VIC funding intelligence severely incomplete | High
ACT not ingested | No ACT state_funding rows | Medium
VNDA coverage at 42% (494/1,167 current quals) | Employment/income data missing for 58% of quals | High (Brief #25 addressing)
Per-qual enrolment/completion volumes missing | No individual qual market size data | High (Brief #25 adding)
9 sources not registered in data_sources | Provenance gap: no refresh tracking | High (Brief #25 Part 1 fixing)
vnda_atlas contains incorrect values | Risk of bad data reaching UI | High (Brief #25 Part 2 deprecating)
No automated refresh for any data | All current data is one-time seeds from March 2026 | High (Brief #25 Part 4 adding cron Worker)
NCVER VOCSTATS requires manual download session | Cannot be automated without NCVER API access | Medium
NT data incomplete (CDU only) | Batchelor Institute quals missing | Low

Source Provenance: Unregistered Tables

The following tables hold data in rtopacks-db but have no entry in data_sources. The exception is jsa_qual_training, which is listed before Brief #25 creates it. Brief #25 Part 1 registers all of these.

Table | Source to register | source_key
recon_vnda, recon_vnda_aqf, recon_vnda_foe | JSA VNDA OpenSearch API scrape | jsa_vnda_scrape
vnda_atlas | Unknown export (deprecated) | jsa_vnda_export
osl_ratings | JSA Occupation Shortage List | jsa_osl
ivi_vacancies | JSA Internet Vacancy Index | jsa_ivi
emp_projections | JSA Employment Projections | jsa_emp_projections
glmd_regional | JSA GLMD regional JSON | jsa_glmd
state_funding (VIC rows) | VIC Free TAFE page | funded_vic
abr_codes | ATO ABR reference | abr_reference
All vocstats_* tables | NCVER VOCSTATS | ncver_vocstats
jsa_qual_training (new, Brief #25) | JSA Training OpenSearch API | jsa_training_api
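After Brief #25 Part 1 lands, a quick check could confirm that no registration is still missing. This sketch encodes the mapping above as data. `missingSourceKeys` is a hypothetical helper. The caller is assumed to pass in the keys already present in data_sources, since the column that holds them isn't documented here.

```typescript
// Table-to-source_key mapping, copied from the table above.
const REQUIRED_REGISTRATIONS: Record<string, string> = {
  recon_vnda: "jsa_vnda_scrape",
  recon_vnda_aqf: "jsa_vnda_scrape",
  recon_vnda_foe: "jsa_vnda_scrape",
  vnda_atlas: "jsa_vnda_export",
  osl_ratings: "jsa_osl",
  ivi_vacancies: "jsa_ivi",
  emp_projections: "jsa_emp_projections",
  glmd_regional: "jsa_glmd",
  "state_funding (VIC rows)": "funded_vic",
  abr_codes: "abr_reference",
  "vocstats_*": "ncver_vocstats",
  jsa_qual_training: "jsa_training_api",
};

// Returns the distinct source_keys not yet registered, sorted alphabetically.
function missingSourceKeys(registeredKeys: Iterable<string>): string[] {
  const registered = new Set(registeredKeys);
  const missing = new Set<string>();
  for (const key of Object.values(REQUIRED_REGISTRATIONS)) {
    if (!registered.has(key)) missing.add(key);
  }
  return Array.from(missing).sort();
}
```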