RTOpacks Data Sources
This is the authoritative reference for every data source that feeds rtopacks-db. It covers what the data is, where it comes from, how it was ingested, which tables it populates, current row counts, and what the update cadence and method should be.
Last audited: 29 March 2026
Database: rtopacks-db (D1: 334ac8fb-9850-48c0-9da0-b56c55640e98)
Cloudflare account: e5a9830215a8d88961dc6c80a8c7442a
Provenance table: data_sources (all sources should be registered here)
Quick Reference
| # | Source | Tables | Rows | Registered | Method | Endpoint / URL | Cron | Frequency |
|---|---|---|---|---|---|---|---|---|
| 1 | National Training Register (TGA) — adapter | qualifications, units, qualification_units, rtos, rto_scope + 7 more | ~490,000 | ✅ | REST API | training.gov.au/api | tga-sync · 0 16 * * SAT | Weekly |
| 1a | TGA REST API — Organisation deep data | tga_organisations, tga_org_legal_names, tga_org_contacts, tga_org_addresses, tga_org_classifications, tga_org_trading_names, tga_org_registration_managers, tga_org_regulatory_decisions, tga_org_restrictions, tga_org_registration_history | 530,000+ | ✅ | REST API | training.gov.au/api/organisation/{code}/* | tga-sync · 0 16 * * SAT | Weekly |
| 1b | TGA REST API — Training component deep data | tga_training_components, tga_training_releases, tga_training_classifications, tga_training_taxonomy_occupations, tga_training_taxonomy_sectors, tga_training_prerequisites, tga_unit_grid_usage, tga_release_unit_grids, tga_tp_release_components, tga_api_metadata, tga_nrt_classification_values, tga_rto_classification_values | 1,000,000+ | ✅ | REST API | training.gov.au/api/training/{code}/* | tga-sync · 0 16 * * SAT | Weekly |
| 1c | TGA REST API — Delivery notification history | tga_delivery_notification_history | 793,000+ | ✅ | REST API | training.gov.au/api/organisation/{code}/deliverynotificationhistory/{trainingcode} | tga-sync · 0 16 * * SAT | Weekly |
| 2 | yourcareer.gov.au — Search API | qualifications (enrichment columns) | 7,054 quals | ✅ | REST API | api.yourcareer.gov.au/api/Courses/search | — | Manual · Annual |
| 3 | yourcareer.gov.au — Pathways API | qual_career_pathways, qualification_specialisations | 54,640 + 967 | ✅ | REST API | api.yourcareer.gov.au/api/Courses/{code}/PathwaysIndustry | — | Manual · Annual |
| 4 | CRICOS | cricos_providers, cricos_courses, cricos_institutions, cricos_locations, cricos_course_locations | 79,117 | ✅ | CSV download | data.gov.au dataset e5ae7059-bfa8-4fa4-a5c0-c13cf3520193 (4 stable resource IDs) | cricos-sync · 0 18 1 * * | Monthly · 1st |
| 5 | NCVER VOCSTATS | vocstats_* (8 tables) | ~9,200 | ⚠️ | Manual (web query tool) | ncver.edu.au/vocstats | — | Manual · Annual |
| 6 | JSA VNDA — Graduate Outcomes | recon_vnda, recon_vnda_aqf, recon_vnda_foe | 513 | ⚠️ | REST API | jobsandskills.gov.au/api/v1/opensearch/vnda/_search | — | Manual · When JSA updates |
| 7 | JSA VNDA — Export | vnda_atlas | 785 | ⚠️ | — | — | — | DEPRECATED |
| 8 | JSA Occupation Shortage List (OSL) | osl_ratings | 9,454 | ⚠️ | Excel download | jobsandskills.gov.au/data/occupation-shortage-list | Brief #25 Worker | Annual |
| 9 | JSA Internet Vacancy Index (IVI) | ivi_vacancies | 37,908 | ⚠️ | Excel download | jobsandskills.gov.au/system/files/YYYY-MM/Internet Vacancies...xls | Brief #25 Worker · 0 2 1 * * | Monthly |
| 10 | JSA Employment Projections | emp_projections | 358 | ⚠️ | Excel download | jobsandskills.gov.au/data/employment-projections | Brief #25 Worker | Annual |
| 11 | JSA GLMD — Regional Labour Market | glmd_regional | 96 | ⚠️ | REST API (JSON file) | jobsandskills.gov.au/system/files/datasets/glmd (YYYY-MM).json | Brief #25 Worker · 0 2 1 * * | Monthly |
| 12 | JSA Training API — Per-qual enrolments | jsa_qual_training (new) | 0 — Brief #25 | ⚠️ | REST API | jobsandskills.gov.au/api/v1/opensearch/training/_search | Brief #25 Worker · 0 2 1 7 * | Annual · July |
| 13 | NSW Smart and Skilled | state_funding (NSW) | 1,164 | ✅ | Excel download | nsw.gov.au/education-and-training/vocational/nsw-skills-list | — | Manual · Annual |
| 14 | QLD Career Start / QSTL | state_funding (QLD) | 459 | ✅ | PDF parse | dtet.qld.gov.au/training/providers/funded/subsidised-training-list | — | Manual · Annual |
| 15 | VIC Free TAFE / Skills First | state_funding (VIC) | 7 ⚠️ incomplete | ⚠️ | Excel download | vic.gov.au/free-tafe-courses | — | Manual · Annual |
| 16 | SA WorkReady | state_funding (SA) | 720 | ✅ | PDF parse | providers.skills.sa.gov.au/subsidised-training-list | — | Manual · Annual |
| 17 | WA Jobs and Skills WA | state_funding (WA) | 175 | ✅ | HTML scrape (Puppeteer) | jobsandskills.wa.gov.au/course-list | — | Manual · Annual |
| 18 | TAS Skills Tasmania | state_funding (TAS) | 420 | ✅ | Excel download | skills.tas.gov.au/providers/rto/courses_approved_and_funded_in_tasmania | — | Manual · Annual |
| 19 | NT Fee-Free TAFE / CDU | state_funding (NT) | 95 | ✅ | HTML scrape | cdu.edu.au/courses?type=vet | — | Manual · Annual |
| 20 | ACT Skills | state_funding (ACT) | 0 — not yet ingested | ❌ | — | — | — | — |
| 21 | ABR Reference Codes | abr_codes | 167 | ⚠️ | One-off | ABR | — | Rarely |
| 22 | Unit Stats (derived) | units (computed columns) | 15,200 | ✅ | Derived | Computed from DB | — | After TGA ingest |
Method legend:
- REST API — direct programmatic API call, JSON response
- CSV download — direct CSV file download from stable URL
- Excel download — .xlsx file download, requires xlsx parser
- PDF parse — PDF text extraction via pdftotext + regex
- HTML scrape — rendered HTML scrape (Puppeteer or requests)
- Manual — requires human to log in or navigate a web tool
- Derived — computed from data already in rtopacks-db
- DEPRECATED — do not use
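Cron schedule notation (cron expressions are evaluated in UTC; local times below are AEST, UTC+10):
- 0 16 * * SAT → Saturday 16:00 UTC = Sunday 2am AEST (weekly)
- 0 18 1 * * → 1st of month 18:00 UTC = 2nd of month, 4am AEST (monthly)
- 0 2 1 * * → 1st of month 02:00 UTC = 1st of month, 12pm AEST (monthly)
- 0 2 1 7 * → 1 July 02:00 UTC = 1 July, 12pm AEST (annual)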
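The conversion between a cron firing time and AEST can be checked with a few lines of Python. This is a minimal sketch that assumes the cron expressions fire in UTC (as Cloudflare cron triggers do); the helper name utc_to_aest and the sample dates are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# AEST is a fixed UTC+10 offset, so a fixed-offset timezone is enough here.
# States that observe daylight saving are on AEDT (UTC+11) in summer.
AEST = timezone(timedelta(hours=10), name="AEST")

def utc_to_aest(year, month, day, hour, minute=0):
    """Return the AEST wall-clock time for a UTC cron firing time."""
    fired_utc = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return fired_utc.astimezone(AEST)

# 0 16 * * SAT: Saturday 28 March 2026, 16:00 UTC
weekly = utc_to_aest(2026, 3, 28, 16)
# 0 18 1 * *: 1 April 2026, 18:00 UTC
monthly = utc_to_aest(2026, 4, 1, 18)

print(weekly.isoformat())   # 2026-03-29T02:00:00+10:00 (Sunday)
print(monthly.isoformat())  # 2026-04-02T04:00:00+10:00 (the 2nd, not the 1st)
```

Any expression with a UTC hour of 14 or later lands on the following calendar day in AEST, which matters when reading the day-of-month and day-of-week fields.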
Sources marked ⚠️ will be registered in data_sources as part of Brief #25.
1. National Training Register (TGA)
data_sources key: tga_corpus
Authority: Department of Employment and Workplace Relations
URL: https://training.gov.au
Licence: Creative Commons Attribution 4.0
Auth: None required
Last ingested: Not recorded in data_sources — TGA ingest predates the provenance registry
What it is
The authoritative Australian register of all vocational education and training. Contains every qualification, unit of competency, skill set, and accredited course across all statuses (current, superseded, expired), plus every registered RTO and their approved delivery scope. This is the foundational dataset — everything else in the database enriches on top of it.
How it was ingested
TGA adapter ingest via the UCCA engine's adapters/tga/ module. The engine processes TGA's public data endpoints, applies the TGA adapter, and populates the corpus tables in rtopacks-db. This is a world-level ingest — the AU-VET world's foundational data.
Tables populated
| Table | Rows | Content |
|---|---|---|
| qualifications | 8,007 | All qualifications — code, title, AQF level, training package, packaging rules, supersession chain (supersedes, superseded_by), status, entry requirements, description |
| units | 75,189 | All units of competency — code, title, description, elements, performance criteria, knowledge evidence, performance evidence, assessment conditions, foundation skills, training package, status |
| qualification_units | 244,874 | Junction table — which units belong to which qualifications, core vs elective flags |
| rtos | 12,515 | All registered training organisations — code, legal name, ABN, status, type |
| rto_scope | 150,000 | RTO delivery scope — which RTOs are approved to deliver which qualifications, in which states |
| rto_addresses | 1,064 | RTO physical and postal addresses |
| rto_contacts | 2,176 | RTO contact details (email, phone, fax) |
| rto_legal_names | 139 | RTO legal name history |
| rto_trading_names | 137 | RTO trading names |
| rto_registrations | 267 | RTO registration periods, regulators (ASQA / state) |
| rto_web_addresses | ~137 | RTO website URLs |
| rto_classifications | ~267 | RTO type classifications |
Update cadence and method
TGA updates continuously as training packages are endorsed, superseded, and as RTOs register/deregister. Re-run the TGA adapter ingest quarterly, or when a major training package update is announced by a relevant Industry Reference Committee (IRC). After each TGA ingest, also re-run the unit_stats aggregate (source #22).
1a. TGA REST API — Organisation Deep Data
data_sources key: tga_rest_organisations
Authority: Department of Employment and Workplace Relations
API Base URL: https://training.gov.au/api
Swagger: https://training.gov.au/swagger/Organisation%20-%20v1/swagger.json
Auth: None required — unauthenticated public REST API (undocumented, backs the training.gov.au SPA)
Ingested: 28–29 March 2026 | DATA-11a, DATA-10b
Scripts: scripts/ingest/ingest_tga_organisations.py, scripts/ingest/ingest_tga_org_deep.py
What it is
The TGA REST API is an undocumented, unauthenticated REST API discovered by reverse-engineering the training.gov.au SPA in Session 12 (28 March 2026). It exposes the full NTR dataset via clean RESTful endpoints with no auth, no rate limiting, and no CORS restriction. The Organisation group of endpoints exposes historical and structured data about every registered and previously registered training organisation — far richer than what the TGA adapter ingest (#1) provides.
Key discovery: The API has been running silently behind the SPA, never publicly documented or indexed. Hosted on DEWR's own AS9509 infrastructure in Sydney. Zero rate limiting observed across 500,000+ API calls during the Session 12 overnight run. The full Swagger spec is available at https://training.gov.au/swagger/Organisation%20-%20v1/swagger.json.
Ingest strategy
Two-pass ingest:
- DATA-10b — base organisation snapshot per org code: registration status, current address, lat/lng, CEO contacts, regulatory decisions, restrictions, registration history
- DATA-11a — deep historical pass per org code: legal name history (with ABN arrays), trading name history, address history (all types), full contacts, classification (RTO type), CRICOS code history, registration manager history, scope summary counts
Tables populated
| Table | Rows | Content |
|---|---|---|
| tga_organisations | 12,500 | Base snapshot — registration status, address, lat/lng, CEO, is_restricted, is_cricos, delivery_states |
| tga_org_registration_history | 28,830 | Registration period history — start, end, expiry, renewal decision |
| tga_org_regulatory_decisions | 1,415 | ASQA enforcement decisions — decision type, level, legislation, effective/expiry dates |
| tga_org_restrictions | 804 | Active and historical scope/registration restrictions |
| tga_org_legal_names | 17,316 | Legal name history with ABN array and ACN per name period — phoenix pattern key |
| tga_org_trading_names | 13,239 | Trading name history with date ranges |
| tga_org_addresses | 63,159 | Full address history typed by role (postal/principal/headOffice/deliveryLocation) |
| tga_org_contacts | 193,121 | Full contact records — CEO, public enquiries, registration enquiries, all roles |
| tga_org_classifications | 13,093 | RTO type classifications (private/TAFE/enterprise/community/school/university) |
| tga_org_registration_managers | 16,850 | ASQA vs state regulator history with legal authority and exerciser |
| tga_org_cricos_codes | 0 | CRICOS code history — endpoint returned empty for all orgs (TGA does not hold CRICOS code history; use cricos_* tables from source #4) |
Intelligence views built on this data
| View | Rows | Purpose |
|---|---|---|
| vw_rto_intelligence | 12,515 | Flagship RTO view — combines registration, scope, regulatory history, restriction status |
| vw_rto_risk_profile | 937 | RTOs with regulatory decisions or active restrictions — the moat view |
Strategic significance
- 193,121 contact records — first time CEO/contact data is structured and queryable across all 12,500 orgs
- 17,316 legal name records with ABN arrays — enables CEO phoenix pattern analysis: directors who closed one RTO and opened another under different names
- 937 RTOs with regulatory history — ASQA enforcement actions structured outside government for the first time
- 420 currently active RTOs have regulatory decisions against them — queryable, filterable, commercially differentiating
Update cadence
Weekly via tga-sync Worker (cron 0 16 * * SAT, Sunday 2am AEST). The sync Worker runs delta updates against the TGA API using tga_sync_cursor. Full re-run if API schema changes.
1b. TGA REST API — Training Component Deep Data
data_sources key: tga_rest_training
Swagger: https://training.gov.au/swagger/Training%20-%20v1/swagger.json
Ingested: 28–29 March 2026 | DATA-10a, DATA-11b, DATA-11c, DATA-11e, DATA-11f
Scripts: scripts/ingest/ingest_tga_training.py, scripts/ingest/ingest_tga_training_taxonomy.py, scripts/ingest/ingest_tga_metadata.py, scripts/ingest/ingest_tga_training_structure.py, scripts/ingest/ingest_tga_release_grids.py
What it is
The Training group of TGA REST API endpoints exposes every training component (qualifications, units, skill sets, accredited courses, training packages) with full release history, taxonomy (occupation and industry sector assignments), classification (ANZSCO, ASCED, qualification level), prerequisite structure, unit grid composition per release, and training package membership history.
Tables populated
| Table | Rows | Content |
|---|---|---|
| tga_training_components | 84,728 | All training components — code, title, type, status, release count, supersession chain |
| tga_training_releases | 99,531 | Release history per component — release number, date, currency |
| tga_training_classifications | 76,993+ | ANZSCO, ASCED4, ASCED6, qualification level classifications per component |
| tga_training_taxonomy_occupations | 6,954 | Occupation taxonomy assignments per component |
| tga_training_taxonomy_sectors | 4,106 | Industry sector taxonomy assignments per component |
| tga_training_prerequisites | 1,437 | Structured prerequisite unit relationships |
| tga_unit_grid_usage | 280,878 | Per unit: which quals/skill sets include it, with core/elective flag |
| tga_release_unit_grids | 224,178 | Per qual/skill set release: which units were included at that version |
| tga_tp_release_components | 95,510 | Training package release composition — all components per TP per release |
| tga_api_metadata | 1 | TGA API version + data sync timestamp (freshness signal for sync Worker) |
| tga_nrt_classification_schemes | 6 | NRT classification scheme reference (ANZSCO, ASCED, etc.) |
| tga_nrt_classification_values | 3,569 | Classification value lookup table |
| tga_rto_classification_schemes | 1 | RTO classification scheme reference |
| tga_rto_classification_values | 16 | RTO type lookup table (private/TAFE/enterprise/community/school/university) |
Intelligence view built on this data
| View | Rows | Purpose |
|---|---|---|
| vw_qual_intelligence | 8,007 | Qualification intelligence — supersession, release data, ANZSCO, FOE, unit count, RTO delivery count |
Update cadence
Weekly via tga-sync Worker. Classification and taxonomy data changes only when training packages are updated by ISCs, which is infrequent, but the sync still needs to track it.
1c. TGA REST API — Delivery Notification History
data_sources key: tga_rest_delivery_history
Endpoint: GET /api/organisation/{code}/deliverynotificationhistory/{trainingcode}
Ingested: 28–29 March 2026 | DATA-11d (still running at time of writing — ~56% complete)
Script: scripts/ingest/ingest_tga_delivery_history.py
What it is
For each (RTO, training component) pair where scope exists, this endpoint returns the granular notification event history — when the RTO formally notified TGA of adding or removing a qualification from their scope, in which state, on what date. This is more granular than rto_scope_changes (which records scope start/end) — it captures the actual notification event.
| Table | Rows (at session close) | Content |
|---|---|---|
| tga_delivery_notification_history | 793,000+ (growing) | Org × training code × notification date × state × type (Add/Remove) |
Update cadence
Weekly via tga-sync Worker — incremental updates only for new notification events.
2. yourcareer.gov.au — Qualification Search API
data_sources key: yourcareer_search
Authority: Jobs and Skills Australia / Department of Employment and Workplace Relations
API URL: https://api.yourcareer.gov.au/api/Courses/search
Auth: None required
Last ingested: 2026-03-21 | Records: 7,054
Ingest script: B-ENRICH-YOURCAREER-01
What it is
The yourcareer.gov.au platform (the successor to myskills.gov.au) exposes a public unauthenticated API returning financial and employment metadata per qualification: student fees, VSL (VET Student Loans) eligibility, employment outcome percentages, typical duration, and which RTOs deliver each course.
How it was ingested
Paginated GET requests to the search endpoint with pageSize=100, iterated until all qualifications were returned. No auth required. Results were written as enrichment columns directly onto the qualifications table.
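The pagination loop can be sketched as follows. This is an illustration under assumptions, not the B-ENRICH-YOURCAREER-01 script: the page-number parameter, the response shape, and the names iter_courses and fake_fetch are hypothetical, and the HTTP call is replaced by a stand-in so the example runs offline.

```python
from typing import Callable, Iterator

PAGE_SIZE = 100  # page size stated in the text

def iter_courses(fetch_page: Callable[[int, int], list[dict]],
                 page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield every record, requesting pages until a short or empty page arrives."""
    page = 1
    while True:
        records = fetch_page(page, page_size)
        yield from records
        if len(records) < page_size:
            break  # a short (or empty) page means nothing is left
        page += 1

# Stand-in for the HTTP GET so the sketch runs offline: serves 250 fake records.
_FAKE = [{"code": f"QUAL{i:04d}"} for i in range(250)]

def fake_fetch(page: int, page_size: int) -> list[dict]:
    start = (page - 1) * page_size
    return _FAKE[start:start + page_size]

courses = list(iter_courses(fake_fetch))
print(len(courses))  # 250
```

In a real run, fetch_page would issue the GET against the search endpoint and return the list of course records from the JSON response. Stopping on a short page costs one extra request when the total is an exact multiple of the page size.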
Fields written to qualifications
| Column | Description |
|---|---|
| yc_fee_min / yc_fee_max | Student fee range ($) |
| yc_vsl_eligible | VET Student Loan eligible flag (1/0) |
| yc_has_subsidies | Government subsidy available flag |
| yc_apprenticeship_states | States where apprenticeship pathway is available |
| yc_is_apprenticeship | Is an apprenticeship pathway flag |
| yc_offered_online | Online delivery available flag |
| yc_employed_pct | Employment % reported after completion |
| yc_rto_codes | Comma-separated list of delivering RTO codes |
| yc_industry_codes | Industry classification codes |
| yc_occupation_codes | ANZSCO occupation codes |
| yc_duration | Typical course duration |
| yc_has_rto_offerings | Whether any RTOs are currently delivering |
| yc_superseded_by | Supersession reference from yourcareer |
| yc_date_modified | Last modification date in yourcareer |
| yc_enriched_at | Timestamp of enrichment write |
Derived fields (computed by B-MODEL-01 from this data):
| Column | Description |
|---|---|
| market_score | 0–100 market viability score per qual (BSB50420 tops at 99) |
| vsl_value_ratio | VSL loan efficiency signal (loan amount vs income uplift) |
| vsl_total_cost | Total VSL loan cost for the qualification |
Update cadence and method
Annual. Fee and VSL eligibility data changes with each government budget cycle (typically May/June). Re-run the yourcareer ingest script against the same paginated API. Upsert on qual_code.
3. yourcareer.gov.au — Career Pathways API
data_sources key: yourcareer_pathways
API URL: https://api.yourcareer.gov.au/api/Courses/{code}/PathwaysIndustry
Auth: None required
Last ingested: 2026-03-21 | Records: 54,640
What it is
Per-qualification career pathway data from yourcareer.gov.au — job title progressions and career ladders by industry stream across AQF levels. This powers the Career Ladder View in RTOpacks.
How it was ingested
One API call per qualification code, using each qual_code from the qualifications table as the {code} parameter. Results written to separate tables.
Tables populated
| Table | Rows | Content |
|---|---|---|
| qual_career_pathways | 54,640 | Job title progression per qualification per industry stream — entry, mid, senior roles |
| qualification_specialisations | 967 | Specialisation streams available within each qualification |
Update cadence and method
Annual, same cycle as yourcareer_search. Re-run the pathways ingest per-qual after yourcareer_search completes. Upsert on qual_code + pathway identifier.
4. CRICOS — Commonwealth Register of Institutions and Courses for Overseas Students
data_sources key: cricos
Authority: Department of Education
Source URL: https://cricos.education.gov.au
Download URL: https://data.gov.au/data/dataset/e5ae7059-bfa8-4fa4-a5c0-c13cf3520193/resource/63fd9610-5bea-438c-bac7-29289d38cfbb/download/cricos-providers-courses-locations.zip
Licence: Creative Commons Attribution 2.5 Australia
Snapshot date: 2 March 2026
What it is
The official register of all Australian institutions and courses approved for international (overseas) student enrolment. Contains CRICOS provider codes, course codes, international tuition fees, course durations, and delivery locations. Used to identify which qualifications attract international students and at what fee benchmarks.
How it is ingested (updated 29 March 2026)
Fully automated via the cricos-sync Cloudflare Worker (DATA-12a, commit 802891c).
Worker deployed at workers/cricos-sync/.
Uses four permanent CSV resources on data.gov.au (dataset e5ae7059-bfa8-4fa4-a5c0-c13cf3520193):
| Resource | Stable Resource ID |
|---|---|
| CRICOS Locations | 45d29535-1360-4486-8242-3850e61b5524 |
| CRICOS Courses | 48cacf69-2082-415e-9595-f17d0c3a4af0 |
| CRICOS Institutions | 7f6941f3-5327-4db7-b556-5f16d77f63c1 |
| CRICOS Course Locations | 4cd2de02-8ba3-4eb2-bac2-fe272cae3f5f |
Worker logic: checks last_modified via CKAN API for each resource, compares against cricos_sync_log, downloads and ingests only if data.gov.au has published a new export. Uses INSERT OR REPLACE — idempotent. UNIQUE constraints on natural keys in cricos_locations and cricos_course_locations prevent duplicates on re-run.
Sync tracking: cricos_sync_log table (created DATA-12a, 4 rows from first automated run).
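The check-then-ingest logic can be illustrated with Python's sqlite3 module, which stands in for D1 here because D1 is SQLite-based. This is a sketch under assumptions, not the Worker's code: the column names, the natural key, the sample rows, and the last_modified value are hypothetical, and the CKAN lookup that supplies last_modified is omitted.

```python
import sqlite3

def needs_sync(conn, resource_id, remote_last_modified):
    """True when the remote export is newer than the one recorded in the log."""
    row = conn.execute(
        "SELECT last_modified FROM cricos_sync_log WHERE resource_id = ?",
        (resource_id,),
    ).fetchone()
    # ISO 8601 timestamps in the same format compare correctly as strings.
    return row is None or remote_last_modified > row[0]

def ingest(conn, resource_id, remote_last_modified, rows):
    """Write rows idempotently, then record which export was ingested."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO cricos_locations "
            "(provider_code, location_name, state) VALUES (?, ?, ?)",
            rows,
        )
        conn.execute(
            "INSERT OR REPLACE INTO cricos_sync_log (resource_id, last_modified) "
            "VALUES (?, ?)",
            (resource_id, remote_last_modified),
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cricos_sync_log (resource_id TEXT PRIMARY KEY, last_modified TEXT);
    CREATE TABLE cricos_locations (
        provider_code TEXT, location_name TEXT, state TEXT,
        UNIQUE (provider_code, location_name)  -- hypothetical natural key
    );
""")

resource = "45d29535-1360-4486-8242-3850e61b5524"  # CRICOS Locations resource ID
stamp = "2026-03-02T00:00:00"                       # made-up last_modified value
rows = [("00001A", "Main Campus", "NSW"), ("00001A", "City Campus", "VIC")]

for _ in range(2):  # the second pass is skipped because the log is up to date
    if needs_sync(conn, resource, stamp):
        ingest(conn, resource, stamp, rows)

ingest(conn, resource, stamp, rows)  # a forced re-run still leaves two rows
print(conn.execute("SELECT COUNT(*) FROM cricos_locations").fetchone()[0])  # 2
```

INSERT OR REPLACE only deduplicates when a UNIQUE or PRIMARY KEY constraint exists to conflict on, which is why the UNIQUE constraints on the natural keys matter for idempotent re-runs.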
Previous method (retired): Manual ZIP download from data.gov.au. No longer required.
Tables populated
| Table | Rows | Content |
|---|---|---|
| cricos_providers | 1,542 | Providers registered with CRICOS — provider code, name, status |
| cricos_institutions | 1,554 | Institution details — ABN, type, trading names |
| cricos_courses | 26,172 | Courses approved for international delivery — course code, tuition fee, duration, CRICOS course ID |
| cricos_locations | 3,907 | Physical delivery locations per provider |
| cricos_course_locations | 46,482 | Course-location junction — which courses are delivered at which locations |
Note: cricos_locations and cricos_course_locations now have UNIQUE constraints on their natural keys to prevent duplicates on re-run.
Enrichment written to qualifications: cricos_provider_count and cricos_avg_tuition_aud (number of CRICOS-registered providers and average international tuition fee for each qual code).
Update cadence and method (updated 29 March 2026)
Automated monthly · cricos-sync Worker · cron 0 18 1 * * (18:00 UTC on the 1st of each month, which is 4am AEST on the 2nd). No human action required.
Current row counts (29 March 2026)
| Table | Rows |
|---|---|
| cricos_institutions | 1,554 |
| cricos_courses | 26,172 |
| cricos_locations | 3,907 |
| cricos_course_locations | 46,482 |
5. NCVER VOCSTATS — Training Activity Data
data_sources key: ncver_vocstats ⚠️ Not yet registered
Authority: National Centre for Vocational Education Research (NCVER)
URL: https://www.ncver.edu.au/research-and-statistics/vocstats
Auth: Free NCVER registration required
Ingested: 2026-03-20 — manual session
What it is
NCVER's VOCSTATS tool provides access to the Total VET Activity (TVA) database — the national mandatory collection of all VET enrolments and completions reported by RTOs under AVETMISS. Also includes the VET Student Outcomes Survey (SOS) and Apprentices and Trainees collection. These are aggregate tables only — data is by Field of Education (FOE) or AQF level, not by individual qualification. Per-qual volumes come from the JSA API (source #12).
Source database keys:
- TVA enrolments/completions: tva_prg_1524_ext_nvetr_rel24
- SOS outcomes: SOS_tva_1625_ext_rel25
How it was ingested
Manual session using the VOCSTATS web query tool at ncver.edu.au. Queries were built with selected dimensions, downloaded as .xlsx + .json pairs. All files were wide format (years as column headers, rows 1–8 are metadata — skip on ingest). Alex ingested using the xlsx npm package. Original files are timestamped by download time.
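Reshaping a wide export into one row per (dimension, year) might look like the sketch below. It is written in Python for illustration (the original ingest used the xlsx npm package) and assumes the header row sits directly after the eight metadata rows and that there is a single label column; exports with several dimensions would carry more label columns. The function name and sample values are made up.

```python
METADATA_ROWS = 8  # rows 1-8 of each VOCSTATS export are metadata

def wide_to_long(sheet_rows, metadata_rows=METADATA_ROWS):
    """Turn [label, value_y1, value_y2, ...] rows into (label, year, value) tuples."""
    header, *body = sheet_rows[metadata_rows:]
    years = [int(y) for y in header[1:]]  # first header cell names the dimension
    long_rows = []
    for label, *values in body:
        for year, value in zip(years, values):
            if value is not None:  # skip blank cells
                long_rows.append((label, year, value))
    return long_rows

# Made-up worksheet content: eight metadata rows, a header row, two data rows.
sheet = [["metadata"]] * 8 + [
    ["Field of education", "2023", "2024"],
    ["Engineering", 100, 120],
    ["Health", 80, None],
]

print(wide_to_long(sheet))
# [('Engineering', 2023, 100), ('Engineering', 2024, 120), ('Health', 2023, 80)]
```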
Tables populated
| Table | Rows | Dimensions | Source collection | File timestamp |
|---|---|---|---|---|
| vocstats_enrolments_foe | 130 | FOE × Year (2015–2024) | TVA enrolments | table_2026-03-20_18-20-45.xlsx |
| vocstats_completions_foe | 130 | FOE × Year (2015–2024) | TVA completions | table_2026-03-20_18-27-37.xlsx |
| vocstats_enrolments_apprentice | 1,459 | Apprentice/trainee × FOE × Year | TVA enrolments | table_2026-03-20_18-55-58.xlsx |
| vocstats_enrolments_funding | 5,791 | Funding source × FOE × Year | TVA enrolments | table_2026-03-20_19-00-10.xlsx |
| vocstats_enrolments_international | 1,160 | International/domestic × 4-digit FOE × Year | TVA enrolments | table_2026-03-20_18-36-44.xlsx + table_2026-03-20_19-08-22.xlsx |
| vocstats_enrolments_vis | 190 | VET in Schools: provider type × school status × training type × year left school × Year | TVA enrolments | table_2026-03-20_18-51-43.xlsx |
| vocstats_outcomes | 129 | Employed/further study × FOE × Year (2016–2025) | SOS outcomes | table_2026-03-20_19-12-14.xlsx |
| vocstats_outcomes_aqf | 90 | Employed/further study × AQF level × Year (2016–2025) | SOS outcomes | table_2026-03-20_17-44-29.xlsx |
Update cadence and method
Annual. NCVER releases TVA data around June each year (covering the previous calendar year). SOS outcomes are also released annually.
Currently manual — Tim re-runs the VOCSTATS query session and downloads new files, Alex re-ingests. No automated path exists yet. Per-qual enrolment/completion volumes (a separate gap) are addressed by Brief #25 via the JSA API (source #12), which is automatable.
6. JSA VNDA — VET National Data Asset (Graduate Outcomes)
data_sources key: jsa_vnda_scrape ⚠️ Not yet registered
Authority: Jobs and Skills Australia, ABS, NCVER (integrated dataset)
Source URL: https://www.jobsandskills.gov.au/data/vocational-education-training/vet-national-data-asset
API URL: https://www.jobsandskills.gov.au/api/v1/opensearch/vnda/_search
Auth: None required
Ingested: 2026-03-19 (Python scraper vnda_scraper.py, then vnda_ingest.py)
What it is
VNDA is an integrated data asset linking NCVER VET activity records with ATO tax data, Department of Social Services income support records, and Department of Education data. It provides the most accurate picture of what happens to VET graduates after completion — employment rates, income uplift, income support exit rates, and further study progression. Currently covers the top ~500 courses by completion volume, with outcomes for the FY2019-20 completion cohort reported in FY2020-21.
Critical limitations:
- Covers only the top ~500 courses — 494 of 1,167 current quals (42%)
- Data is frozen at FY2020-21 as of March 2026 — no update has been published since
- The series_period field must always be displayed alongside any VNDA-derived figure in the UI
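One way to enforce the series_period rule is to make the formatting helper refuse to render a figure without it. The sketch below is illustrative only; format_vnda_rate is a hypothetical name, not an existing RTOpacks function. It also reproduces the 42% coverage figure from the counts above.

```python
def format_vnda_rate(value, series_period):
    """Render a 0-1 VNDA rate as a percentage labelled with its data period."""
    if not series_period:
        raise ValueError("series_period is required for any VNDA-derived figure")
    return f"{value:.0%} ({series_period})"

print(format_vnda_rate(0.76, "FY2020-21"))  # 76% (FY2020-21)

# Coverage quoted above: 494 of 1,167 current quals.
print(f"{494 / 1167:.0%}")  # 42%
```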
How it was ingested
Direct scrape of the JSA VNDA OpenSearch API (vnda/_search index). The Python script vnda_scraper.py submitted a single POST query for FY2020-21 with size: 10000, which returned all 494 records in one request. The records were ingested to D1 via vnda_ingest.py. A companion scrape also pulled aggregate views (by AQF level and by FOE), which populate the _aqf and _foe sibling tables.
Tables populated
| Table | Rows | Content | Status |
|---|---|---|---|
| recon_vnda | 494 | Per-qual outcomes + student characteristics JSON | CANONICAL — use this |
| recon_vnda_aqf | 7 | Outcomes aggregated by AQF level | Canonical |
| recon_vnda_foe | 12 | Outcomes aggregated by Field of Education | Canonical |
| recon_vnda_state | 0 | State-level outcomes — empty, not yet populated | — |
Fields per qualification in recon_vnda
| Field | Description |
|---|---|
| qual_id | Qualification code |
| employed_pct | Employment rate post-completion (e.g. 0.76 = 76%) |
| employment_change_pct | Change in employment rate after completing vs before |
| median_income | Median annual income post-completion ($) |
| median_income_change | Income uplift after completion ($) |
| income_support_exit_rate | % who exited income support after completing |
| higher_vet_progression | % who progressed to a higher-level VET qualification |
| any_vet_progression | % who did any further VET study |
| student_characteristics | JSON: pctFemale, pctDisability, pctFirstNations, pctApprenticeTrainees, pctNoYr12NoCert3, medianCompletionAgeYrs, medianCompletionTimeDays, pctRemote, pctRegional, pctMajorCity |
| series_period | Financial year of data (currently FY2020-21) |
| vnda_source_code | Actual qual code that returned data — may differ from qual_id if supersession walk used (Brief #25) |
| vnda_match_type | Match method: direct, supersedes, superseded_by, sibling, none (Brief #25) |
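A supersession walk that fills vnda_source_code and vnda_match_type could be sketched as below. This is a guess at the shape of the Brief #25 logic, not its implementation: the walk order (older codes before newer ones), the dictionary inputs, and the qualification codes are assumptions, and the sibling match type is left out because its definition is not given here.

```python
def find_vnda_match(qual_code, vnda_codes, supersedes, superseded_by):
    """Return (vnda_source_code, vnda_match_type) for one qualification.

    supersedes / superseded_by map a qual code to the code one step
    older / newer in its supersession chain.
    """
    if qual_code in vnda_codes:
        return qual_code, "direct"
    for match_type, chain in (("supersedes", supersedes),
                              ("superseded_by", superseded_by)):
        seen = {qual_code}
        code = chain.get(qual_code)
        while code is not None and code not in seen:  # guard against cycles
            if code in vnda_codes:
                return code, match_type
            seen.add(code)
            code = chain.get(code)
    return None, "none"

# Hypothetical codes: VNDA holds outcomes only for the older qualification.
vnda_codes = {"XXX30115"}
supersedes = {"XXX30120": "XXX30115"}      # newer -> older
superseded_by = {"XXX30115": "XXX30120"}   # older -> newer

print(find_vnda_match("XXX30120", vnda_codes, supersedes, superseded_by))
# ('XXX30115', 'supersedes')
print(find_vnda_match("YYY40101", vnda_codes, supersedes, superseded_by))
# (None, 'none')
```

Recording the code that actually returned data keeps any borrowed figure traceable to its source qualification.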
Update cadence and method
When JSA publishes a new VNDA report (no fixed schedule — FY2020-21 has been the only release since 2022). Brief #25 jsa-ingest Worker includes a November cron that detects and ingests new data when it arrives. Brief #25 also adds a supersession walk to push coverage from 42% to ~70%+.
7. JSA VNDA — Export (DEPRECATED)
data_sources key: jsa_vnda_export ⚠️ Not yet registered
Status: DEPRECATED — do not use
Ingested: 2026-03-19 | Rows: 785
Why it is deprecated
A three-way comparison between vnda_atlas, recon_vnda, and the live JSA API for BSB30120 shows vnda_atlas values are incorrect:
| Field | vnda_atlas | recon_vnda | JSA API |
|---|---|---|---|
| Employment rate | 0.69 ❌ | 0.76 ✓ | 0.76 |
| Median income | $36,300 ❌ | $38,700 ✓ | $38,700 |
| Income support exit | 0.26 ❌ | 0.28 ✓ | 0.28 |
The source of vnda_atlas is unknown — likely a Power BI export or an earlier Atlas data download that used different rounding or a different data vintage. The table is retained for audit purposes only and will be flagged as deprecated by Brief #25. Do not reference vnda_atlas in any new code.
8. JSA Occupation Shortage List (OSL)
data_sources key: jsa_osl ⚠️ Not yet registered
Authority: Jobs and Skills Australia
URL: https://www.jobsandskills.gov.au/data/occupation-shortage-list
Auth: None required
Ingested: 2026-03-19 | Rows: 9,454
What it is
Annual assessment of shortage status for occupations across Australia, by state/territory and nationally. Formerly known as the Skills Priority List (SPL). Ratings are based on data modelling, statistical analysis, stakeholder consultation, and employer surveys. Updated annually by JSA.
Shortage ratings:
| Code | Meaning |
|---|---|
| S | Shortage — employers have considerable difficulty filling vacancies nationally |
| R | Regional shortage only |
| M | Metro shortage only |
| NS | No Shortage — no significant evidence of difficulty filling vacancies |
Tables populated
| Table | Rows | Content |
|---|---|---|
| osl_ratings | 9,454 | ANZSCO code × ANZSCO version × state × year — shortage rating, skill level, alt titles, shortage driver |
Columns available: anzsco_code, anzsco_ver, occ_title, skill_level, alt_titles, year, rnat (national), rnsw, rvic, rqld, rsa, rwa, rtas, rnt, ract (state ratings), driver
Columns to be added by Brief #25:
- shortage_driver — long_training_gap / short_training_gap / suitability_gap / retention_gap / uncertain
- is_clean_energy_critical — boolean, 1 for the 38 JSA Clean Energy Capacity Study critical occupations
Update cadence and method
Annual. Brief #25 jsa-ingest Worker — monthly lightweight run checks for OSL updates (cheap to re-fetch even though the data only changes annually).
9. JSA Internet Vacancy Index (IVI)
data_sources key: jsa_ivi ⚠️ Not yet registered
Authority: Jobs and Skills Australia
URL: https://www.jobsandskills.gov.au/data/internet-vacancy-index
Auth: None required
Ingested: 2026-03-19 | Rows: 37,908
What it is
Monthly count of newly lodged online job advertisements by ANZSCO occupation, state, and region. Sourced from SEEK, CareerOne, and Workforce Australia job boards. Released on the third Wednesday of each month. Provides a demand signal for occupations — useful as a leading indicator alongside OSL shortage ratings.
Important caveats:
- Counts new ads posted during the reference month — not total open vacancies
- Biased toward higher-skilled positions (lower-skilled roles use informal recruitment)
- SA4-level data is experimental
- IVI data is based on place of work; NERO employment data is based on place of residence — these are not strictly comparable
Tables populated¶
| Table | Rows | Content |
|---|---|---|
| ivi_vacancies | 37,908 | ANZSCO code × month × state — vacancy ad counts |
Update cadence and method¶
Monthly. Brief #25 jsa-ingest Worker — monthly cron 0 2 1 * *. The IVI download file URL includes the release month in its filename. The Worker must either scrape the IVI download page for the current URL or construct it from the known URL pattern:
https://www.jobsandskills.gov.au/system/files/YYYY-MM/Internet%20Vacancies%2C%20ANZSCO2%20Occupations%2C%20States%20and%20Territories%20-%20{Month}%20{Year}.xls
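A minimal Python sketch of the "construct it from the known URL pattern" option follows. The function name `ivi_url` and its parameters are hypothetical. The source does not say whether the `YYYY-MM` folder is the reference month or the release month, so the sketch takes both as separate arguments rather than guessing.

```python
from urllib.parse import quote

IVI_BASE = "https://www.jobsandskills.gov.au/system/files"
# File name as it appears (percent-decoded) in the documented URL pattern.
IVI_NAME = "Internet Vacancies, ANZSCO2 Occupations, States and Territories - {month} {year}.xls"
MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def ivi_url(folder_year, folder_month, ref_year, ref_month):
    """Build a candidate IVI download URL.

    folder_* fills the YYYY-MM path segment; ref_* fills {Month} {Year}
    in the file name. How the two relate is an assumption left to the caller.
    """
    folder = f"{folder_year:04d}-{folder_month:02d}"
    name = IVI_NAME.format(month=MONTH_NAMES[ref_month - 1], year=ref_year)
    # quote() encodes spaces as %20 and commas as %2C, matching the pattern.
    return f"{IVI_BASE}/{folder}/{quote(name)}"
```

Because the constructed URL is only a guess until a request succeeds, a Worker using this approach would still want the page scrape as a fallback when the candidate URL returns a 404.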
10. JSA Employment Projections¶
data_sources key: jsa_emp_projections ⚠️ Not yet registered
Authority: Jobs and Skills Australia / Victoria University (VUEF model)
URL: https://www.jobsandskills.gov.au/data/employment-projections
Auth: None required
Ingested: 2026-03-19 | Rows: 358
What it is¶
National employment projections to May 2034 by occupation and industry, produced by Victoria University using the Victoria University Employment Forecasting (VUEF) model, calibrated against Australian Treasury macroeconomic forecasts. Available at 5-year and 10-year horizons. These are forward-looking demand signals — useful for showing whether an occupation is projected to grow or decline.
Tables populated¶
| Table | Rows | Content |
|---|---|---|
| emp_projections | 358 | ANZSCO code × base year × state — 5-year and 10-year projected employment change (absolute headcount and %) |
Update cadence and method¶
Annual. JSA publishes updated projections yearly. Brief #25 jsa-ingest Worker checks for updates on the monthly run.
11. JSA General Labour Market Data (GLMD) — Regional¶
data_sources key: jsa_glmd ⚠️ Not yet registered
Authority: Jobs and Skills Australia
File URL pattern: https://www.jobsandskills.gov.au/system/files/datasets/glmd (YYYY-MM).json
Auth: None required
Ingested: 2026-03-19 | Rows: 96
What it is¶
Monthly static JSON file containing general labour market indicators by SA4 region and state: employment rate, unemployment rate, participation rate, population, youth unemployment, and the Regional Labour Market Indicator (RLMI — a composite performance score rating regions as Strong / Above average / Average / Below average / Poor).
Tables populated¶
| Table | Rows | Content |
|---|---|---|
| glmd_regional | 96 | Region code × data date — employment rate, unemployment, participation, population, RLMI value and label |
Update cadence and method¶
Monthly. The file URL includes the release month — the Worker fetches the latest file by constructing the URL from the current date. Brief #25 jsa-ingest Worker — monthly cron.
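The date-based URL construction could look roughly like the Python sketch below. It takes the file name literally from the pattern above, including the space before the bracket, and percent-encodes it. The helper name `glmd_candidates` and the `lookback` fallback to earlier months are assumptions, added because the current month's file may not exist yet when the cron fires.

```python
from datetime import date
from urllib.parse import quote

GLMD_DIR = "https://www.jobsandskills.gov.au/system/files/datasets"


def glmd_candidates(today, lookback=3):
    """Return candidate GLMD URLs, newest month first.

    `lookback` is a hypothetical knob: how many months (including the
    current one) to try before giving up.
    """
    urls = []
    year, month = today.year, today.month
    for _ in range(lookback):
        name = f"glmd ({year:04d}-{month:02d}).json"
        urls.append(f"{GLMD_DIR}/{quote(name)}")
        # Step back one calendar month, rolling over the year boundary.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return urls
```

The caller would request each URL in order and ingest the first one that responds successfully.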
12. JSA Training API — Per-Qualification Enrolments and Completions¶
data_sources key: jsa_training_api ⚠️ Not yet registered
Authority: Jobs and Skills Australia (sourced from NCVER TVA)
API URL: https://www.jobsandskills.gov.au/api/v1/opensearch/training/_search
Auth: None required
Status: Not yet ingested — Brief #25 adds this
What it is¶
The JSA Atlas OpenSearch API's training index contains per-qualification enrolment and completion volumes sourced from NCVER TVA — the data that VOCSTATS provides only at FOE-level aggregate. This gives us individual qual-level figures: e.g. BSB30120 Certificate III in Business: 67,112 enrolments in 2024, 22,192 completions. Also includes FOE classification (linking quals to the broader Field of Education taxonomy) and demographic segmentation (gender, Indigenous status, disability status per qual).
Confirmed live values for BSB30120 (2024):
- Enrolments: 67,112 (−10.1% YoY)
- Completions: 22,192 (−3.1% YoY)
- FOE code: 0809 (Office studies)
- Gender split: 63% Female, 36.5% Male
Tables to be created by Brief #25¶
| Table | Content |
|---|---|
| jsa_qual_training | Per-qual: enrolments, completions, YoY trends, FOE code/name, AQF level — national only initially |
| jsa_qual_training_segments | Demographic breakdown per qual, open-ended key/value design — gender, Indigenous, disability |
⚠️ Critical display rule: Enrolments and completions must never be divided to produce a "completion rate" — they represent different student cohorts in different years. JSA explicitly warns against this comparison.
Update cadence and method¶
Annual — NCVER TVA releases around June each year. Brief #25 jsa-ingest Worker — vet run, cron 0 2 1 7 * (1 July annually). Rate limit: 1 req/sec to JSA API.
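One way to honour the 1 req/sec limit is a small throttle that is called before every request. The sketch below is illustrative only: the class name `Throttle` is hypothetical, and the clock and sleep functions are injectable purely so the behaviour can be tested without real waiting.

```python
import time


class Throttle:
    """Ensure successive wait() calls return at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time the previous call was released

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous request: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In use, the ingest loop would create one `Throttle(1.0)` and call `wait()` immediately before each request to the JSA API.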
13–19. State Government Funding Lists¶
All state data is stored in the single state_funding table (3,040 rows total). Fields: qual_code, state, funded, free, free_apprenticeship, pathway, program_name, co_contribution, conditions, source_version, ingested_at.
NSW — Smart and Skilled¶
data_sources key: ssl_nsw ✅
URL: https://www.nsw.gov.au/education-and-training/vocational/nsw-skills-list
Method: Excel download — Skills List v16.2, xlsx parse
Rows: 1,164 | Last ingested: 2026-03-24
Version: NSW Skills List v16.2 @ 01/01/2026
Notes: Three pathways (Traineeship, Apprenticeship, General Training). 149 quals flagged as fee-free (NFF). Largest state dataset by row count.
QLD — Career Start / Queensland Subsidised Training List (QSTL)¶
data_sources key: qstl_qld ✅
URL: https://dtet.qld.gov.au/training/providers/funded/subsidised-training-list
Method: PDF extract via pdftotext and regex — QSTL v4 Feb 2026
Rows: 459 | Last ingested: 2026-03-24
Notes: Includes max completion payable to RTO — useful for margin intelligence. Available from the QLD Publications portal.
VIC — Free TAFE / Skills First¶
data_sources key: funded_vic ⚠️ Not yet registered
URL: https://www.vic.gov.au/free-tafe-courses
Rows: 7 ⚠️ SIGNIFICANTLY INCOMPLETE
Last ingested: 2026-03-24
Notes: Only 7 Free TAFE rows ingested. The full Victorian Skills First Training Needs List (Excel download from vic.gov.au) contains hundreds of quals with subsidy rates and should be re-ingested. This is a known gap. The source_version for current rows is VIC Free TAFE 2026.
SA — WorkReady¶
data_sources key: stl_sa ✅
URL: https://providers.skills.sa.gov.au/subsidised-training-list
Method: PDF parse via pdftotext and regex — STAL + TPL components
Rows: 720 (299 WorkReady + 421 WorkReady Apprenticeship) | Last ingested: 2026-03-24
Version: SA STL v11.2
Notes: Two components — STAL (training contracts) and TPL (general training priority list). PDF parsing may break if SA changes their STL format.
WA — Jobs and Skills WA¶
data_sources key: funded_wa ✅
URL: https://www.jobsandskills.wa.gov.au/course-list
Method: Puppeteer scrape of Drupal AJAX course list, then title-matched to qualifications table
Rows: 175 | Last ingested: 2026-03-24
Notes: 58 fee-free, 121 low-fee. 182 WA courses could not be matched (mostly skill sets not in our qual table). Jobs and Skills WA program.
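The title-matching step could be approximated as in the Python sketch below. It is not the actual matcher: the normalisation rules and the function names `normalise_title` and `match_titles` are assumptions, and the second qualification in the usage example is a made-up placeholder. Titles that map to more than one qualification code are treated as unmatched rather than guessed.

```python
import re


def normalise_title(title):
    """Lower-case and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def match_titles(scraped_titles, qualifications):
    """Match scraped course titles to (qual_code, title) pairs by normalised title.

    Returns (matched, unmatched): a dict of scraped title -> qual_code,
    and a list of titles with no match or an ambiguous match.
    """
    index = {}
    for code, title in qualifications:
        index.setdefault(normalise_title(title), []).append(code)

    matched, unmatched = {}, []
    for title in scraped_titles:
        codes = index.get(normalise_title(title), [])
        if len(codes) == 1:
            matched[title] = codes[0]
        else:
            unmatched.append(title)
    return matched, unmatched


# Usage with one real title from this document and one placeholder row.
quals = [
    ("BSB30120", "Certificate III in Business"),
    ("ZZZ00000", "Certificate IV in Example Studies"),  # hypothetical
]
matched, unmatched = match_titles(
    ["Certificate III in  Business", "Example Skill Set"], quals
)
```

Skill sets fall into `unmatched` here for the same reason noted above: they have no row in the qualifications list to match against.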
TAS — Skills Tasmania¶
data_sources key: funded_tas ✅
URL: https://www.skills.tas.gov.au/providers/rto/courses_approved_and_funded_in_tasmania
Method: Excel download — two separate files (non-apprenticeship and apprenticeship/traineeship)
Rows: 420 | Last ingested: 2026-03-24
Notes: Excel file URLs include a date in the filename — check the TAS page for current URLs at each refresh. Previous URLs:
- Non-app: 19-RTOs-funded-to-deliver-qualifications-as-non-apprenticeship-and-traineeship-as-at-20-March-2026.xlsx
- App: 21-RTOs-funded-to-deliver-qualificatinos-as-an-apprenticeship-or-traineeship-as-at-20-March-2026.xlsx
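Since the "as at" date is embedded in each file name, a refresh job could parse it to decide whether a newly discovered link is more recent than the one last ingested. The Python sketch below assumes the `as-at-DD-Month-YYYY.xlsx` suffix seen in the two file names above stays stable; the helper name `as_at_date` is hypothetical.

```python
import re
from datetime import date

MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}
# Matches the trailing "as-at-20-March-2026.xlsx" part of the file name.
AS_AT = re.compile(r"as-at-(\d{1,2})-([A-Za-z]+)-(\d{4})\.xlsx$")


def as_at_date(filename):
    """Return the 'as at' date embedded in a TAS file name, or None if absent."""
    m = AS_AT.search(filename)
    if m is None:
        return None
    day, month_name, year = m.groups()
    month = MONTHS.get(month_name.lower())
    if month is None:
        return None
    return date(int(year), month, int(day))
```

A file whose parsed date is later than the stored `source_version` date would then be a candidate for re-ingest.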
NT — Fee-Free TAFE / User Choice (Charles Darwin University)¶
data_sources key: funded_nt ✅
URL: https://www.cdu.edu.au/courses?type=vet
Method: CDU course page scrape — qual code regex extraction from rendered HTML
Rows: 95 | Last ingested: 2026-03-24
Notes: NT data sourced from CDU VET catalogue only. CDU is the primary NT TAFE provider but not the only one — Batchelor Institute also delivers. Fee-free status not individually confirmed per qual. Marked as potentially incomplete.
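The regex extraction step might resemble the Python sketch below. It assumes qualification codes follow the three-letters-plus-five-digits shape of BSB30120; the actual pattern used by the scraper is not recorded here, and the function name `extract_qual_codes` is hypothetical.

```python
import re

# Assumed shape: three uppercase letters followed by five digits, e.g. BSB30120.
QUAL_CODE = re.compile(r"\b[A-Z]{3}\d{5}\b")


def extract_qual_codes(html):
    """Return unique qualification-like codes in order of first appearance."""
    return list(dict.fromkeys(QUAL_CODE.findall(html)))
```

Codes found this way should still be checked against the qualifications table before being written to state_funding, since any string of the same shape in the page would also match.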
ACT — ACT Skills¶
data_sources key: Not registered
Status: Not yet ingested
Notes: ACT Skills Fund approved training list is published but has not been ingested. Lower priority given market size. Should be added in the next state funding refresh cycle.
Update cadence and method for all state lists¶
Annual. Most state lists update January–March following budget cycles; NSW and VIC publish mid-year updates. Each state uses a different format and ingest method:
| State | Format | Automation potential |
|---|---|---|
| NSW | Excel (stable URL) | High — URL predictable |
| QLD | PDF (URL changes with version) | Medium — requires PDF parser |
| VIC | Excel (stable URL) | High — URL predictable, but full list not yet wired |
| SA | PDF (URL changes with version) | Medium — requires PDF parser |
| WA | Drupal AJAX scrape | Low — requires Puppeteer |
| TAS | Excel (URL includes date) | Medium — URL pattern predictable |
| NT | HTML scrape (CDU) | Low — rendered HTML, may break |
| ACT | Not yet implemented | — |
20. ABR Reference Codes¶
data_sources key: abr_reference ⚠️ Not yet registered
Authority: Australian Taxation Office / Australian Business Register
Rows: 167
What it is¶
Reference lookup table for ABR entity type codes (e.g. PRV = Australian Private Company, PUB = Australian Public Company) and industry classification codes used in ABN records. Used to interpret entity type fields on RTO records.
Update cadence¶
Infrequent — ABR classification codes change rarely. Re-ingest only when ATO publishes updated classification standards.
21. Unit Stats (Derived Aggregates)¶
data_sources key: unit_stats ✅
Not an external source — computed from qualification_units and rto_scope within the database
Last computed: 2026-03-24 | Records: 15,200
What it is¶
Pre-computed aggregate statistics written back as columns on the units table, making unit-level queries fast without joins:
| Column | Description |
|---|---|
| stat_qual_count | How many qualifications include this unit |
| stat_core_count | How many qualifications include this unit as a core unit |
| stat_elective_count | How many qualifications include this unit as an elective |
| stat_rto_count | How many RTOs deliver qualifications that include this unit |
Update method¶
Re-run after every TGA corpus ingest. Uses batch UPDATE via indexed temp tables. Script is part of the TGA adapter ingest pipeline.
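The batch UPDATE via an indexed temp table can be illustrated with the sketch below, written against plain SQLite through Python's `sqlite3` module. It is not the pipeline script. The join columns (`unit_code`, `qual_code`, `rto_code`) and the `unit_type` values `'core'` and `'elective'` are assumed names; only the table names and the four `stat_*` columns come from this document.

```python
import sqlite3

REFRESH_SQL = """
DROP TABLE IF EXISTS temp.tmp_unit_stats;

-- Aggregate once per unit into a temp table (column names are assumptions).
CREATE TEMP TABLE tmp_unit_stats AS
SELECT
    qu.unit_code AS unit_code,
    COUNT(DISTINCT qu.qual_code) AS qual_count,
    COUNT(DISTINCT CASE WHEN qu.unit_type = 'core' THEN qu.qual_code END) AS core_count,
    COUNT(DISTINCT CASE WHEN qu.unit_type = 'elective' THEN qu.qual_code END) AS elective_count,
    COUNT(DISTINCT rs.rto_code) AS rto_count
FROM qualification_units AS qu
LEFT JOIN rto_scope AS rs ON rs.qual_code = qu.qual_code
GROUP BY qu.unit_code;

-- Index the temp table so the correlated lookups below are cheap.
CREATE INDEX idx_tmp_unit_stats ON tmp_unit_stats (unit_code);

-- One batch UPDATE; units that appear in no qualification get zeros.
UPDATE units SET
    stat_qual_count = COALESCE(
        (SELECT t.qual_count FROM tmp_unit_stats AS t WHERE t.unit_code = units.unit_code), 0),
    stat_core_count = COALESCE(
        (SELECT t.core_count FROM tmp_unit_stats AS t WHERE t.unit_code = units.unit_code), 0),
    stat_elective_count = COALESCE(
        (SELECT t.elective_count FROM tmp_unit_stats AS t WHERE t.unit_code = units.unit_code), 0),
    stat_rto_count = COALESCE(
        (SELECT t.rto_count FROM tmp_unit_stats AS t WHERE t.unit_code = units.unit_code), 0);

DROP TABLE temp.tmp_unit_stats;
"""


def refresh_unit_stats(conn):
    """Recompute the stat_* columns on units from qualification_units and rto_scope."""
    conn.executescript(REFRESH_SQL)
    conn.commit()
```

Aggregating into the temp table first means the source tables are scanned once, and each row of units then needs only indexed lookups.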
Known Gaps and Issues¶
| Issue | Impact | Priority |
|---|---|---|
| VIC Skills First full list not ingested — only 7 rows | VIC funding intelligence severely incomplete | High |
| ACT not ingested | No ACT state_funding rows | Medium |
| VNDA coverage at 42% (494/1,167 current quals) | Employment/income data missing for 58% of quals | High — Brief #25 addressing |
| Per-qual enrolment/completion volumes missing | No individual qual market size data | High — Brief #25 adding |
| 9 sources not registered in data_sources | Provenance gap — no refresh tracking | High — Brief #25 Part 1 fixing |
| vnda_atlas contains incorrect values | Risk of bad data reaching UI | High — Brief #25 Part 2 deprecating |
| No automated refresh for any data | All current data is one-time seeds from March 2026 | High — Brief #25 Part 4 adding cron Worker |
| NCVER VOCSTATS requires manual download session | Cannot be automated without NCVER API access | Medium |
| NT data incomplete (CDU only) | Batchelor Institute quals missing | Low |
Source Provenance: Unregistered Tables¶
The following tables exist in rtopacks-db with data but no entry in data_sources. Brief #25 Part 1 registers all of these.
| Table | Source to register | source_key |
|---|---|---|
| recon_vnda, recon_vnda_aqf, recon_vnda_foe | JSA VNDA OpenSearch API scrape | jsa_vnda_scrape |
| vnda_atlas | Unknown export — deprecated | jsa_vnda_export |
| osl_ratings | JSA Occupation Shortage List | jsa_osl |
| ivi_vacancies | JSA Internet Vacancy Index | jsa_ivi |
| emp_projections | JSA Employment Projections | jsa_emp_projections |
| glmd_regional | JSA GLMD regional JSON | jsa_glmd |
| state_funding (VIC rows) | VIC Free TAFE page | funded_vic |
| abr_codes | ATO ABR reference | abr_reference |
| All vocstats_* tables | NCVER VOCSTATS | ncver_vocstats |
| jsa_qual_training (new — Brief #25) | JSA Training OpenSearch API | jsa_training_api |