Victoria Crash Narratives — De-identified

Real Victorian police crash narratives, de-identified end-to-end by a fully local AI pipeline — no raw record ever left the source machine — then mapped to their crash locations from the Victorian open road-crash dataset. Click any point to read the de-identified narrative and its DCA classification.

— crashes mapped · 100% locally de-identified · DCA-classified · Source: Victorian Road Crash Data (CC-BY 4.0)

What is a DCA code — and why it matters

Every crash on this map carries a DCA code (Definitions for Coding Accidents) — Victoria's standard classification of crash configuration: who was involved, from which direction, and what movement led to impact. Around one hundred codes cover everything from DCA 100 (pedestrian struck from the near side) to DCA 130 (rear-end — the single most common crash type, about 18% of all Victorian crashes).

The DCA code is the bridge between the human story and structured analysis. It is assigned by reading exactly the kind of police narrative shown on this map — the free-text account is the ground truth, and the DCA code is its structured shadow. That classification is what powers blackspot programs, trend detection and countermeasure selection across the state.

Reading the narrative alongside its DCA code shows both the strength and the limits of classification: one short code, chosen from a rich, messy human account.

How these narratives were de-identified

The original police narratives contain names, registration plates, phone numbers, licence numbers, birth dates and addresses. Before anything reached this page, every record passed through a fully local de-identification pipeline running on-premises — no narrative left the source machine in raw form.

  • Two independent local-LLM extraction passes find personal information in context;
  • Deterministic layers catch the formal identifiers — plates (including letter-only fleet plates), phones, licence numbers, birth dates;
  • A fail-closed gate re-scans every output: anything suspicious is escalated to human review, and an in-context LLM adjudicator distinguishes real names from ordinary words — critical in ALL-CAPS police prose;
  • Crash facts that matter for road safety (locations, roads, vehicle types, speeds, the sequence of events) are deliberately preserved.

What you read here is the analytic substance of each crash with the people taken out of it.
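
As a rough illustration of the deterministic layer and the fail-closed gate described above, here is a minimal Python sketch. It is not the pipeline's actual code: the patterns, the placeholder labels and the names `PATTERNS`, `SUSPICIOUS`, `redact`, `gate` and `GateResult` are all assumptions made for this example, and the two LLM extraction passes and the LLM adjudicator are left out entirely.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only (assumptions): real plate, phone, licence and
# date formats are more varied than these four expressions.
PATTERNS = {
    "PLATE": re.compile(r"\b(?:[A-Z]{3}\d{3}|\d[A-Z]{2}\d[A-Z]{2})\b"),
    "PHONE": re.compile(r"\b0\d(?:[ -]?\d){8}\b"),
    "DOB": re.compile(r"\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b"),
    "LICENCE": re.compile(r"\b\d{9}\b"),
}

# Looser net for the gate (also an assumption): any 5-8 character token that
# mixes capital letters and digits, or any run of six or more digits.
SUSPICIOUS = re.compile(r"\b(?=\w*\d)(?=\w*[A-Z])[A-Z0-9]{5,8}\b|\d{6,}")


@dataclass
class GateResult:
    publish: bool       # True only when nothing at all was flagged
    text: str           # the redacted narrative that was checked
    reasons: list[str]  # why the record was held, if it was


def redact(text: str) -> str:
    """Deterministic layer: replace each formal identifier with a label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def gate(redacted: str) -> GateResult:
    """Fail-closed gate: re-scan the output and hold it on any doubt or error."""
    try:
        reasons = [label for label, p in PATTERNS.items() if p.search(redacted)]
        if SUSPICIOUS.search(redacted):
            reasons.append("SUSPICIOUS_TOKEN")
    except Exception as exc:
        # An unexpected failure also blocks publication.
        reasons = [f"ERROR:{exc}"]
    return GateResult(publish=not reasons, text=redacted, reasons=reasons)


if __name__ == "__main__":
    # Invented example text, not a real narrative.
    raw = "VEHICLE 1 ABC123 TRAVELLING NORTH ON HIGH ST AT 60 KM/H. DRIVER DOB 03/07/1985 PH 0412 345 678."
    clean = redact(raw)
    print(clean)
    print(gate(clean))
    # A token the patterns miss is still caught by the looser re-scan.
    print(gate(redact("REGO XYZ7890 SEEN LEAVING")))
```

The gate publishes a record only when the re-scan finds nothing. A leftover pattern match, a suspicious token or an unexpected error all lead to the same outcome, which is that the record is held for review. Crash facts such as the road name and the speed pass through untouched.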