Build a safety incident reviewer
An agent that watches an incident feed and flags the patterns that tend to precede a disaster.
Who deploys this
The failure it’s built to catch
The preliminary report on Lion Air Flight 610, which crashed in October 2018, named 'uncommanded nose-down trim inputs.' The preliminary report on Ethiopian Airlines Flight 302, which crashed in March 2019, named the same pattern. The FAA grounded the 737 MAX three days after the second crash. An agent reading the incident feed in late 2018 and flagging the MCAS pattern would have been on the right side of 346 deaths.
Design decisions
Each item below maps to a specific choice in the workspace. The workspace is the deployable artifact; this section explains why the choices are what they are.
Preliminary reports, not finalised ADs
An airworthiness directive (AD) responds to a failure that has already happened. A preliminary incident report describes a failure that is still being characterised. The agent reads preliminaries because that's where the pattern shows up first. Flagging the Lion Air narrative in late 2018 sits upstream of the grounding order the FAA finally issued in March 2019.
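A minimal sketch of that selection step, assuming the feed has already been parsed into a list of dicts carrying the `ReportType`, `EventDate`, `Make`, `Model`, and `EventNarrative` fields named in the workspace file. The `pick_report` helper, the 80-character cutoff for a "non-trivial" narrative, and the assumption that `EventDate` starts with an ISO `YYYY-MM-DD` date are all illustrative, not part of the workspace.

```python
from datetime import date

# Statuses that mean the failure is still being characterised.
OPEN_STATUSES = {"Preliminary", "Factual"}

# Hypothetical cutoff for a "non-trivial" narrative; the source gives no number.
MIN_NARRATIVE_CHARS = 80


def pick_report(results, make, model):
    """Return the most recent non-final report for the named aircraft, or None."""
    candidates = [
        r for r in results
        if r.get("ReportType") in OPEN_STATUSES
        and make.lower() in (r.get("Make") or "").lower()
        and model.lower() in (r.get("Model") or "").lower()
        and len((r.get("EventNarrative") or "").strip()) >= MIN_NARRATIVE_CHARS
    ]
    # Assumes EventDate begins with YYYY-MM-DD; adjust the parsing if the feed differs.
    return max(
        candidates,
        key=lambda r: date.fromisoformat(r["EventDate"][:10]),
        default=None,
    )
```

Reports already in `Final` status are skipped on purpose: by then the pattern has usually been named by someone else.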
Five triggers, not three, not seven
Each trigger covers a different shape of failure: automation override (MCAS), fuel or battery anomaly (787), incident cluster (the canary for systemic issues), FCOM contradiction (manual is wrong or system is wrong), and engine-out without service bulletin (LEAP fan blades). Three would miss real shapes; seven would dilute focus.
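One way to picture the five triggers is as a small lookup table plus a separate check for the cluster shape, which depends on report history instead of wording. The sketch below is illustrative only: the keyword patterns, the `match_triggers` and `is_cluster` names, the 183-day window standing in for "six months", and the three-event threshold are all assumptions. In the workspace the model applies the triggers from the prompt, not from a regex table.

```python
import re
from datetime import date, timedelta

# Keyword patterns are illustrative guesses, not the workspace's actual table.
TRIGGERS = {
    "automation override": [r"uncommanded", r"\bMCAS\b", r"automatic(?:ally)? trim"],
    "fuel or battery anomaly": [r"fuel leak", r"battery", r"thermal runaway"],
    "FCOM contradiction": [r"\bFCOM\b", r"contrary to the manual"],
    # Keywords only catch the engine event; checking for a published
    # service bulletin needs a separate data source.
    "engine-out without service bulletin": [r"engine failure", r"loss of thrust", r"fan blade"],
}


def match_triggers(narrative):
    """Return the names of the triggers whose keywords appear in the narrative."""
    return [
        name
        for name, patterns in TRIGGERS.items()
        if any(re.search(p, narrative, re.IGNORECASE) for p in patterns)
    ]


def is_cluster(event_dates, as_of, min_events=3):
    """Fifth trigger: several events on one Make/Model in the trailing six months.

    The 183-day window and the min_events threshold are assumed values.
    """
    window_start = as_of - timedelta(days=183)
    recent = [d for d in event_dates if window_start <= d <= as_of]
    return len(recent) >= min_events
```

For example, `match_triggers("The crew reported uncommanded nose-down trim inputs.")` returns `["automation override"]`, while a crosswind hard-landing narrative returns an empty list.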
Bias toward FLAG when the narrative is ambiguous
CLEAR is the routine call: most reports are crew technique or weather. FLAG is the rare verdict, and it exists because a wrong CLEAR can cost a hull loss. The reviewer is meant to be the second pair of eyes; the OEM has every commercial reason to call the event a 'crew issue.' When trigger words appear and the framing is ambiguous, the reviewer flags.
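The bias can be read as an expected-cost argument. Clearing a report that hides a real problem costs far more than flagging one that does not, so the probability at which flagging becomes the cheaper call is very low. The sketch below is one way to formalise that reasoning, not the agent's actual method; the function names and the cost figures in the usage line are hypothetical.

```python
def flag_threshold(cost_wrong_clear, cost_wrong_flag):
    """Probability of a real problem above which FLAG has the lower expected cost.

    Expected cost of CLEAR is p * cost_wrong_clear.
    Expected cost of FLAG is (1 - p) * cost_wrong_flag.
    FLAG is cheaper once p >= cost_wrong_flag / (cost_wrong_flag + cost_wrong_clear).
    """
    return cost_wrong_flag / (cost_wrong_flag + cost_wrong_clear)


def should_flag(p_real_problem, cost_wrong_clear, cost_wrong_flag):
    """True when flagging is the lower expected-cost verdict."""
    return p_real_problem >= flag_threshold(cost_wrong_clear, cost_wrong_flag)


# Hypothetical costs: a wrong CLEAR is 1000x worse than a wrong FLAG.
print(flag_threshold(1000, 1))  # about 0.001
```

With costs that lopsided, even a one-in-a-hundred chance that the narrative describes a systemic fault is enough to flag.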
The cautionary tale lives in the skill body
The skill walks through the Lion Air report and the FAA's response day by day. When the skill activates, the model has the precedent in working memory and applies the trigger table with the right priors. The history is what teaches the agent what to look for; the table is just the index.
The four-file workspace
This is what the runtime compiles. Copy it into a fresh playground project (or a sibling directory in your CLI workspace), then deploy. Each tab is one file. The agent.rs is the generic adapter; it’s byte-identical across every reference agent.
---
name: Aviation Safety Reviewer
id: aviation-v1
model: claude-sonnet-4-6
---

You are the Aviation Safety Reviewer. The user names an aircraft family, an incident date, or asks for the latest. Your job: ONE `fetch_url` call to the NTSB Aviation Investigation Search, then one `FLAG` or `CLEAR` verdict. Do not narrate.

## Why NTSB preliminary reports, not FAA ADs

ADs are mandatory fixes the FAA already issued. The interesting question is upstream of that: was a failure mode visible in the incident record before the certification got rubber-stamped? The 737 MAX MCAS preliminary reports (Lion Air, Oct 2018; later Ethiopian, Mar 2019) named "uncommanded nose-down trim" before the FAA grounded the type. A signed agent watching the NTSB feed and flagging that pattern in October 2018 would have been on the right side of 346 deaths.

## Endpoint (use this exact URL)

```
https://data.ntsb.gov/carol-main-public/api/Query/Main?ResultSetSize=10&QueryGroups=%5B%7B%22Operator%22:%22AND%22,%22Filters%22:%5B%7B%22FieldName%22:%22Mode%22,%22Operator%22:%22is%22,%22Values%22:%5B%22Aviation%22%5D%7D%5D%7D%5D
```

The response has `Results[]` with `NtsbNo`, `ReportType`, `EventDate`, `City`, `State`, `Country`, `Make`, `Model`, `HighestInjuryLevel`, `ProbableCause`, `EventNarrative`. Filter by `Make`/`Model` matching the user's named aircraft family. Pick the most recent that's still in `Preliminary` or `Factual` status (not `Final`) and has a non-trivial narrative.

## Flag triggers (each tied to a real failure pattern)

`FLAG` if the narrative contains any of:

- Uncommanded control input or automation override (MCAS shape, 737 MAX 2018-2019).
- Fuel-system anomaly the AD record does not address (Boeing 787 battery, 2013).
- Repeated identical incidents in trailing 6 months on the same Make/Model (cluster shape; canary for systemic issue).
- Pilot reports of system behavior contradicting the FCOM (manual) description.
- Engine-out or thrust-loss anomaly with no published service bulletin from the OEM.

If none match, `CLEAR` with the narrative summary.

## Output rule (absolute)

Your entire response is the verdict block and nothing else. First character is `F` or `C`. No preamble. No procedure narration. No code fences. Any character outside the block is a discipline failure.

## Output format (strictly one of)

```
FLAG · <Make> <Model> · NTSB <NtsbNo> · <EventDate>
trigger: <one of the trigger patterns above>
narrative: <≤120-char excerpt>
```

```
CLEAR · <Make> <Model> · NTSB <NtsbNo> · <EventDate>
narrative: <≤120-char excerpt> · no trigger pattern matched
```

The `independent-second-opinion` skill carries the trigger patterns and the bias-toward-FLAG discipline. The cost of a wrong CLEAR is a hull loss; the cost of a wrong FLAG is a regulatory letter.
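If you fork the agent, it can help to check its output against the output rule mechanically. The sketch below builds a verdict block in the format above and tests the parts of the rule a script can verify. The `format_verdict` and `obeys_output_rule` names are hypothetical, and putting each field on its own line is an assumption about the intended layout.

```python
def format_verdict(kind, make, model, ntsb_no, event_date, narrative, trigger=None):
    """Build a verdict block; the per-line field layout is an assumed reading of the format."""
    if kind not in ("FLAG", "CLEAR"):
        raise ValueError("kind must be 'FLAG' or 'CLEAR'")
    if kind == "FLAG" and not trigger:
        raise ValueError("a FLAG verdict must name its trigger")
    excerpt = narrative.strip()[:120]  # the format caps the excerpt at 120 characters
    header = f"{kind} · {make} {model} · NTSB {ntsb_no} · {event_date}"
    if kind == "FLAG":
        return f"{header}\ntrigger: {trigger}\nnarrative: {excerpt}"
    return f"{header}\nnarrative: {excerpt} · no trigger pattern matched"


def obeys_output_rule(text):
    """Check the mechanical parts of the rule: no preamble, no code fences."""
    starts_with_verdict = text.startswith(("FLAG · ", "CLEAR · "))
    has_code_fence = "`" * 3 in text
    return starts_with_verdict and not has_code_fence
```

A response that opens with "Here is the verdict:" or wraps the block in a code fence fails `obeys_output_rule`, which is the kind of drift the absolute wording of the rule is meant to prevent.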
Variations
Three directions you might push this shape in. Same file model, different thresholds or data sources.
- Add AAIB (UK) and BFU (Germany) feeds for non-US incidents.
- Pair with an OEM service-bulletin tracker so the agent knows when a fix has already been issued for a flagged pattern.
- Re-aim at medical devices (FDA MAUDE), automotive (NHTSA recalls), or nuclear incidents. The pattern matching is the same; the trigger list changes.
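The service-bulletin pairing in the second variation could be as small as a lookup keyed on aircraft and trigger. This is a sketch under stated assumptions: the tracker is modelled as a plain dict, the `annotate_flag` name and the flag's field names are hypothetical, and the bulletin identifier in the usage example is a made-up placeholder.

```python
def annotate_flag(flag, bulletins):
    """Attach a known service bulletin to a flag, or None if the tracker has none.

    `bulletins` maps (make, model, trigger) to a bulletin identifier.
    """
    key = (flag["make"], flag["model"], flag["trigger"])
    return {**flag, "service_bulletin": bulletins.get(key)}


# Placeholder data for illustration only.
tracker = {("Boeing", "737-8", "automation override"): "SB-EXAMPLE-001"}
flag = {"make": "Boeing", "model": "737-8", "trigger": "automation override"}
print(annotate_flag(flag, tracker)["service_bulletin"])  # SB-EXAMPLE-001
```

A flag that comes back with a bulletin attached is still reported; the annotation only tells the reader that a fix may already exist.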
Deploying your fork
The same four files compile via the in-browser playground or the CLI. The playground is the five-minute path. The CLI is the right path if you’re scripting deploys.
Related tutorials
Other agents that share design choices with this one. Worth reading if you’re still deciding which shape to fork.
See the deployed reference agent end to end (signed credential, recent run grade, the four files inline) at /poa. Try it live at demo-agents.theseus.network/aviation.