Build a bug bounty triager

Every protocol with a bug bounty drowns in the same noise: extortion ('send 5 ETH to this address before I tell you the bug'), duplicates of vulnerabilities that went public months ago, and reports that are out of scope or have no proof attached. A human triager burns hours clearing that before reaching anything real. This agent runs the first pass. It rejects the extortion at the first filter, catches duplicates with a single search, and grades whatever survives against the program's published payout table. Every verdict is signed, so when a researcher argues a severity grade, there's a record of exactly how the call was made. We ran it against reports reconstructed from real incidents (the Lendf.me reentrancy, the Inverse Finance oracle drain, a Polygon Plasma replay, and a textbook extortion attempt) and it graded every one against the tier table.

Build your own in five minutes.

The four files further down are the whole agent. Open the playground, paste them in, change the prompt and thresholds for your case, and deploy. No Solidity to write, no server, no oracle network to run.

Open the playground →

Not on the hosted demo yet; verified against real cases below.

How it's verified

Run against reports reconstructed from real incidents: the Lendf.me reentrancy, the Inverse Finance oracle drain, a Polygon Plasma replay, and a textbook extortion attempt. It graded each against the tier table (the Polygon replay lands at High, per-event extraction, not Critical).

What it actually returns

The agent was run on four reports reconstructed from real incidents, all filed against the in-scope lending contract. The tier discipline is the point: per-event drains are High, not Critical.

| Report | Verdict | Severity (payout): reason |
| --- | --- | --- |
| ERC777 reentrancy drains the lending pool in one transaction (Lendf.me shape) | VALID | Critical ($100,000): one transaction empties an in-scope pool |
| Single-pool oracle manipulation, repeated per block (Inverse shape) | VALID | High ($25,000): per-event extraction with no per-event cap, not Critical |
| Withdrawal-proof replay, one per past withdrawal (Polygon Plasma shape) | VALID | High ($25,000): repeated per-event theft across many transactions, not Critical |
| 'send 5 ETH before I disclose' | INVALID | None ($0): the extortion filter fires at the first check |

Who runs this in production

A protocol running a bounty program, or a firm doing triage as a service. Signed verdicts hold up when a researcher disputes a severity grade.

Design decisions

Each item below maps to a specific choice in the workspace. The workspace is the deployable artifact; this section explains why the choices are what they are.

One web_search per run

A triager that searches again every time the first result is ambiguous never commits. The skill caps the search at one call. If the search came back empty, that's the answer: no public disclosure exists, so the report is not a duplicate. Re-triage by re-running the agent with a tighter query.
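The one-call cap can be pictured as a search function with a budget. The sketch below is illustrative only: `BudgetedSearch`, `SearchBudgetExceeded`, and `find_public_disclosure` are hypothetical names, not part of the runtime, and treating the first hit as the citing URL is a simplification of the judgment the agent actually makes.

```python
from typing import Callable, List, Optional


class SearchBudgetExceeded(RuntimeError):
    """Raised when a run asks for more searches than its budget allows."""


class BudgetedSearch:
    """Wraps a search function so it can be called at most `limit` times per run."""

    def __init__(self, search_fn: Callable[[str], List[str]], limit: int = 1):
        self._search_fn = search_fn
        self._remaining = limit

    def __call__(self, query: str) -> List[str]:
        if self._remaining <= 0:
            raise SearchBudgetExceeded("web_search already used in this run")
        self._remaining -= 1
        return self._search_fn(query)


def find_public_disclosure(
    search: BudgetedSearch, vuln_class: str, protocol: str
) -> Optional[str]:
    """Return a citing URL if the single search finds one, else None.

    An empty result is treated as the answer: no public disclosure exists.
    (Assumption: the first hit is the relevant advisory.)
    """
    hits = search(f"{vuln_class} {protocol}")
    return hits[0] if hits else None
```

A second call on the same `BudgetedSearch` raises instead of quietly searching again, which mirrors the intent that re-triage happens in a fresh run.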

Extortion filter runs before anything else

Pay-first emails arrive in every program's queue. They're cheap to spot ("transfer 5 ETH escrow before disclosure") and they should never produce a payout. The extortion check is the first of the invalid-report filters, so it runs right after the scope check and ahead of the duplicate search and the severity match. The agent doesn't waste a search call or a severity grade on a report that's already INVALID.
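In the agent itself the model applies this filter from the prompt. For readers who want a cheap pre-check in front of it, here is a rough heuristic sketch. The regular expression and the name `is_pay_first` are assumptions made for illustration; a real queue would need broader patterns and a human fallback.

```python
import re

# Hypothetical pattern: payment verb, then an amount, then a "before ..." clause
# that gates disclosure. Sentence boundaries ('.' or newline) break the match.
PAY_FIRST = re.compile(
    r"\b(?:send|transfer|pay|wire)\b"
    r"[^.\n]{0,40}?\d+(?:\.\d+)?\s*(?:ETH|BTC|USDC|USDT|USD)\b"
    r"[^.\n]{0,80}?\b(?:before|prior to|until)\b"
    r"[^.\n]{0,40}?\b(?:disclos\w*|reveal\w*|tell\w*|details?)\b",
    re.IGNORECASE,
)


def is_pay_first(report_text: str) -> bool:
    """True when the report reads like a demand for payment before disclosure."""
    return PAY_FIRST.search(report_text) is not None


if __name__ == "__main__":
    print(is_pay_first("send 5 ETH before I disclose"))
    print(is_pay_first("The transfer hook runs before the balance update; details in the PoC."))
```

Requiring an explicit amount keeps ordinary technical sentences that mention a transfer from tripping the filter.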

Severity comes from the published tier table

Grading by feel produces fights on every Critical. The tier table lists explicit triggers: an 'unbounded, instantaneous, near-total drain of an in-scope pool in a single transaction' is Critical, with the payout next to it; a per-event drain with no cap is High. The agent matches the report against the trigger, and the payout policy comes from the program's published table.
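To make the trigger-matching idea concrete, the sketch below encodes the payouts from the tier table and a simplified set of triggers. The `Impact` fields and the `grade` function are hypothetical; the agent matches free-text reports against the table, so this shows only the shape of the decision and is not the agent's implementation.

```python
from dataclasses import dataclass
from typing import Tuple

# Payouts copied from the tier table in THESEUS.md.
PAYOUT_USD = {
    "Critical": 100_000,
    "High": 25_000,
    "Medium": 5_000,
    "Low": 500,
    "None": 0,
}


@dataclass
class Impact:
    """Demonstrated impact, reduced to a few illustrative flags (assumed names)."""
    single_tx_near_total_drain: bool = False
    permanent_freeze: bool = False
    uncapped_per_event_extraction: bool = False
    governance_takeover: bool = False
    capped_per_event_loss: bool = False
    cosmetic_or_gas_griefing: bool = False


def grade(impact: Impact) -> Tuple[str, int]:
    """Match demonstrated impact against tier triggers and return (tier, payout)."""
    if impact.single_tx_near_total_drain or impact.permanent_freeze:
        tier = "Critical"
    elif impact.uncapped_per_event_extraction or impact.governance_takeover:
        tier = "High"
    elif impact.capped_per_event_loss:
        tier = "Medium"
    elif impact.cosmetic_or_gas_griefing:
        tier = "Low"
    else:
        tier = "None"
    return tier, PAYOUT_USD[tier]
```

Under these assumptions an oracle manipulation that repeats per block sets only `uncapped_per_event_extraction`, so it grades High however large the cumulative loss.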

The TRIAGED block is the only output

The block goes into the program's existing tooling: the verdict drives the queue, the severity drives the payout, the reason gets posted to the reporter. If the agent wraps it in 'Here's my analysis...' the parser fails and the report falls back to the human queue. Format-strict output makes the agent a drop-in.
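A downstream parser for the block might look like the sketch below. It assumes the field order and allowed values from the output format in THESEUS.md; `parse_triaged` is a hypothetical helper, not the program's actual tooling. Anything outside the block, including a preamble or a trailing newline, makes the parse fail.

```python
import re
from typing import Dict, Optional

VERDICTS = {"VALID", "DUPLICATE", "OUT_OF_SCOPE", "INVALID"}
SEVERITIES = {"Critical", "High", "Medium", "Low", "None"}

# The whole reply must be exactly these five lines, in this order.
BLOCK = re.compile(
    r"TRIAGED\n"
    r"VERDICT: (?P<verdict>[A-Z_]+)\n"
    r"SEVERITY: (?P<severity>[A-Za-z]+)\n"
    r"PAYOUT: \$(?P<payout>\d[\d,]*)\n"
    r"REASON: (?P<reason>[^\n]{1,140})"
)


def parse_triaged(raw: str) -> Optional[Dict[str, object]]:
    """Return the parsed fields, or None if the reply is not a strict TRIAGED block."""
    match = BLOCK.fullmatch(raw)
    if match is None:
        return None
    if match["verdict"] not in VERDICTS or match["severity"] not in SEVERITIES:
        return None
    return {
        "verdict": match["verdict"],
        "severity": match["severity"],
        "payout_usd": int(match["payout"].replace(",", "")),
        "reason": match["reason"],
    }
```

A `None` result is the signal to route the report back to the human queue.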

The four-file workspace

This is what the runtime compiles. Copy it into a fresh playground project (or a sibling directory in your CLI workspace), then deploy. Each tab is one file. The agent.rs is the generic adapter; it’s byte-identical across every reference agent.

THESEUS.md

---
name: Bug Bounty Triager
id: bug-bounty-triage-v1
model: claude-sonnet-4-6
---

You are the ProtocolXYZ Bug Bounty Triager. Each run receives one bug
report and emits one TRIAGED block. Do not narrate. Do not chat with
the reporter. The verdict block is the only output.

## Scope (the only assets that count)

In scope:

- Lending contracts at `0xLending0000000000000000000000000000000001`
- Governance contracts at `0xGov0000000000000000000000000000000000002`
- Web frontend at `https://app.protocolxyz.xyz`

Out of scope (always emit `OUT_OF_SCOPE`):

- Third-party integrations (Uniswap pools the protocol uses, oracles, etc.)
- Test or staging environments
- Social engineering of team members
- DDoS, rate-limiting, or volumetric attacks on the website
- Issues in upstream dependencies that do not materialize as a ProtocolXYZ exploit

## Invalid-report filters (always emit `INVALID`)

- The report demands payment before disclosing details (extortion shape).
- The report has no PoC and the claim is not self-evident from the named code.
- The exploit requires the protocol-admin key or another already-trusted role. The threat model already assumes that key is honest.
- The report describes intended behavior the docs explicitly call out.

## Severity tiers and payouts

| Tier | Trigger | Payout |
| --- | --- | --- |
| `Critical` | Unbounded, instantaneous, near-total drain of an in-scope pool in a single transaction. Permanent freeze of an in-scope contract. | `$100,000` |
| `High` | Theft requiring user action or specific market conditions. Governance takeover. Per-event extraction with no per-event cap (oracle manipulation, slow-drip drains, stale-price liquidations). Freeze under specific conditions. | `$25,000` |
| `Medium` | Per-event loss capped by contrived conditions or large attacker capital. DoS of single-user actions. Borrower-side loss that requires the borrower to be near threshold. | `$5,000` |
| `Low` | Cosmetic, informational, or gas griefing without DoS. | `$500` |
| `None` | Not a vulnerability. | `$0` |

The demonstrated impact sets the tier, whatever the report's wording
claims. Tier discipline: Critical means "one transaction empties the
pool." If the attacker has to wait for market conditions, sandwich a
keeper, or repeat across many events, it is not Critical even if
cumulative damage is large.

## Procedure (per run)

1. Read the report. Identify the affected asset and the claimed impact.
2. Check scope. If the asset is in the out-of-scope list, or it is not a
   ProtocolXYZ asset at all, emit `OUT_OF_SCOPE` with a one-clause reason
   and stop.
3. Apply the invalid-report filters above. Any match emits `INVALID` and
   stops.
4. Check for duplicates. Call `web_search` ONCE with the vulnerability
   class plus `ProtocolXYZ` to surface published advisories or known
   CVEs. If the underlying issue is already public, emit `DUPLICATE` with
   the citing URL and stop.
5. Assess severity by matching the claimed impact against the tier
   table. Pick the lowest tier that fully covers the impact.
6. If the report cleared scope, the invalid filters, and the duplicate
   check, the verdict is `VALID` and SEVERITY/PAYOUT come from the
   matched tier. For `OUT_OF_SCOPE`, `INVALID`, and `DUPLICATE`, set
   `SEVERITY: None` and `PAYOUT: $0`.
7. Emit the `TRIAGED` block.

## Output rule (absolute)

Your entire response is the `TRIAGED` block and nothing else. First
character of your reply is `T` (the start of `TRIAGED`). Last character
is the final character of the `REASON:` line. No preamble. No
procedure narration. No `Step N` labels. No code fences. No
markdown bold. No fields not listed below. Any character outside
the block is a discipline failure and the verdict is discarded.

## Output format (strict)

```
TRIAGED
VERDICT: <VALID | DUPLICATE | OUT_OF_SCOPE | INVALID>
SEVERITY: <Critical | High | Medium | Low | None>
PAYOUT: $<USD>
REASON: <one-clause reason, max 140 chars>
```

No second `web_search`. No edits to a prior verdict.

The `triage-pass` skill carries the single-pass discipline.
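Read as control flow, the per-run procedure in THESEUS.md is an early-exit pipeline: scope, then the invalid filters, then the single duplicate search, then severity. The sketch below shows that ordering with the four checks passed in as plain functions. All names here (`Report`, `Verdict`, `triage`, and the callback parameters) are hypothetical, and the reason strings are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Report:
    asset: str
    text: str


@dataclass
class Verdict:
    verdict: str
    severity: str
    payout_usd: int
    reason: str


def triage(
    report: Report,
    in_scope: Callable[[Report], bool],
    invalid_reason: Callable[[Report], Optional[str]],
    find_duplicate: Callable[[Report], Optional[str]],
    grade: Callable[[Report], Tuple[str, int]],
) -> Verdict:
    """Run the checks in order; the first one that fires ends the run."""
    if not in_scope(report):
        return Verdict("OUT_OF_SCOPE", "None", 0, "asset is not in scope")
    reason = invalid_reason(report)
    if reason is not None:
        return Verdict("INVALID", "None", 0, reason)
    url = find_duplicate(report)  # the only search call in the run
    if url is not None:
        return Verdict("DUPLICATE", "None", 0, f"already public: {url}")
    severity, payout = grade(report)
    return Verdict("VALID", severity, payout, "matched tier trigger")
```

Because each check returns early, a report that fails scope or an invalid filter never reaches the search or the grading step.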

Variations

Three directions you might push this shape in. Same file model, different thresholds or data sources.

- Use a live program's actual scope and tier table (Immunefi, HackenProof). Re-grade the same reports against their numbers.
- Run a cross-program duplicate check so the same vulnerability disclosed against multiple protocols gets noticed.
- Attach a multisig propose step so a confirmed VALID triggers the payout transaction with the verdict block as the rationale.

Ship your own.

You have the four files. Drop them into the playground, make it yours, and deploy to a chain where the agent signs every decision it makes. Scripting your deploys instead? Use the CLI.


Related guides

Other guides that share design choices with this one. Worth a read if you’re still deciding which to start from.

Build a DAO proposal reviewer

Governance & ops

See the reference agent end to end (signed credential, recent run grade, the four files inline) at /poa.
