Agent Browser Runtime
A local runtime that gives an AI agent two things it usually doesn't have: F12-grade evidence from a real browser, and request-level workflows in the spirit of Burp — intercept, replay, fuzz, out-of-band callback.
Why I built this
I kept handing my security-research agent a browser that could click and type, but couldn't show me the redirect chain, couldn't replay a request with a different header, couldn't tell me which exact frame leaked a token. So I built one that could.
The brief was narrow: give an agent F12 evidence (everything a human researcher would open DevTools to see) and an agentic Burp workflow (intercept, repeater, replay, raw socket, OOB collector). Not a vulnerability scanner. Not a sandbox. Not a Burp replacement — Burp is great at what it does, but its workflow lives inside a GUI an agent can't reach. ABR brings the workflow out into a service the agent can call.
Service is a literal description: ABR runs as a local HTTP server with a matching CLI. There is no MCP layer — I removed it after Wave-7 once the surface was cleaner without one. Agents POST to the worker directly, or pipe through agent-browser.
The shape
An agent doesn't directly face a hundred-something tools. The surface is layered so the first call is always one of nine, and each call returns concrete nextTools if it needs more. The drilldown is there when needed; it stays out of the way when it isn't.
| Layer | What's at this layer | Role |
|---|---|---|
| 1 · Facade | browser_open / browser_act / browser_inspect / browser_capture / browser_security_pack / browser_auth_boundary / browser_diff / browser_replay / browser_raw | Nine calls that cover 90% of workflows. Identical contract on both backends. |
| 2 · Router | browser_inspect routes by focus: overview / network / storage / console / dom / sources / performance / search / evidence / debug | Returns a summary plus a concrete nextTools list — agent doesn't guess what to call next. |
| 3 · Meta | browser_tool_catalog / browser_capability_map / browser_workflow_guide / browser_tool_help | For when an agent needs to introspect the surface itself rather than the page. |
| 4 · Drilldown | ~90 devtools_* tools across Network, Application, Sources, Console, Frames, Performance, Trace | Reached through facade or router — rarely first-touch. |
| 5 · Escape hatch | browser_raw / browser_cdp_command | For protocol moves the facade can't reach. Power tools. |
On both backends — Managed Browser (clean Playwright + direct CDP) and Personal Chrome (extension + chrome.debugger) — the nine facade calls do the same thing. Managed exposes an additional ~134 durable-profile and CDP-introspection tools that only make sense when the agent owns the profile. Personal Chrome, attached to a tab the user already has open, still reaches roughly ninety F12 evidence surfaces through browser_raw or extension aliases.
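A minimal sketch of how an agent loop might follow the router's suggestions instead of guessing. Only the nextTools field is named here; the RouterResponse shape, the CallTool signature, and the step budget are assumptions made for illustration, not the runtime's actual contract.

```typescript
// Only `nextTools` is named in this README; `summary` as a plain string and
// the CallTool signature are illustrative assumptions.
interface RouterResponse {
  summary: string;
  nextTools: string[];
}

// Stand-in for whatever actually reaches the runtime (HTTP or CLI).
type CallTool = (tool: string, args: Record<string, unknown>) => Promise<RouterResponse>;

// Pick the first suggested tool that has not been called yet, if any.
function pickNextTool(res: RouterResponse, visited: string[]): string | undefined {
  return res.nextTools.find((t) => !visited.includes(t));
}

// Start at the router and follow suggestions until none remain
// or the step budget runs out. Returns the tools visited, in order.
async function drillDown(call: CallTool, focus: string, maxSteps = 3): Promise<string[]> {
  const visited: string[] = [];
  let tool: string | undefined = "browser_inspect";
  let args: Record<string, unknown> = { focus };
  while (tool !== undefined && visited.length < maxSteps) {
    visited.push(tool);
    const res: RouterResponse = await call(tool, args);
    tool = pickNextTool(res, visited);
    args = {}; // drill-in arguments are tool-specific and omitted here
  }
  return visited;
}
```

The point of the sketch is the control flow: the agent's first call is a facade or router tool, and every later call comes from a list the previous response supplied.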
What an agent can actually do
Eight families, all addressable through the same nine-call facade. The CLI names mirror the HTTP tool names so anything in the table runs identically over either surface.
| Family | What an agent does with it |
|---|---|
| browser_open / act / wait | Drive the page the way a human would — each call returns a structured page summary, not a screenshot the agent has to look at. |
| browser_capture (start / stop / status) | Spool every request, response body, console entry, redirect, frame load to disk for the duration of the profile. |
| browser_inspect (10 focus areas) | Pull a focused summary — network, storage, console, sources, etc. Returns concrete nextTools for drill-in. |
| browser_security_pack | One call: navigate → capture → drill → bundle. Returns ~25 artifacts (HAR, application export, trace, evidence bundle) plus a handoff JSON. |
| browser_replay / browser_diff | Replay a captured request with mutations — different headers, swapped JWT, body changes. Diff storage / network between two captures. |
| browser_raw (escape hatch) | Reach any devtools_* tool directly. Open a raw TCP/TLS socket for protocols Chrome's fetch can't (smuggling, h2 desync, malformed). |
| profile_oob_alloc / oob_poll | Mint a unique callback URL, plant it in a payload, poll for hits. Catches blind SSRF / SSTI / XXE / log4j-style vulns. |
| browser_scan_bridge / bola / status | Bridge captured traffic into an API response corpus; run automated horizontal-authorization (BOLA) probes; check scan output. |
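To make the replay row concrete, here is a rough sketch of what "replay with mutations" amounts to on the data side: take a captured request, apply header, token, and body changes, and leave the original untouched. The CapturedRequest and ReplayMutation types are hypothetical; the real browser_replay schema is not shown in this document.

```typescript
// Illustrative types only; the runtime's capture and replay schemas may differ.
interface CapturedRequest {
  method: string;
  url: string;
  headers: Record<string, string>;
  body: string | null;
}

interface ReplayMutation {
  setHeaders?: Record<string, string>;
  removeHeaders?: string[];
  bearerToken?: string; // replaces the token in the Authorization header
  body?: string;
}

// Return a mutated copy of the captured request; the input is not modified.
function applyMutation(req: CapturedRequest, m: ReplayMutation): CapturedRequest {
  // Lower-case header names so set/remove are case-insensitive.
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(req.headers)) {
    headers[name.toLowerCase()] = value;
  }
  for (const name of m.removeHeaders ?? []) {
    delete headers[name.toLowerCase()];
  }
  for (const [name, value] of Object.entries(m.setHeaders ?? {})) {
    headers[name.toLowerCase()] = value;
  }
  if (m.bearerToken !== undefined) {
    headers["authorization"] = `Bearer ${m.bearerToken}`;
  }
  return { ...req, headers, body: m.body ?? req.body };
}
```

The same shape would cover the OOB row: a callback URL minted with profile_oob_alloc could be planted through setHeaders or body, then checked for hits with oob_poll.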
The shortest useful interaction is one CLI call:
    $ agent-browser pack https://example.com --profile demo
    # navigate → capture network/storage/console/sources for ~10s → write a bundle
    # returns artifact paths the agent can hand to a separate reasoner
    {
      "summary": { "requests": 7, "consoleErrors": 0, "redirects": 1, "artifacts": 25 },
      "har": "~/.agent-browser-runtime/profiles/demo/har/2026-06-17-pack.har",
      "trace": "~/.agent-browser-runtime/profiles/demo/traces/<id>.zip",
      "evidence": "~/.agent-browser-runtime/profiles/demo/bundles/<id>.json",
      "next": ["profile_request_detail", "browser_evidence_timeline", "..."]
    }
The same workflow over HTTP is one POST to /browser/security_pack. Same JSON back.
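A sketch of that POST from an agent written in TypeScript. The /browser/security_pack path comes from the text above; the port, the body field names (url, profile), and sending the token as an Authorization: Bearer header are assumptions, so check them against the running worker before relying on them.

```typescript
// Assumptions, not documented values: the body field names and the
// Authorization: Bearer header used to carry the worker token.
interface PackCall {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the request without sending it, so it can be logged or inspected first.
function buildPackCall(baseUrl: string, token: string, target: string, profile: string): PackCall {
  return {
    url: new URL("/browser/security_pack", baseUrl).toString(),
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({ url: target, profile }),
    },
  };
}

// Send the request and return the parsed JSON (summary plus artifact paths).
async function runPack(call: PackCall): Promise<unknown> {
  const res = await fetch(call.url, call.init);
  if (!res.ok) throw new Error(`security_pack failed: HTTP ${res.status}`);
  return res.json();
}
```

Usage would look like `runPack(buildPackCall("http://127.0.0.1:8787", token, "https://example.com", "demo"))`, where 8787 is a placeholder port rather than the runtime's documented default.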
vs the obvious alternatives
| If you'd reach for | You'd get | What's different here |
|---|---|---|
| Playwright | A script library. You write the harness. | This is a running service. An agent POSTs and gets evidence back. |
| Chrome DevTools MCP | A thin wrapper over DevTools commands. | Workflows sit above the commands — capture, pack, replay, profile-scoped state, OOB collector. |
| Burp Suite | The reference for request-level work — but it lives inside a GUI. | Same mental model, reachable from an agent loop. The two complement; this isn't a replacement. |
Solid · Rough · Honest
I'd rather separate what's done from what isn't than claim polish that isn't there yet.
Where it's solid
- 538 unit tests + smoke suite; CI gates a 3-OS matrix on every commit
- TypeScript strict mode, zero compile errors, zero TODO / FIXME / debugger residue
- Contract test prevents tool-surface drift between the two backends
- Release-readiness gate scans for personal paths / internal hostnames / leaked secrets
- Bearer-token auth is timing-safe; DNS rebinding protection; default bind 127.0.0.1
- Evidence writes are atomic (tmp + fsync + rename, under mutex)
- Path-traversal protection on every artifact read
- SECURITY.md, CONTRIBUTING.md, issue templates, MIT license, real CHANGELOG
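Three of the hardening items above are standard techniques, sketched here with Node's built-in modules. This is a generic illustration under my own assumptions, not the repo's code: the function names are hypothetical, the mutex around evidence writes is left out because these calls are synchronous, and the path guard does not resolve symlinks.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";
import { closeSync, fsyncSync, mkdtempSync, openSync, readFileSync, renameSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import * as path from "node:path";

// Timing-safe token check: hash both sides so the buffers have equal length
// (timingSafeEqual throws on a length mismatch), then compare in constant time.
function tokenMatches(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

// Path-traversal guard: resolve the requested path against the artifact
// root and reject anything that lands outside it.
function resolveInside(root: string, requested: string): string {
  const base = path.resolve(root);
  const full = path.resolve(base, requested);
  if (full !== base && !full.startsWith(base + path.sep)) {
    throw new Error(`path escapes artifact root: ${requested}`);
  }
  return full;
}

// Atomic write: write a temp file next to the destination, fsync it,
// then rename it into place so readers never see a partial file.
function writeAtomic(dest: string, data: string): void {
  const tmp = `${dest}.${process.pid}.tmp`;
  const fd = openSync(tmp, "w");
  try {
    writeSync(fd, data);
    fsyncSync(fd);
  } finally {
    closeSync(fd);
  }
  renameSync(tmp, dest);
}

// Usage: write an artifact into a throwaway directory and read it back.
const root = mkdtempSync(path.join(tmpdir(), "abr-sketch-"));
const artifact = resolveInside(root, "evidence.json");
writeAtomic(artifact, JSON.stringify({ requests: 7 }));
const roundTrip = readFileSync(artifact, "utf8");
```

The temp file lives in the same directory as the destination on purpose: rename is only atomic within one filesystem.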
Where it's still rough
- The two largest files (worker server: ~6 KLOC, extension bridge: ~4.7 KLOC) are covered by smoke tests rather than unit tests
- Personal Chrome bridge ships 9 facade tools + ~90 reachable evidence surfaces, but the symmetric drilldown layer isn't fully exposed yet (basic-action tools like type / hover / drag are on the roadmap for Personal Chrome)
- Critical-path unit tests (token-auth gate, WebSocket frame cap, profile lock under race) are still on the to-do list; those paths are covered only by smoke tests today
- No published npm package yet; install from source today. npm publish is queued behind one files[] cleanup
What it is not: a scanner, a sandbox, a misuse-resistant tool. It can see cookies, bearer tokens, response bodies, WebSocket frames, account-specific storage. Use it on profiles, accounts, and targets you are authorized to inspect — the repo enforces this through profile naming conventions, gitignored evidence directories, and prompt guidance.
Source
MIT. Single CLI. 30-second public demo against example.com so the first run never touches private cookies.