Agent Browser Runtime

A local runtime that gives an AI agent two things it usually doesn't have: F12-grade evidence from a real browser, and request-level workflows in the spirit of Burp — intercept, replay, fuzz, out-of-band callback.

Why I built this

I kept handing my security-research agent a browser that could click and type, but couldn't show me the redirect chain, couldn't replay a request with a different header, couldn't tell me which exact frame leaked a token. So I built one that could.

The brief was narrow: give an agent F12 evidence (everything a human researcher would open DevTools to see) and an agentic Burp workflow (intercept, repeater, replay, raw socket, OOB collector). Not a vulnerability scanner. Not a sandbox. Not a Burp replacement — Burp is great at what it does, but its workflow lives inside a GUI an agent can't reach. ABR brings the workflow out into a service the agent can call.

Service is a literal description: ABR runs as a local HTTP server with a matching CLI. There is no MCP layer — I removed it after Wave-7 once the surface was cleaner without one. Agents POST to the worker directly, or pipe through agent-browser.

The shape

An agent doesn't directly face a hundred-something tools. The surface is layered so the first call is always one of nine, and each call returns concrete nextTools if it needs more. The drilldown is there when needed; it stays out of the way when it isn't.

Layer	What's at this layer	Role
1 · Facade	browser_open / browser_act / browser_inspect / browser_capture / browser_security_pack / browser_auth_boundary / browser_diff / browser_replay / browser_raw	Nine calls that cover 90% of workflows. Identical contract on both backends.
2 · Router	browser_inspect routes by focus: overview / network / storage / console / dom / sources / performance / search / evidence / debug	Returns a summary plus a concrete `nextTools` list — agent doesn't guess what to call next.
3 · Meta	browser_tool_catalog / browser_capability_map / browser_workflow_guide / browser_tool_help	For when an agent needs to introspect the surface itself rather than the page.
4 · Drilldown	~90 devtools_* tools across Network, Application, Sources, Console, Frames, Performance, Trace	Reached through facade or router — rarely first-touch.
5 · Escape hatch	browser_raw / browser_cdp_command	For protocol moves Chrome's facade can't reach. Power tools.

On both backends — Managed Browser (clean Playwright + direct CDP) and Personal Chrome (extension + chrome.debugger) — the nine facade calls do the same thing. Managed exposes an additional ~134 durable-profile and CDP-introspection tools that only make sense when the agent owns the profile. Personal Chrome, attached to a tab the user already has open, still reaches roughly ninety F12 evidence surfaces through browser_raw or extension aliases.
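
The router pattern above can be sketched as a small loop: call `browser_inspect`, then follow whatever `nextTools` comes back. This is a minimal sketch, not ABR's client code. The `ToolResult` shape, the `CallTool` transport type, and the `focus` argument name are assumptions made for illustration; only the tool name `browser_inspect` and the `nextTools` field come from the text.

```typescript
// Assumed response shape: only the `nextTools` field name comes from the text.
interface ToolResult {
  summary: unknown;
  nextTools?: string[];
}

// Hypothetical transport: one tool call in, one result out (HTTP or CLI underneath).
type CallTool = (tool: string, args: Record<string, unknown>) => Promise<ToolResult>;

// Pick the first suggested tool that has not been called yet.
function chooseNext(result: ToolResult, visited: string[]): string | undefined {
  return (result.nextTools ?? []).find((name) => !visited.includes(name));
}

// Start at the router and follow suggestions for a bounded number of steps.
async function drill(callTool: CallTool, maxSteps = 3): Promise<string[]> {
  const visited: string[] = [];
  let tool: string | undefined = "browser_inspect";
  let args: Record<string, unknown> = { focus: "overview" };
  while (tool !== undefined && visited.length < maxSteps) {
    const result: ToolResult = await callTool(tool, args);
    visited.push(tool);
    tool = chooseNext(result, visited);
    args = {}; // drilldown arguments are tool-specific; left empty in this sketch
  }
  return visited;
}
```

The step cap keeps an agent from wandering the drilldown layer indefinitely; a real loop would more likely stop once the summary answers its question.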

What an agent can actually do

Eight families, all addressable through the same nine-call facade. The CLI names mirror the HTTP tool names so anything in the table runs identically over either surface.

Family	What an agent does with it
browser_open / act / wait	Drive the page the way a human would — each call returns a structured page summary, not a screenshot the agent has to look at.
browser_capture (start / stop / status)	Spool every request, response body, console entry, redirect, frame load to disk for the duration of the profile.
browser_inspect (10 focus areas)	Pull a focused summary — network, storage, console, sources, etc. Returns concrete `nextTools` for drill-in.
browser_security_pack	One call: navigate → capture → drill → bundle. Returns ~25 artifacts (HAR, application export, trace, evidence bundle) plus a handoff JSON.
browser_replay / browser_diff	Replay a captured request with mutations — different headers, swapped JWT, body changes. Diff storage / network between two captures.
browser_raw (escape hatch)	Reach any `devtools_*` tool directly. Open a raw TCP/TLS socket for protocols Chrome's fetch can't (smuggling, h2 desync, malformed).
profile_oob_alloc / oob_poll	Mint a unique callback URL, plant it in a payload, poll for hits. Catches blind SSRF / SSTI / XXE / Log4j-style vulnerabilities.
browser_scan_bridge / bola / status	Bridge captured traffic into an API response corpus; run automated horizontal-authorization (BOLA) probes; check scan output.
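
To make "replay with mutations" concrete, here is a sketch of the data transform involved: take a captured request, override or drop headers, optionally swap the body. The `CapturedRequest` and `Mutation` shapes and the `applyMutation` helper are hypothetical and are not the `browser_replay` argument format, which this page does not specify.

```typescript
// Hypothetical shapes for illustration; not ABR's wire format.
interface CapturedRequest {
  method: string;
  url: string;
  headers: Record<string, string>;
  body: string | null;
}

interface Mutation {
  setHeaders?: Record<string, string>;
  removeHeaders?: string[];
  body?: string;
}

// Return a mutated copy; the captured original stays untouched so it can be diffed later.
function applyMutation(req: CapturedRequest, m: Mutation): CapturedRequest {
  const removed = new Set((m.removeHeaders ?? []).map((h) => h.toLowerCase()));
  const headers: Record<string, string> = {};
  // Header names are case-insensitive, so normalize to lower case before merging.
  for (const [name, value] of Object.entries(req.headers)) {
    if (!removed.has(name.toLowerCase())) headers[name.toLowerCase()] = value;
  }
  for (const [name, value] of Object.entries(m.setHeaders ?? {})) {
    headers[name.toLowerCase()] = value;
  }
  return { ...req, headers, body: m.body ?? req.body };
}

// Example: swap the bearer token and drop the cookie before replaying.
const captured: CapturedRequest = {
  method: "GET",
  url: "https://example.com/api/me",
  headers: { Authorization: "Bearer token-a", Cookie: "sid=1" },
  body: null,
};
const mutated = applyMutation(captured, {
  setHeaders: { authorization: "Bearer token-b" },
  removeHeaders: ["cookie"],
});
```

Comparing the response to `mutated` against the response to `captured` is the manual version of what the diff call automates.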

The shortest useful interaction is one CLI call:

$ agent-browser pack https://example.com --profile demo

# navigate → capture network/storage/console/sources for ~10s → write a bundle
# returns artifact paths the agent can hand to a separate reasoner
{
  "summary":   { "requests": 7, "consoleErrors": 0, "redirects": 1, "artifacts": 25 },
  "har":       "~/.agent-browser-runtime/profiles/demo/har/2026-06-17-pack.har",
  "trace":     "~/.agent-browser-runtime/profiles/demo/traces/<id>.zip",
  "evidence":  "~/.agent-browser-runtime/profiles/demo/bundles/<id>.json",
  "next":      ["profile_request_detail", "browser_evidence_timeline", "..."]
}

The same workflow over HTTP is one POST to /browser/security_pack. Same JSON back.
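
A sketch of that POST from an agent written in TypeScript. Only the path `/browser/security_pack` and the use of bearer-token auth come from this page. The port, the body field names (`url`, `profile`, mirrored from the CLI flags), and both helper functions are assumptions, so check the repo for the actual request schema.

```typescript
interface PackRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the request separately from sending it, so it can be logged or inspected first.
function buildPackRequest(baseUrl: string, token: string, target: string, profile: string): PackRequest {
  return {
    url: new URL("/browser/security_pack", baseUrl).toString(),
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${token}`,
    },
    // Field names are assumed to mirror the CLI arguments.
    body: JSON.stringify({ url: target, profile }),
  };
}

// Send the request with the global fetch (Node 18+) and return the parsed JSON.
async function runPack(req: PackRequest): Promise<unknown> {
  const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
  if (!res.ok) throw new Error(`security_pack failed: HTTP ${res.status}`);
  return res.json();
}

// Placeholder port and token; the worker binds to 127.0.0.1 by default.
const packRequest = buildPackRequest("http://127.0.0.1:8080", "local-token", "https://example.com", "demo");
```

Calling `runPack(packRequest)` against a running worker should yield the same JSON as the CLI output shown earlier.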

vs the obvious alternatives

If you'd reach for	You'd get	What's different here
Playwright	A script library. You write the harness.	This is a running service. An agent POSTs and gets evidence back.
Chrome DevTools MCP	A thin wrapper over DevTools commands.	Workflows sit above the commands — capture, pack, replay, profile-scoped state, OOB collector.
Burp Suite	The reference for request-level work — but it lives inside a GUI.	Same mental model, reachable from an agent loop. The two complement; this isn't a replacement.

Solid · Rough · Honest

I'd rather separate what's done from what isn't than imply polish that isn't there yet.

Where it's solid

538 unit tests + smoke suite; CI gates a 3-OS matrix on every commit
TypeScript strict mode, zero compile errors, zero TODO / FIXME / debugger residue
Contract test prevents tool-surface drift between the two backends
Release-readiness gate scans for personal paths / internal hostnames / leaked secrets
Bearer-token auth is timing-safe; DNS rebinding protection; default bind 127.0.0.1
Evidence writes are atomic (tmp + fsync + rename, under mutex)
Path-traversal protection on every artifact read
SECURITY.md, CONTRIBUTING.md, issue templates, MIT license, real CHANGELOG
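
Three of the items above name well-known techniques: timing-safe token comparison, atomic writes, and path-traversal checks. The sketch below shows one common way to do each with Node's standard library. It is not ABR's code. The function names are hypothetical, the mutex around writes is left out, and symlinks are not resolved.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";
import { closeSync, fsyncSync, mkdtempSync, openSync, readFileSync, renameSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { isAbsolute, join, relative, resolve, sep } from "node:path";

// Constant-time token check. Hashing first yields equal-length buffers,
// which timingSafeEqual requires (it throws on a length mismatch).
function tokensMatch(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

// Resolve a caller-supplied path and refuse anything that lands outside `root`.
function resolveInside(root: string, requested: string): string {
  const base = resolve(root);
  const target = resolve(base, requested);
  const rel = relative(base, target);
  if (rel === ".." || rel.startsWith(".." + sep) || isAbsolute(rel)) {
    throw new Error(`path escapes artifact root: ${requested}`);
  }
  return target;
}

// Write to a temp file, fsync it, then rename over the final name, so a
// reader sees either the old file or the complete new one.
function writeAtomic(finalPath: string, data: string): void {
  const tmpPath = `${finalPath}.${process.pid}.tmp`;
  const fd = openSync(tmpPath, "w");
  try {
    writeSync(fd, data);
    fsyncSync(fd);
  } finally {
    closeSync(fd);
  }
  renameSync(tmpPath, finalPath);
}

// Usage against a throwaway directory.
const demoDir = mkdtempSync(join(tmpdir(), "abr-sketch-"));
const demoFile = resolveInside(demoDir, "bundles/../evidence.json");
writeAtomic(demoFile, JSON.stringify({ ok: true }));
const demoContent = readFileSync(demoFile, "utf8");
```

The rename is atomic only when the temp file and the final file live on the same filesystem, which is why the temp file is created next to its target.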

Where it's still rough

The two largest files (worker server: ~6 KLOC, extension bridge: ~4.7 KLOC) are covered by smoke tests rather than unit tests
Personal Chrome bridge ships 9 facade tools + ~90 reachable evidence surfaces, but the symmetric drilldown layer isn't fully exposed yet (basic-action tools like type / hover / drag are on the roadmap for Personal Chrome)
Critical-path unit tests (token-auth gate, WebSocket frame cap, profile lock under race) are still on the to-do list — covered by smoke today
No published npm package yet — install from source today; npm publish is queued behind one files[] cleanup

What it is not: a scanner, a sandbox, or a misuse-resistant tool. It can see cookies, bearer tokens, response bodies, WebSocket frames, and account-specific storage. Use it only on profiles, accounts, and targets you are authorized to inspect. The repo reinforces this through profile naming conventions, gitignored evidence directories, and prompt guidance.

Source

MIT. Single CLI. 30-second public demo against example.com so the first run never touches private cookies.

github.com/ttcd77/agent-browser-runtime