webprobe

Map any website as a directed graph. Capture metrics. Scan for vulnerabilities. Explore with AI agents. Report everything.


# Install and run in 30 seconds
pip install webprobe
playwright install chromium
webprobe run https://your-site.com

Five Phases

Phase 1

Map

Breadth-first (BFS) crawl that discovers URLs from robots.txt, sitemaps, and followed links. Runs two passes: anonymous and authenticated. Optional framework route detection for Astro, Next.js, and SvelteKit.
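To make the mapping idea concrete, here is a minimal sketch of a breadth-first crawl that builds a directed graph of same-site pages. It is not webprobe's implementation: `bfs_map`, `get_links`, and the `SITE` table are hypothetical names, and link extraction is replaced by an in-memory lookup so the example runs without network access.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def bfs_map(start_url, get_links, max_pages=100):
    """Breadth-first crawl returning a directed graph {url: [linked urls]}.

    get_links is a caller-supplied function (hypothetical) that returns
    the href values found on a page.
    """
    origin = urlparse(start_url).netloc
    graph = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        edges = []
        for href in get_links(url):
            # Resolve relative links and drop #fragments.
            target = urljoin(url, href).split("#", 1)[0]
            if urlparse(target).netloc != origin:
                continue  # stay on the starting site
            if target not in edges:
                edges.append(target)
            if target not in seen:
                seen.add(target)
                queue.append(target)
        graph[url] = edges
    return graph


# Stand-in for real fetching and HTML parsing (assumed data).
SITE = {
    "https://example.test/": ["/about", "/blog", "https://other.test/"],
    "https://example.test/about": ["/"],
    "https://example.test/blog": ["/blog/post-1", "/about#team"],
    "https://example.test/blog/post-1": [],
}

graph = bfs_map("https://example.test/", lambda url: SITE.get(url, []))
for node, edges in graph.items():
    print(node, "->", edges)
```

A real crawler would also seed the queue from robots.txt and sitemap entries and would repeat the crawl with an authenticated session to get the second pass.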

Phase 2

Capture

Playwright visits every page. Records timing, HTTP status, all subresources, response headers, cookies, console messages, forms, and full-page screenshots.

Phase 3

Analyze

Graph metrics, cyclomatic complexity, broken links, auth boundary violations, timing outliers, prime path enumeration, and passive security scanning.

Phase 4

Report

Stable JSON schema for aggregation and trending. Dark-themed HTML with summary cards, sortable tables, and per-node details with expandable sections.

Phase 5

Explore

LLM-driven agents with headless browsers. WCAG contrast checking, hidden element detection, vision analysis, and interactive form testing. Cost-tracked.
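The contrast check follows a published formula, so it can be sketched independently of the tool. The example below implements the WCAG 2.x relative luminance and contrast ratio definitions and the standard AA/AAA thresholds. The function names are hypothetical, and how webprobe extracts foreground and background colors from a page is not covered here.

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB color given as (r, g, b) in 0-255."""
    def linearize(value):
        c = value / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Contrast ratio (L_lighter + 0.05) / (L_darker + 0.05), from 1 to 21."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def wcag_level(ratio, large_text=False):
    """Classify a ratio against the AA and AAA thresholds."""
    aa, aaa = (3.0, 4.5) if large_text else (4.5, 7.0)
    if ratio >= aaa:
        return "AAA"
    if ratio >= aa:
        return "AA"
    return "fail"


white = (255, 255, 255)
for name, color in [("black", (0, 0, 0)), ("#767676", (0x76,) * 3), ("#777777", (0x77,) * 3)]:
    ratio = contrast_ratio(color, white)
    print(name, "on white:", round(ratio, 2), wcag_level(ratio))
```

Normal-size text needs at least 4.5:1 for AA and 7:1 for AAA; large text relaxes these to 3:1 and 4.5:1.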

Security Scanning

Passive checks only: no attack payloads are sent, so scans are safe to run against production.

Headers

HSTS, CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy

XSS

Missing CSP, unsafe-inline/eval, reflected parameters

Cookies

Secure, HttpOnly, SameSite flags

Mixed Content

HTTP resources on HTTPS pages

CORS

Wildcard origins, credentials exposure

Info Disclosure

Server versions, source maps, stack traces

Forms

CSRF tokens, password autocomplete

Accessibility

WCAG AA/AAA contrast ratios
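Because these checks are passive, several of them reduce to inspecting data that a normal page load already returns. The sketch below shows what a header and cookie check of this kind might look like. It is an illustration under assumptions, not webprobe's code: the function names and the wording of the findings are hypothetical.

```python
REQUIRED_HEADERS = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
]


def check_headers(headers):
    """Return findings for missing security headers and risky CSP tokens."""
    # Header names are case-insensitive, so compare in lowercase.
    present = {name.lower(): value for name, value in headers.items()}
    findings = [
        f"missing header: {name}"
        for name in REQUIRED_HEADERS
        if name.lower() not in present
    ]
    csp = present.get("content-security-policy", "")
    for token in ("'unsafe-inline'", "'unsafe-eval'"):
        if token in csp:
            findings.append(f"CSP allows {token}")
    return findings


def check_set_cookie(set_cookie):
    """Inspect one Set-Cookie header value for Secure, HttpOnly, and SameSite."""
    name = set_cookie.split("=", 1)[0].strip()
    attributes = {
        part.strip().split("=", 1)[0].lower()
        for part in set_cookie.split(";")[1:]
    }
    return [
        f"cookie {name} lacks {flag}"
        for flag in ("Secure", "HttpOnly", "SameSite")
        if flag.lower() not in attributes
    ]


sample_headers = {
    "content-security-policy": "default-src 'self'; script-src 'unsafe-inline'",
    "X-Frame-Options": "DENY",
}
header_findings = check_headers(sample_headers)
cookie_findings = check_set_cookie("session=abc; Path=/; HttpOnly")
for finding in header_findings + cookie_findings:
    print(finding)
```

Nothing here sends a crafted request; the inputs are the response headers recorded during the Capture phase.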

Built for CI/CD

Stable Schema

Versioned JSON output. Diff any two runs. Aggregate across hundreds of runs. Break your pipeline on regressions.
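A regression gate built on a stable report format could look roughly like the sketch below. The report layout used here (a `findings` list whose items carry `url`, `category`, and `title`) is a hypothetical stand-in, not webprobe's actual schema, and `diff_runs` and `exit_code` are illustrative names.

```python
def finding_keys(report):
    """Identify each finding by (url, category, title). Hypothetical schema."""
    return {(f["url"], f["category"], f["title"]) for f in report["findings"]}


def diff_runs(before, after):
    """Return findings introduced and resolved between two reports."""
    old, new = finding_keys(before), finding_keys(after)
    return {"introduced": sorted(new - old), "resolved": sorted(old - new)}


def exit_code(diff):
    """Non-zero when the newer run introduced findings, for use as a CI gate."""
    return 1 if diff["introduced"] else 0


# Two assumed reports, as they might look after json.load().
run_a = {"findings": [
    {"url": "https://example.test/", "category": "headers", "title": "Missing HSTS"},
]}
run_b = {"findings": [
    {"url": "https://example.test/login", "category": "cookies",
     "title": "Session cookie lacks Secure"},
]}

result = diff_runs(run_a, run_b)
print("introduced:", result["introduced"])
print("resolved:", result["resolved"])
print("exit code:", exit_code(result))
```

In a pipeline, the final value would be passed to `sys.exit` so that a run with newly introduced findings fails the build.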

Multi-Provider LLM

Claude, OpenAI, Gemini, or Apprentice for local routing. Configurable per run. Cost tracked to the token.

Finding Masks

YAML rules to suppress known issues by URL pattern, title, or category. Suppressed findings are tracked, not deleted.
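The idea of suppressing without deleting can be sketched as follows. The rule keys (`url_pattern`, `title`, `category`, `reason`) and the finding fields are hypothetical; the rules are shown as the Python dictionaries a YAML file might parse into, and glob-style URL matching is an assumption about how patterns behave.

```python
from fnmatch import fnmatchcase


def rule_matches(rule, finding):
    """A rule matches when every criterion it specifies matches the finding."""
    if "url_pattern" in rule and not fnmatchcase(finding["url"], rule["url_pattern"]):
        return False
    if "title" in rule and rule["title"] != finding["title"]:
        return False
    if "category" in rule and rule["category"] != finding["category"]:
        return False
    return True


def apply_masks(findings, rules):
    """Flag masked findings as suppressed instead of removing them."""
    masked = []
    for finding in findings:
        reason = next(
            (rule.get("reason", "masked") for rule in rules if rule_matches(rule, finding)),
            None,
        )
        masked.append({**finding, "suppressed": reason is not None, "suppressed_reason": reason})
    return masked


# Hypothetical rules, as a YAML mask file might look once parsed.
rules = [
    {"url_pattern": "https://example.test/legacy/*", "category": "headers",
     "reason": "legacy app, scheduled for removal"},
]
findings = [
    {"url": "https://example.test/legacy/old", "category": "headers", "title": "Missing CSP"},
    {"url": "https://example.test/", "category": "headers", "title": "Missing CSP"},
]

masked_findings = apply_masks(findings, rules)
for item in masked_findings:
    print(item["url"], "suppressed" if item["suppressed"] else "active")
```

Keeping the suppressed entries in the output means a report can still show how many known issues are being masked and why.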

Framework Aware

Detects Astro, Next.js, and SvelteKit routes from your project root. Supplements the crawl with routes the spider can't reach.

Usage

# Full mechanical scan (no LLM exploration)
webprobe run https://your-site.com

# With framework route detection
webprobe run https://your-site.com --project-root ./my-project

# JS-rendered sites (uses Playwright for mapping)
webprobe run https://your-site.com --js

# With LLM exploration
webprobe run https://your-site.com --explore --agents 5

# Individual phases
webprobe map https://your-site.com
webprobe capture ./webprobe-runs/run-id/
webprobe analyze ./webprobe-runs/run-id/

# Compare two runs
webprobe diff ./run-a/ ./run-b/

# Different LLM provider
webprobe explore ./run-id/ --provider openai --model gpt-4o