A multi-agent software engineering framework where architecture is decided before a single line of implementation is written. Agents implement independently, in parallel, even competitively — with no way to ship code that doesn't honor its contract.
pip install pact-agents
Every multi-agent coding framework has the same problem: how do you know the code is right? Advisory coordination doesn't work. "Looks good to me" doesn't scale.
Pact's answer: write the tests first, make them mechanical, and let agents iterate until they pass. No negotiation. No review boards. Pass or fail.
Code is cheap — agents generate it in minutes. Contracts are expensive — they encode hard-won understanding of what the system actually needs to do. Pact makes that inversion explicit.
"When a module fails in production, the response isn't 'debug the implementation.' It's: add a test that reproduces the failure, flush the implementation, and let an agent rebuild it. The contract got stricter. The next implementation can't have that bug."
Contracts accumulate the scar tissue of every production incident. They become the real engineering artifact.
No human judgment in the loop at verification time. The pipeline either passes or it doesn't.
When agents write the code, the contracts, tests, and verification become the real product.
Tasks decompose into 2-7 components. Each gets a typed interface contract and executable tests before any implementation begins.
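As a rough illustration of "contract and tests before implementation", the sketch below uses a `typing.Protocol` as the typed interface and a plain function of assertions as the executable tests. The names `SlugifyContract`, `check_contract`, and `slugify` are hypothetical and are not taken from Pact's generated output.

```python
from typing import Protocol


class SlugifyContract(Protocol):
    """Hypothetical contract: turn a title into a URL slug."""

    def __call__(self, title: str) -> str: ...


def check_contract(impl: SlugifyContract) -> None:
    """Executable tests, written before any implementation exists."""
    assert impl("Hello World") == "hello-world"
    assert impl("  Pact  ") == "pact"
    assert impl("") == ""


# One possible implementation an agent might produce later.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


check_contract(slugify)  # raises AssertionError if the contract is violated
```

Any implementation that satisfies `check_contract` is interchangeable, which is what lets it be discarded and rebuilt.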
Independent components implement concurrently. No agent waits on another. Semaphore-limited concurrency keeps costs predictable.
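A minimal sketch of semaphore-limited concurrency, assuming an asyncio-based runner. `implement`, `build_all`, and `Tracker` are illustrative names, and the sleep stands in for an LLM call. This is not Pact's scheduler.

```python
import asyncio


class Tracker:
    """Counts how many hypothetical agent calls are in flight at once."""

    def __init__(self) -> None:
        self.active = 0
        self.peak = 0


async def implement(component: str, limit: asyncio.Semaphore, tracker: Tracker) -> str:
    async with limit:  # at most `max_concurrent` bodies run at a time
        tracker.active += 1
        tracker.peak = max(tracker.peak, tracker.active)
        await asyncio.sleep(0.01)  # placeholder for the LLM round trip
        tracker.active -= 1
        return f"{component}: implemented"


async def build_all(components: list[str], max_concurrent: int) -> tuple[list[str], int]:
    limit = asyncio.Semaphore(max_concurrent)
    tracker = Tracker()
    results = await asyncio.gather(*(implement(c, limit, tracker) for c in components))
    return list(results), tracker.peak


results, peak = asyncio.run(
    build_all(["auth", "billing", "search", "email"], max_concurrent=2)
)
```

All four components are scheduled immediately, but `peak` never exceeds the cap, so spend per unit of time stays bounded.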
N agents race on the same component. Best implementation wins — scored by test pass rate and execution time. The contract is the judge.
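The source names two scoring inputs, test pass rate and execution time, but not how they are combined. The sketch below assumes one plausible rule: pass rate first, speed as the tie-breaker. `Candidate` and `pick_winner` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    agent: str
    tests_passed: int
    tests_total: int
    seconds: float

    @property
    def pass_rate(self) -> float:
        return self.tests_passed / self.tests_total


def pick_winner(candidates: list[Candidate]) -> Candidate:
    # Assumed ranking: highest pass rate first, fastest run as the tie-breaker.
    return max(candidates, key=lambda c: (c.pass_rate, -c.seconds))


winner = pick_winner([
    Candidate("agent-a", 18, 20, 1.2),
    Candidate("agent-b", 20, 20, 3.4),
    Candidate("agent-c", 20, 20, 2.1),
])
```

Here `agent-a` is fastest but fails two tests, so it loses to both full-pass candidates, and `agent-c` beats `agent-b` on time.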
Generate contracts, stubs, and implementations in Python, TypeScript, or plain JavaScript ES6 modules with JSDoc. Vitest for TS/JS, pytest for Python.
Route different roles to different LLMs. Opus for architecture, Sonnet for tests, GPT-4o for implementation. Anthropic, OpenAI, and Gemini supported.
Per-project spend tracking with content-aware token estimation. Set a budget, let agents work. Multi-window caps prevent runaway costs.
Stop after contracts and tests. Review the architecture. Then target specific components with pact build. Full control over what gets built and when.
Run pact audit after implementation to verify every requirement in your spec is covered. Get a gap report showing what's covered, partial, or missing.
The pipeline monitors its own coordination health. It detects runs that spend $50 on planning and produce zero output, cascade failures, and budget stalls. It proposes remedies; you decide.
When health degrades, Pact pauses and proposes fixes via FIFO directive. No silent config changes. The system never reduces its own degrees of freedom without asking.
Dependency-driven fan-out. Each component advances through its own phase pipeline as soon as deps are satisfied. No phase-locked waiting.
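One way to picture dependency-driven fan-out is a task per component that awaits only its own dependencies. The graph in `DEPS` and the function names are made up for illustration, and the sleep stands in for the per-component phases.

```python
import asyncio

# Hypothetical component graph: each key depends on the names in its list.
DEPS: dict[str, list[str]] = {
    "models": [],
    "auth": ["models"],
    "billing": ["models"],
    "api": ["auth", "billing"],
}


async def advance(name: str, tasks: dict[str, asyncio.Task], finished: list[str]) -> None:
    # Wait only on this component's own dependencies, never on a global phase.
    await asyncio.gather(*(tasks[dep] for dep in DEPS[name]))
    await asyncio.sleep(0.01)  # placeholder for contract, test, and build phases
    finished.append(name)


async def run_pipeline() -> list[str]:
    finished: list[str] = []
    tasks: dict[str, asyncio.Task] = {}
    for name in DEPS:
        # Tasks begin running only once this coroutine yields,
        # so the dict is fully populated before any lookup happens.
        tasks[name] = asyncio.create_task(advance(name, tasks, finished))
    await asyncio.gather(*tasks.values())
    return finished


order = asyncio.run(run_pipeline())
```

`auth` and `billing` both start the moment `models` finishes, and `api` starts as soon as those two are done, with no barrier between "phases".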
Static prompt prefixes cached across API calls. 50-70% input token savings. Cache hit rates tracked in budget metrics. Research results persisted and reused across phases.
Goodhart tests: adversarial hidden tests the agent never sees. Catches hardcoded returns, missing validation, and invariants that hold only for visible inputs. Graduated-disclosure remediation on failure.
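A toy example of why hidden tests matter: an implementation that memorizes the visible inputs passes the visible suite and fails the held-out one. All names here are illustrative and do not reflect Pact's actual test layout.

```python
def visible_tests(is_even) -> bool:
    # Inputs the agent can see while iterating.
    return is_even(2) is True and is_even(3) is False


def hidden_tests(is_even) -> bool:
    # Held-out inputs the agent never sees; they probe the same invariant.
    return all(is_even(n) == (n % 2 == 0) for n in (0, 7, 10, -4, 101))


def hardcoded(n: int) -> bool:
    # Games the visible suite by memorizing its inputs.
    return {2: True, 3: False}.get(n, False)


def honest(n: int) -> bool:
    return n % 2 == 0


report = {
    "hardcoded": (visible_tests(hardcoded), hidden_tests(hardcoded)),
    "honest": (visible_tests(honest), hidden_tests(honest)),
}
```

The hardcoded version scores `(True, False)`: it passes everything it was shown and fails the held-out check.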
SHA256 baselines for contracts, tests, and implementations. Detects when artifacts change without version bumps. Staleness tracking classifies components as fresh, aging, or stale.
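Drift detection of this kind can be sketched with `hashlib`: store a version and a content hash per artifact, then flag anything whose hash changed while the version did not. The data layout and function names below are assumptions, not Pact's on-disk format.

```python
import hashlib


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_drift(
    baseline: dict[str, tuple[str, str]],
    current: dict[str, tuple[str, str]],
) -> list[str]:
    """Return artifacts whose content changed while the version stayed the same.

    baseline maps name -> (version, sha256 hex digest)
    current  maps name -> (version, raw content)
    """
    drifted = []
    for name, (version, content) in current.items():
        old_version, old_hash = baseline[name]
        if digest(content) != old_hash and version == old_version:
            drifted.append(name)
    return drifted


baseline = {
    "contract": ("1.0", digest("def f(x: int) -> int")),
    "tests": ("1.0", digest("assert f(1) == 2")),
}
current = {
    "contract": ("1.0", "def f(x: int) -> str"),  # edited, version not bumped
    "tests": ("1.1", "assert f(1) == '2'"),       # edited, version bumped
}
drift = detect_drift(baseline, current)
```

Only `contract` is reported: the tests also changed, but their version bump makes the change explicit.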
Post-run analysis: cost distribution, failure patterns, largest test suites, actionable lessons. Each run gets smarter from the last.
Anti-cliche enforcement flags vague contract language. Typed side-effect declarations. Optional performance budgets with p95 latency and Big-O constraints.
Contracts define data structures with domain-specific validators — range, regex, length, custom rules. Tests verify acceptance and rejection. Implementations render as Pydantic models, Zod schemas, or validated constructors. Not every field needs a validator — only those with domain semantics worth encoding.
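To make the "validated constructor" form concrete, here is a stdlib-only sketch with one range, one regex, and one length validator, plus a field left unvalidated on purpose. The `Signup` type and its limits are invented for illustration. A Pydantic or Zod rendering would express the same rules declaratively.

```python
import re
from dataclasses import dataclass

_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass(frozen=True)
class Signup:
    """Hypothetical contract type; only fields with domain meaning are validated."""

    email: str      # regex validator
    age: int        # range validator
    username: str   # length validator
    note: str = ""  # no validator: free-form text

    def __post_init__(self) -> None:
        if not _EMAIL.match(self.email):
            raise ValueError("email must look like name@host.tld")
        if not 13 <= self.age <= 120:
            raise ValueError("age must be between 13 and 120")
        if not 3 <= len(self.username) <= 20:
            raise ValueError("username must be 3 to 20 characters")


def rejects(**fields) -> bool:
    """Test helper: True when construction fails validation."""
    try:
        Signup(**fields)
    except ValueError:
        return True
    return False
```

Tests then check both directions: a valid record constructs, and each invalid variant is rejected.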
Transient errors retry with backoff. Systemic failures pause with actionable recommendations. pact resume recovers from any failure without manual state editing.
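The split between "retry" and "pause" can be sketched as two exception categories and a backoff loop that only catches one of them. The class names, attempt count, and delays are assumptions chosen for the example.

```python
import time


class TransientError(Exception):
    """Hypothetical category: rate limits, timeouts, dropped connections."""


class SystemicError(Exception):
    """Hypothetical category: bad credentials or an exhausted budget."""


def call_with_backoff(fn, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry fn on TransientError with exponential backoff.

    SystemicError is deliberately not caught, so it propagates to the caller,
    which can pause the run instead of retrying.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last transient error
            sleep(base_delay * 2 ** attempt)  # waits 1x, 2x, 4x the base delay
```

Passing `sleep` as a parameter keeps the function testable without real waiting.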
Establishes the cognitive mode (rigorous-analytical, exploratory-generative, etc.) before any domain content. Contracts carry the register. Handoff protocol primes it. Health system monitors drift.
Checks that composed contracts actually fulfill the original task. Extracts action verbs from your spec and verifies coverage. Catches "all tests pass but the system can't do anything."
pact handoff renders and validates what each agent actually sees. Check context fences, primer ordering, token budgets, and dependency coverage. Debug coordination at the prompt level.
Built-in MCP server for Claude Code integration. Inspect status, validate contracts, check budgets, and resume runs, all from your editor. 7 tools, 5 resources, stdio transport. Install with pip install pact-agents[mcp].
Mechanical smoke tests from AST analysis — no LLM required. pact adopt extracts every public module-level function signature and generates import + callable checks in tests/smoke/. Filters out methods, private functions, and nested functions.
Mechanical codebase analysis for structural friction — no LLM required. pact assess detects hub dependencies, shallow modules, tight coupling (mutual imports + SCCs), scattered logic, and test coverage gaps. Uses Python ast and Tarjan's algorithm. Point it at any Python directory — no project setup needed.
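For readers unfamiliar with the SCC step, here is a compact recursive form of Tarjan's algorithm applied to a made-up import graph. Any strongly connected component with more than one module is an import cycle. The graph and the variable names are illustrative only.

```python
def strongly_connected(graph: dict[str, list[str]]) -> list[list[str]]:
    """Tarjan's algorithm: return the SCCs of a directed graph."""
    index: dict[str, int] = {}
    low: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    result: list[list[str]] = []

    def visit(node: str) -> None:
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for nxt in graph.get(node, []):
            if nxt not in index:
                visit(nxt)
                low[node] = min(low[node], low[nxt])
            elif nxt in on_stack:
                low[node] = min(low[node], index[nxt])
        if low[node] == index[node]:  # node is the root of an SCC
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            result.append(sorted(component))

    for node in graph:
        if node not in index:
            visit(node)
    return result


# Hypothetical import graph: module -> modules it imports.
imports = {
    "api": ["auth", "db"],
    "auth": ["users"],
    "users": ["auth", "db"],
    "db": [],
}
cycles = [c for c in strongly_connected(imports) if len(c) > 1]
```

Here `auth` and `users` import each other, so they come back as a single component. The recursive form is fine for a sketch, but very deep graphs would need an iterative version to stay under Python's recursion limit.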
All project knowledge lives in the project tree — contracts, source, tests, decomposition, Goodhart tests, standards, learnings. When a teammate clones the repo, they see everything. .pact/ contains only ephemeral per-run state that gets regenerated.
Optional enrichment from ctags (symbol index), cscope (call graph), tree-sitter (full CST, error-tolerant, cross-language), and kindex (knowledge graph). Agents get richer context about class hierarchies, callers, and existing project knowledge. All tools are optional, and Pact degrades gracefully when one is missing. Install with pip install pact-agents[analysis].
5 ICPC World Finals problems, 212 test cases, Claude Opus 4.6. Pact scores 212/212 (100%) on problems where single-agent Claude Code tops out at 79% (single-shot) or 92% (iterative with 5 retries).
The hardest problem — Trailing Digits (2020 World Finals) — requires O(log n) number theory. Claude Code gets 31/47 even with full test feedback and 5 iterations. Pact's contract pipeline forces the math-first approach and solves it on the first attempt.
| Condition | Pass Rate | Cost |
|---|---|---|
| Claude Code (single-shot) | 79% | $0.60 |
| Claude Code (iterative, 5x) | 92% | $1.26 |
| Pact | 100% | ~$13 |
5 ICPC World Finals problems · 212 test cases · Claude Opus 4.6
Pact is the contract-first build system. It integrates with four companion tools — all optional, all additive.
Constrain seeds decomposition with policies and component maps. Arbiter gates deployments with blast radius analysis and trust scoring. Ledger injects field-level audit assertions into test suites. Sentinel watches production, attributes errors via PACT keys, and pushes tightened contracts back.
Every implementation emits structured events with classification metadata. The access_graph.json artifact captures what each component reads, writes, and owns — consumed by Arbiter at phase 8.5 before code ships.
Python 3.12+, two dependencies. pip install pact-agents and go.
Generates Python, TypeScript, or JavaScript.
Describe your task in task.md, set your standards in sops.md, and let Pact decompose, contract, and build.
# Install from PyPI
pip install pact-agents
# Or with all LLM backends
pip install pact-agents[all-backends]
# With code analysis tools (tree-sitter)
pip install pact-agents[analysis]
# Create a project
pact init my-project
# Edit my-project/task.md with your task
# Edit my-project/sops.md with your standards
# Run the pipeline
pact run my-project
# Or go step by step
pact run my-project --plan-only
pact components my-project
pact build my-project auth_module
Pact is one of three systems built to test the ideas in Beyond Code: Context, Constraints, and the New Craft of Software. The book covers the coordination, verification, and specification problems that motivated Pact's design.
Pact produces components with provable interfaces. Baton wires them into a running service topology with mock collapse, A/B routing, health monitoring, and self-healing.
Start fully mocked, gradually swap to live implementations, canary new versions with weighted routing, and let the custodian auto-recover failures. Together, Pact and Baton cover the full lifecycle from architecture to production.
"Pact builds the pieces with typed contracts. Baton runs them as a circuit — pre-wired topology, smart adapters, mock collapse, and self-healing. Architecture decisions stay enforced from design through production."
Circuit-first orchestration for contract-first components.