Contracts before code.
Tests as law.

A multi-agent software engineering framework where architecture is decided before a single line of implementation is written. Agents implement independently, in parallel, even competitively — with no way to ship code that doesn't honor its contract.

pip install pact-agents

LLMs are unreliable reviewers.
Tests are perfectly reliable judges.

Every multi-agent coding framework has the same problem: how do you know the code is right? Advisory coordination doesn't work. "Looks good to me" doesn't scale.

Pact's answer: make the tests first, make them mechanical, and let agents iterate until they pass. No negotiation. No review boards. Pass or fail.

Code is cheap — agents generate it in minutes. Contracts are expensive — they encode hard-won understanding of what the system actually needs to do. Pact makes that inversion explicit.

"When a module fails in production, the response isn't 'debug the implementation.' It's: add a test that reproduces the failure, flush the implementation, and let an agent rebuild it. The contract got stricter. The next implementation can't have that bug."

Contracts accumulate the scar tissue of every production incident. They become the real engineering artifact.

Ten phases. All mechanical gates.

No human judgment in the loop at verification time. The pipeline either passes or it doesn't.

1-2
Interview
& Shape
3
Decompose
4-6
Contract
Test & Validate
7-8
Implement
& Integrate
8.5
Arbiter
Gate
9
Polish &
Certify

Built for the age of AI-generated code

When agents write the code, the contracts, tests, and verification become the real product.

🏗️

Contract-First Decomposition

Tasks decompose into 2-7 components. Each gets a typed interface contract and executable tests before any implementation begins.

Parallel Implementation

Independent components implement concurrently. No agent waits on another. Semaphore-limited concurrency keeps costs predictable.

🏁

Competitive Agents

N agents race on the same component. Best implementation wins — scored by test pass rate and execution time. The contract is the judge.

🌍

Python, TypeScript & JavaScript

Generate contracts, stubs, and implementations in Python, TypeScript, or plain JavaScript ES6 modules with JSDoc. Vitest for TS/JS, pytest for Python.

🔌

Multi-Provider

Route different roles to different LLMs. Opus for architecture, Sonnet for tests, GPT-4o for implementation. Anthropic, OpenAI, and Gemini supported.

💰

Budget-Aware

Per-project spend tracking with content-aware token estimation. Set a budget, let agents work. Multi-window caps prevent runaway costs.

📋

Plan-Only Mode

Stop after contracts and tests. Review the architecture. Then target specific components with pact build. Full control over what gets built and when.

🔍

Spec-Compliance Audit

Run pact audit after implementation to verify every requirement in your spec is covered. Get a gap report showing what's covered, partial, or missing.

🚨

Dysmemic Pressure Detection

The pipeline monitors its own coordination health. Detects the $50-planning-zero-output pattern, cascade failures, and budget stalls. Proposes remedies — you decide.

🔧

User-Controlled Remedies

When health degrades, Pact pauses and proposes fixes via FIFO directive. No silent config changes. The system never reduces its own degrees of freedom without asking.

🌀

Wavefront Scheduling

Dependency-driven fan-out. Each component advances through its own phase pipeline as soon as deps are satisfied. No phase-locked waiting.

🧠

Prompt Caching

Static prompt prefixes cached across API calls. 50-70% input token savings. Cache hit rates tracked in budget metrics. Research results persisted and reused across phases.

🎯

Hidden Acceptance Criteria

Goodhart tests: adversarial hidden tests the agent never sees. Catches hardcoded returns, missing validation, and invariants that hold only for visible inputs. Graduated-disclosure remediation on failure.

🛡️

Drift Detection

SHA256 baselines for contracts, tests, and implementations. Detects when artifacts change without version bumps. Staleness tracking classifies components as fresh, aging, or stale.

💡

Retrospective Learning

Post-run analysis: cost distribution, failure patterns, largest test suites, actionable lessons. Each run gets smarter from the last.

🛑

Contract Quality Gates

Anti-cliche enforcement flags vague contract language. Typed side-effect declarations. Optional performance budgets with p95 latency and Big-O constraints.

📐

Canonical Types with Validators

Contracts define data structures with domain-specific validators — range, regex, length, custom rules. Tests verify acceptance and rejection. Implementations render as Pydantic models, Zod schemas, or validated constructors. Not every field needs a validator — only those with domain semantics worth encoding.

🔄

Resume & Error Classification

Transient errors retry with backoff. Systemic failures pause with actionable recommendations. pact resume recovers from any failure without manual state editing.

🧩

Processing Register

Establishes the cognitive mode (rigorous-analytical, exploratory-generative, etc.) before any domain content. Contracts carry the register. Handoff protocol primes it. Health system monitors drift.

🎯

North-Star Validation

Checks that composed contracts actually fulfill the original task. Extracts action verbs from your spec and verifies coverage. Catches "all tests pass but the system can't do anything."

📤

Handoff Brief Inspector

pact handoff renders and validates what each agent actually sees. Check context fences, primer ordering, token budgets, and dependency coverage. Debug coordination at the prompt level.

MCP Server

Built-in MCP server for Claude Code integration. Inspect status, validate contracts, check budgets, and resume runs — all from your editor. 7 tools, 5 resources, stdio transport. pip install pact-agents[mcp]

🧪

Smoke Test Generation

Mechanical smoke tests from AST analysis — no LLM required. pact adopt extracts every public module-level function signature and generates import + callable checks in tests/smoke/. Filters out methods, private functions, and nested functions.

🏗

Architectural Assessment

Mechanical codebase analysis for structural friction — no LLM required. pact assess detects hub dependencies, shallow modules, tight coupling (mutual imports + SCCs), scattered logic, and test coverage gaps. Uses Python ast and Tarjan's algorithm. Point it at any Python directory — no project setup needed.

📂

Fully Visible Projects

All project knowledge lives in the project tree — contracts, source, tests, decomposition, Goodhart tests, standards, learnings. When a teammate clones the repo, they see everything. .pact/ contains only ephemeral per-run state that gets regenerated.

🔎

Tool Index

Optional enrichment from ctags (symbol index), cscope (call graph), tree-sitter (full CST, error-tolerant, cross-language), and kindex (knowledge graph). Agents get richer context about class hierarchies, callers, and existing project knowledge. All tools optional — graceful degradation. pip install pact-agents[analysis]

Benchmarked on ICPC World Finals

5 problems, 212 test cases, Claude Opus 4.6. Pact scores 212/212 (100%) on problems where single-agent Claude Code tops out at 79% (single-shot) or 92% (iterative with 5 retries).

The hardest problem — Trailing Digits (2020 World Finals) — requires O(log n) number theory. Claude Code gets 31/47 even with full test feedback and 5 iterations. Pact's contract pipeline forces the math-first approach and solves it on the first attempt.

Full Results

Condition Pass Rate Cost
Claude Code (single-shot) 79% $0.60
Claude Code (iterative, 5x) 92% $1.26
Pact 100% ~$13

5 ICPC World Finals problems · 212 test cases · Claude Opus 4.6

One piece of a larger stack

Pact is the contract-first build system. It integrates with four companion tools — all optional, all additive.

Constrain seeds decomposition with policies and component maps. Arbiter gates deployments with blast radius analysis and trust scoring. Ledger injects field-level audit assertions into test suites. Sentinel watches production, attributes errors via PACT keys, and pushes tightened contracts back.

Every implementation emits structured events with classification metadata. The access_graph.json artifact captures what each component reads, writes, and owns — consumed by Arbiter at phase 8.5 before code ships.

Constrain // policies, component maps, trust
Pact decomposes, contracts, builds
access_graph.json emitted
Arbiter // blast radius + trust gate
Ledger // field-level audit assertions
Certification artifact produced
Sentinel // production monitoring + contract tightening

Up and running in 60 seconds

Python 3.12+, two dependencies. pip install pact-agents and go. Generates Python, TypeScript, or JavaScript.

Describe your task in task.md, set your standards in sops.md, and let Pact decompose, contract, and build.

View on GitHub

# Install from PyPI
pip install pact-agents

# Or with all LLM backends
pip install pact-agents[all-backends]

# With code analysis tools (tree-sitter)
pip install pact-agents[analysis]

# Create a project
pact init my-project
# Edit my-project/task.md with your task
# Edit my-project/sops.md with your standards

# Run the pipeline
pact run my-project

# Or go step by step
pact run my-project --plan-only
pact components my-project
pact build my-project auth_module

Built on the ideas in Beyond Code

Pact is one of three systems built to test the ideas in Beyond Code: Context, Constraints, and the New Craft of Software. The book covers the coordination, verification, and specification problems that motivated Pact's design.

Read the Book

From contracts to running services: Baton

Pact produces components with provable interfaces. Baton wires them into a running service topology with mock collapse, A/B routing, health monitoring, and self-healing.

Start fully mocked, gradually swap to live implementations, canary new versions with weighted routing, and let the custodian auto-recover failures. Together, Pact and Baton cover the full lifecycle from architecture to production.

Learn about Baton

"Pact builds the pieces with typed contracts. Baton runs them as a circuit — pre-wired topology, smart adapters, mock collapse, and self-healing. Architecture decisions stay enforced from design through production."

Circuit-first orchestration for contract-first components.