A real 38-page Basis brief on repository-scale coding-agent evaluation, presented as a curated public proof page with the manuscript output front and center.
This brief shows Basis sustaining a long technical argument across a full manuscript: cover, table of contents, section structure, bibliography, and a readable public edition.
Basis had to satisfy a concrete objective, keep its assumptions explicit, and leave behind artifacts a human could inspect and carry forward.
Design a repository-scale evaluation methodology for coding agents covering executable contracts, gated scoring, failure modes, cost controls, and an internal pilot design that an engineering team could actually run.
The brief frames each repository task as an executable contract: a frozen repository snapshot, a task specification aligned with real issues, an allowed-tools policy, a resource budget, and explicit acceptance checks.
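To make the contract shape concrete, here is a minimal sketch of how those five elements might be pinned down in code. All names below (TaskContract, ResourceBudget, and their fields) are illustrative assumptions, not identifiers from the brief.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourceBudget:
    max_wall_clock_s: int   # hard cap on total run time, in seconds
    max_tokens: int         # hard cap on model tokens consumed
    max_tool_calls: int     # hard cap on tool invocations

@dataclass(frozen=True)
class TaskContract:
    repo_snapshot: str                  # pinned commit SHA of the frozen repository
    task_spec: str                      # specification text aligned with a real issue
    allowed_tools: frozenset[str]       # the allowed-tools policy
    budget: ResourceBudget              # explicit resource limits
    acceptance_checks: tuple[str, ...]  # commands that must all exit 0 for the task to pass

# Illustrative values only; none of these come from the brief.
contract = TaskContract(
    repo_snapshot="<pinned-commit-sha>",
    task_spec="Resolve the reported off-by-one in list pagination.",
    allowed_tools=frozenset({"shell", "editor", "test_runner"}),
    budget=ResourceBudget(max_wall_clock_s=1800, max_tokens=500_000, max_tool_calls=200),
    acceptance_checks=("pytest tests/", "ruff check ."),
)
```

Freezing both dataclasses reflects the point of the contract: the snapshot, policy, and limits are fixed before the agent runs, not negotiated during it.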
From there it moves into operating detail rather than staying abstract. The manuscript spells out scoring gates, a failure-mode taxonomy, instrumentation, reproducibility hooks, and an internal pilot design that could be used for a real evaluation cycle.
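The brief's scoring code is not reproduced on this page, but one natural reading of gated scoring is that hard gates must clear before any graded credit counts. A minimal sketch under that assumption, with hypothetical field names:

```python
from dataclasses import dataclass

@dataclass
class RunResult:
    build_ok: bool          # repository still builds after the agent's changes
    checks_passed: bool     # every acceptance check exited 0
    budget_respected: bool  # run stayed within the contract's resource budget
    quality_score: float    # graded sub-score in [0, 1], e.g. diff quality

def gated_score(result: RunResult) -> float:
    """Hard gates zero out the score before grading applies.

    A run earns credit only if it builds, passes all acceptance checks,
    and stays within budget; otherwise it scores 0 regardless of partial
    progress.
    """
    if not (result.build_ok and result.checks_passed and result.budget_respected):
        return 0.0
    return result.quality_score
```

The gate encodes a deliberate choice: partial progress on a broken build earns nothing, which keeps headline numbers from rewarding near-misses.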
Full public edition of the real manuscript, with front matter cleaned up and internal artifact references removed.
Start with a concrete question, explicit constraints, and the artifact package you expect to review at the end.