
Technical Brief Example: Evaluating Coding Agents on Real Repository Tasks

A real 38-page Basis brief on repository-scale coding-agent evaluation, presented as a curated public proof page with the actual manuscript output front and center.

This brief shows Basis holding a long technical argument together across a full manuscript with cover, contents, section structure, bibliography, and a readable public edition.

Document: Technical brief
Length: 38 pages
Front matter: Cover · contents · main sections
Public edition: Full viewer with cleaned references
Introduction page from the Basis coding-agent evaluation brief.
Document facts
Objective: Design a repository-scale evaluation methodology for coding agents that focuses on executable contracts, gated scoring, failure modes, cost controls, and an internal pilot design that an engineering team could actually run.
Outputs: PDF · TeX · bibliography
Viewer: Full public edition of the real manuscript with cleaned front matter and internal artifact references removed.
Run brief

What this run had to deliver.

Basis had to satisfy a concrete objective, keep the assumptions explicit, and leave behind artifacts a human could inspect and continue working from.

Objective


Design a repository-scale evaluation methodology for coding agents that focuses on executable contracts, gated scoring, failure modes, cost controls, and an internal pilot design that an engineering team could actually run.

Scope

What had to be covered

  • Model each evaluation task as an executable contract with pinned repo state, explicit checks, and a controlled execution environment.
  • Separate hard pass-fail gates from diagnostic scoring so failures can be interpreted operationally (see the sketch after this list).
  • Make auditability, cost control, and reviewer handoff part of the methodology instead of afterthoughts.
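
A minimal sketch of that gate/diagnostic separation, in Python and with names that are purely illustrative (none of this is code from the brief):

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    """One hard pass/fail check, e.g. 'acceptance tests pass' or 'build succeeds'."""
    name: str
    passed: bool
    detail: str = ""


@dataclass
class TaskResult:
    """Outcome of one agent run against one task."""
    gates: list[GateResult]
    # Diagnostic dimensions (cost, wall-clock time, edit size, ...) are recorded
    # separately and never turn a failed run into a pass.
    diagnostics: dict[str, float] = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        # Pass/fail is decided by the gates alone.
        return all(g.passed for g in self.gates)

    def failure_modes(self) -> list[str]:
        # The names of failed gates give reviewers an operational reading of what broke.
        return [g.name for g in self.gates if not g.passed]
```

The point of the split is that diagnostics stay useful for comparing passing runs, while a failed run is explained by which gate tripped rather than by a blended score.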
Artifacts

What persisted after the run

  • Public manuscript PDF
  • LaTeX source manuscript
  • Bibliography and references
  • Figures and diagrams used in the brief
  • Curated public viewer edition
Inside the output

What the document actually says.

Core move: Treats repository tasks as executable contracts instead of prompt-only exercises.
Scoring model: Uses binary gates plus diagnostic dimensions so reviewers can reason about failure modes.
Handoff surface: Persists as a manuscript, bibliography, integrity checks, and a bundled project export.
What the manuscript actually does

Representative summary

The brief frames each repository task as an executable contract: a frozen repository snapshot, a task specification aligned with real issues, an allowed-tools policy, a resource budget, and explicit acceptance checks.
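
Read literally, such a contract is a small pinned data structure plus checks that can be executed. A minimal sketch under that reading, with an entirely hypothetical schema and repository (none of the field names or values come from the brief):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskContract:
    """One repository task as an executable contract (illustrative schema only)."""
    repo_url: str                        # repository under evaluation
    pinned_commit: str                   # frozen snapshot the agent starts from
    task_spec: str                       # task description aligned with a real issue
    allowed_tools: tuple[str, ...]       # tool policy the agent must stay within
    budget_usd: float                    # hard cost budget for the run
    timeout_s: int                       # wall-clock limit
    acceptance_checks: tuple[str, ...]   # commands that must exit 0 in the controlled environment


# Hypothetical example values, for illustration only.
example = TaskContract(
    repo_url="https://example.org/team/service.git",
    pinned_commit="3f1c2ab",
    task_spec="Fix the reported pagination bug (real issue text goes here).",
    allowed_tools=("shell", "editor"),
    budget_usd=5.0,
    timeout_s=1800,
    acceptance_checks=("pytest -q tests/test_pagination.py",),
)
```

Because everything that decides pass or fail is pinned inside the contract, a run can be re-executed and audited later without reconstructing context after the fact.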

From there it moves into operating detail rather than staying abstract. The manuscript spells out scoring gates, a failure-mode taxonomy, instrumentation, reproducibility hooks, and an internal pilot design that could be used for a real evaluation cycle.
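
The brief's own taxonomy and pilot design are spelled out in the manuscript itself; purely to illustrate the kind of structure a failure-mode taxonomy takes, a coarse and hypothetical bucketing might look like this (the categories are assumptions, not drawn from the brief):

```python
from enum import Enum


class FailureMode(Enum):
    """Coarse buckets a reviewer might assign to failed runs. Illustrative categories only."""
    SETUP_FAILURE = "environment or dependency setup never reached a runnable state"
    SPEC_MISREAD = "the patch changes behaviour the task did not ask for"
    REGRESSION = "the patch breaks checks that passed on the pinned snapshot"
    INCOMPLETE = "acceptance checks still fail when the budget runs out"
    BUDGET_EXCEEDED = "the run was stopped by the cost or time limit"
```

Tagging each failed run with one bucket is what lets a pilot report say where an agent breaks down, not just how often.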

Actual PDF

Read the output directly.

Full public edition of the real manuscript with cleaned front matter and internal artifact references removed.

Public manuscript PDF

Embedded here for quick review.

Human review

What was checked before this became public

  • The public edition keeps the full manuscript shape, including cover and table of contents.
  • Placeholder author metadata was replaced and internal artifact-path references were removed from the public PDF.
  • Public claims stay tied to the output shape and the visible quality of the manuscript itself.
Source notes

Where the example comes from

  • Prepared from an audited Basis manuscript reviewed on March 8, 2026.
  • The public edition comes directly from the reviewed run output with light cleanup for public release.
Invite-only

Use this as the bar for your own run.

Start with a concrete question, explicit constraints, and the artifact package you expect to review at the end.