30 April 2026

Casual-Review: A CLI for Team Code Review Without the Server

Most code review tools want to be a service. The interesting question is what falls out when you refuse to run one — and use git itself as the sync layer instead.

I’ve spent the last few weeks building casual-review — cr on the command line — a single static binary that delivers rustc-quality diagnostics across Rust, Python, TypeScript, TSX, and Java. It’s diff-aware by default, runs equally on a developer workstation and in CI, and is designed for both human and agent (LLM) consumers.

The newest pieces, shipped this week, turn cr from a linter into a collaborative review tool: findings and review comments persist as git notes, sync via the same git fetch / git push you already use, and surface in VS Code, JetBrains, and Zed through thin extensions that shell out to the CLI. Still no server. Still one binary.

This post is a tour of what it does, the design choices behind it, and where it fits in the broader argument I’ve been making that code review needs to move out of the pull request.

The Shape of the Tool

Most code review tools split into two pieces: a client that scans your code and a server that tracks issues over time, assigns them to people, and tries to remember what you’ve already triaged. That architecture made sense when review was a quarterly compliance event. It makes less sense when review is something a team does every few minutes, increasingly with agents in the loop.

casual-review deliberately stays a single CLI. No backend. No database. No “issue lifecycle.” The kind of feedback you want before a commit, in line with your workflow, without infrastructure to run.

cr check                          # working-tree diff (default)
cr check --staged                 # only staged changes
cr check --all                    # lint all lines in changed files
cr check --repo                   # entire repository
cr check --format json            # JSON output for agents
cr explain <rule-id>              # view rule documentation

cr publish                        # persist findings as git notes for HEAD
cr show                           # read findings back for a commit
cr ack <id> [msg]                 # dismiss a finding (threaded under parent)
cr fetch / cr push                # sync findings + comments via git remotes

cr comment add ...                # author a review comment (line / file / commit)
cr comment list [--include-ancestors]
cr comment reply <id>             # threaded replies
cr comment resolve <id>

Exit codes follow the Unix convention: 0 clean, 1 findings, 2 tool failure. That’s the entire surface area of the tool when used in CI.
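As an illustration of how a CI wrapper might consume those exit codes, here is a minimal Python sketch. The names CheckOutcome, classify_exit_code, and run_check are hypothetical, and treating any unexpected exit code as a tool failure is an assumption of this sketch rather than documented cr behavior.

```python
import subprocess
from enum import Enum


class CheckOutcome(Enum):
    CLEAN = 0         # no findings
    FINDINGS = 1      # at least one finding was reported
    TOOL_FAILURE = 2  # cr itself failed to run


def classify_exit_code(code: int) -> CheckOutcome:
    """Map an exit code to an outcome.

    Assumption: codes other than 0, 1, 2 (including signal exits)
    are treated as tool failures.
    """
    try:
        return CheckOutcome(code)
    except ValueError:
        return CheckOutcome.TOOL_FAILURE


def run_check(argv=("cr", "check")) -> CheckOutcome:
    """Run cr and classify its exit status (requires cr on PATH)."""
    completed = subprocess.run(list(argv), check=False)
    return classify_exit_code(completed.returncode)
```

Separating FINDINGS from TOOL_FAILURE lets a pipeline fail the build on the former while alerting whoever maintains the pipeline on the latter.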

Diff-Aware by Default

The default cr check lints only the code that changed in your working tree. Not the file. Not the function around the change. The lines you actually touched.

This sounds like a small detail. It is the most important design choice in the tool.

A linter that fires on the entire file punishes you for editing legacy code. The signal-to-noise ratio collapses the moment you open payment_handler.py for a one-line fix, because now you “own” forty pre-existing TODOs and three suspiciously broad except blocks that have been there since 2019. Most teams respond to this the way humans respond to any noisy system: they stop reading the output. Once that happens, the linter is decorative.

Diff-aware scoping inverts the contract. The findings the tool surfaces are findings you introduced. If cr check tells you the function you just added has cognitive complexity 22, that’s a fact about your work, not the codebase’s history. It’s actionable in a way the file-level version is not. And it scales: a team can adopt cr against an established repository without a debt-cleanup project as a prerequisite.

Flags exist to widen the scope when you need them. --all lints every line in the files that changed. --repo lints the whole tree — useful for an initial audit or a scheduled state-of-the-codebase report. The defaults assume the most common case, which is “I’m about to commit.”
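To make the scoping idea concrete, here is a rough Python sketch of diff-aware filtering: collect the new-file line numbers that a unified diff adds, then keep only findings on those lines. This is not how cr does it internally (the tool resolves diffs through git2); the function names and the "line" key on a finding are assumptions for illustration, and the parser handles a single-file diff only.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,2 +12,3 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(unified_diff: str) -> set[int]:
    """Return new-file line numbers of lines added by a single-file diff."""
    changed: set[int] = set()
    new_line = None  # None until the first hunk header is seen
    for line in unified_diff.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            new_line = int(header.group(1))
        elif new_line is None:
            continue  # file headers ("---", "+++") before the first hunk
        elif line.startswith("+"):
            changed.add(new_line)
            new_line += 1
        elif line.startswith("-") or line.startswith("\\"):
            continue  # removed lines and "\ No newline" markers
        else:
            new_line += 1  # context line, present in the new file
    return changed


def in_scope(findings: list[dict], changed: set[int]) -> list[dict]:
    """Keep only findings whose (hypothetical) "line" field was touched."""
    return [f for f in findings if f["line"] in changed]
```

Under this model, widening the scope to whole changed files amounts to skipping the in_scope filter for those files.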

Git Notes as the Substrate

This is the part that changed the most this week, and it’s the design choice the rest of the tool now hangs off of.

Findings and review comments don’t live in a database. They live as git notes, attached to the commits they describe, on dedicated refs:

refs/notes/casual-review/findings — rule output, one entry per finding, threaded dismissals via a parent field
refs/notes/casual-review/discuss — user-authored comments, threaded replies, resolutions

Git already knows how to sync arbitrary refs between repos. cr fetch and cr push are thin wrappers around git fetch / git push for those two notes refs. The “team” part of team code review is therefore literally the same mechanism that already syncs your branches. There’s no separate auth, no separate identity model, no separate hosting story — if a developer can push commits, they can push review state.
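For readers who want to see what such a wrapper could reduce to, here is a hedged Python sketch that builds plain git commands for the two notes refs named above. The exact refspecs cr uses are not stated in this post, so the ones below are an assumption, as are the function names. Conflict handling is left out: a plain fetch like this fails if the remote does not have the ref yet, and is rejected when local and remote notes have diverged, in which case something like git notes merge would be needed.

```python
import subprocess

# Ref names as given in the post.
NOTES_REFS = (
    "refs/notes/casual-review/findings",
    "refs/notes/casual-review/discuss",
)


def fetch_argv(remote: str = "origin") -> list[str]:
    """git fetch command that updates the local notes refs from the remote.

    Assumption: a same-name, non-forced refspec per ref.
    """
    return ["git", "fetch", remote, *(f"{ref}:{ref}" for ref in NOTES_REFS)]


def push_argv(remote: str = "origin") -> list[str]:
    """git push command that publishes the local notes refs."""
    return ["git", "push", remote, *NOTES_REFS]


def sync(remote: str = "origin") -> None:
    """Fetch, then push. Raises CalledProcessError if git rejects a step."""
    subprocess.run(fetch_argv(remote), check=True)
    subprocess.run(push_argv(remote), check=True)
```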

Each comment carries a stable SHA-256-derived ID, plus an anchor_text_sha of the source it’s attached to. That second hash is the staleness check: when the underlying lines change, the SHA stops matching and the comment is marked [stale] rather than silently re-anchoring to whatever happens to be there now. Comments can anchor at three granularities: a line range, a file, or a whole commit.
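The staleness check can be sketched in a few lines of Python. Only the anchor_text_sha field name comes from the tool; the "start" and "end" keys, the newline-joined hashing input, and the function names are assumptions made for this example, and cr may normalize or delimit the anchored text differently.

```python
import hashlib


def anchor_sha(source_lines: list[str], start: int, end: int) -> str:
    """SHA-256 hex digest of lines start..end (1-indexed, inclusive)."""
    text = "\n".join(source_lines[start - 1:end])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_stale(comment: dict, source_lines: list[str]) -> bool:
    """A comment is stale when the anchored text no longer hashes the same."""
    current = anchor_sha(source_lines, comment["start"], comment["end"])
    return current != comment["anchor_text_sha"]
```

In this simplified version an insertion above the anchored range also shifts the text under the range and flags the comment as stale, which errs on the side of reporting rather than silently re-anchoring.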

The diff-aware listing extends naturally too. cr comment list --include-ancestors walks git notes list, projects each comment forward from its origin commit, and shows them on HEAD with [from <abbrev>] markers. So when you check out a branch, you see not just the comments authored against the current tip but everything threaded into the history leading up to it — without anyone running a server that “tracks” them.
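A toy model of that projection, in Python, may help. It uses an in-memory list and dict as stand-ins for the commit history and the notes attached to each commit, so nothing here touches git. The function name and data shapes are hypothetical, and the sketch covers only the walk and the [from <abbrev>] marker, not line re-mapping or staleness.

```python
def project_comments(history: list[str], notes: dict[str, list[str]]) -> list[str]:
    """List comments for HEAD, tagging those that originate on an ancestor.

    history: commit ids from HEAD back to the oldest ancestor (HEAD first).
    notes:   commit id -> comment bodies attached to that commit.
    """
    head = history[0]
    lines = []
    for commit in history:
        for body in notes.get(commit, []):
            # Comments on HEAD itself get no marker; older ones are labeled
            # with a 7-character abbreviation of their origin commit.
            marker = "" if commit == head else f"[from {commit[:7]}] "
            lines.append(marker + body)
    return lines
```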

This is the move I think matters most. Code review tools have spent a decade reinventing the parts of git they refused to use directly: distributed sync, content-addressable storage, history-aware projection. Notes are right there in the tool everyone already runs. They’re just unloved.

The Team Angle

The team workflow this enables looks like this:

flowchart LR
    subgraph Local["Developer A"]
        A1[edit code] --> A2["cr check<br/>(pre-commit)"]
        A2 -->|clean| A3[commit + push]
    end
    subgraph CI["Shared CI"]
        A3 --> CI1[cr check --staged]
        CI1 -->|findings| CI2[cr publish + cr push]
    end
    subgraph Reviewer["Developer B / agent"]
        CI2 -.->|git fetch + cr fetch| B1[cr show / cr comment list]
        B1 --> B2[cr comment add / reply / resolve]
        B2 --> B3[cr push]
    end
    B3 -.->|git fetch + cr fetch| A1

Same binary on every machine. Same rules. No server keeping score. The dashboard you’d otherwise pay for is replaced by git blame, cr show, and the rule that the build fails on findings. The discussion thread you’d otherwise pay for is replaced by cr comment list over the notes refs.

This is the same trick rustfmt plays for formatting. rustfmt --check in CI plus rustfmt on save locally eliminates a whole category of review comments without anyone having to host a service. cr extends that pattern from “is the code formatted” to “are there fifteen specific things wrong with this diff, here’s the conversation we’ve already had about them, and here’s what’s been resolved.”

The Rule Set

Fifteen rules at the moment, deliberately small. The bias is toward things that are unambiguously wrong (or unambiguously worth a second look) rather than stylistic opinions:

Universal rules — apply across all supported languages:

Parse errors (the file doesn’t tree-sit cleanly)
TODO / FIXME markers introduced by the diff
Trailing whitespace on changed lines
Functions over 40 lines
Cognitive complexity above a configurable threshold
Debug print statements (println!, print(, console.log, System.out.println)
Silent error handling (catch blocks that discard the error without logging or rethrowing)
Disabled tests (#[ignore], @pytest.mark.skip, it.skip, @Disabled)
Assertion-free tests
Hardcoded secrets (heuristic match against common credential shapes)

Language-specific rules:

unwrap usage in Rust outside test code
Explicit any types in TypeScript
Bare except: clauses in Python

Diff-aware structural rule:

api-surface-change — flags pub / export / public Java types / top-level def items added or removed in the diff. This is the rule a generic linter literally cannot write, because it requires knowing what changed relative to the base — not just what the file currently contains.

The api-surface-change rule is the one I’m proudest of. It catches the class of issue where an agent (or a hurried human) renames or removes an exported symbol without understanding that something downstream of the repo depended on it. The diff makes the change look local. The rule makes it visible as the cross-cutting event it actually is.
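The core of a rule like this is a set difference between the exported names at the base and at the head. The Python sketch below approximates that with regular expressions purely for illustration; it is not the rule's actual implementation, the patterns miss many legitimate forms (pub async fn, async def, re-exports, Java), and every name in it is hypothetical.

```python
import re

# Rough per-language patterns for exported items. A real implementation
# would work from a syntax tree rather than regexes.
EXPORT_PATTERNS = {
    "rust": re.compile(
        r"^\s*pub\s+(?:fn|struct|enum|trait|const|static|type|mod)\s+(\w+)", re.M
    ),
    "typescript": re.compile(
        r"^\s*export\s+(?:async\s+)?"
        r"(?:function|class|const|let|interface|type|enum)\s+(\w+)",
        re.M,
    ),
    "python": re.compile(r"^def\s+(\w+)", re.M),  # top-level defs only
}


def exported_symbols(source: str, language: str) -> set[str]:
    """Names of exported items found in one version of a file."""
    return set(EXPORT_PATTERNS[language].findall(source))


def api_surface_change(base: str, head: str, language: str) -> dict[str, list[str]]:
    """Compare exported names between the base and head versions of a file."""
    before = exported_symbols(base, language)
    after = exported_symbols(head, language)
    return {"added": sorted(after - before), "removed": sorted(before - after)}
```

A rename shows up as one removal plus one addition, which is exactly the signal a downstream consumer of the old name needs.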

IDE Extensions: VS Code, JetBrains, Zed

The CLI is the source of truth. The IDE extensions are deliberately thin shells over it — no daemon, no LSP server, no embedded copy of the rule engine. Every action invokes cr once and parses its JSON output. That keeps the editors honest about what they show: if cr on the command line says something, the IDE says the same thing, because it ran the same binary.

Editor	Mechanism	Notes
VS Code	TypeScript extension	Gutter decorations on commented lines; hover shows the thread with inline Reply / Resolve command links; status bar shows open / stale counts. Commands: Add, Reply, Resolve, Sync, Fetch, Push, Refresh, Show Stale.
JetBrains	Kotlin / Gradle plugin (IntelliJ Platform Gradle Plugin 2.6.0, since-build 242)	Project-scoped service refreshes via the message bus; gutter highlighters via `MarkupModel.addLineHighlighter`; right-anchored tool window listing comments for the active file with an inline reply composer (Cmd/Ctrl+Enter to send); persistent settings panel for `binPath`, `includeAncestors`, `remote`.
Zed	WASM extension via `zed_extension_api` 0.7.0	Slash commands for the Assistant: `/cr-help`, `/cr-list`, `/cr-add`, `/cr-reply`, `/cr-resolve`, `/cr-sync`, `/cr-status`.

Two implementation details worth flagging.

The JetBrains plugin’s reply correctness fix. Replies and resolutions to comments that were projected via --include-ancestors need --commit <originCommit> so the operation lands on the parent’s note rather than HEAD’s. Without this, you get “Comment X not found on commit HEAD” — the comment exists, but on a different commit’s notes. The plugin tracks origin_commit per projected comment and threads it through every write. This kind of bug is the price of using git notes the way they want to be used; the alternative (a single global review database) wouldn’t have the bug, but it also wouldn’t have any of the other properties that make this design work.
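The shape of that fix, transposed into Python for illustration (the plugin itself is Kotlin), looks roughly like the sketch below. The function name and the dict layout are hypothetical, placing --commit after the comment id is an assumption, and the reply body is left out because this post does not spell out how it is passed.

```python
def reply_argv(comment: dict) -> list[str]:
    """Build a cr comment reply invocation that targets the right note."""
    argv = ["cr", "comment", "reply", comment["id"]]
    origin = comment.get("origin_commit")
    if origin:
        # Projected from an ancestor: write to that commit's note, not HEAD's.
        argv += ["--commit", origin]
    return argv
```

Comments authored directly against HEAD carry no origin_commit in this model, so the invocation falls back to the default target.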

The Zed extension’s WASM constraint. Zed runs extensions in a WASM sandbox without subprocess access — the extension cannot spawn cr directly. So instead of executing commands, the slash commands emit a ready-to-run shell invocation that pairs with Zed’s AI assistant: /cr-add produces an exact cr comment add ... line with the body POSIX-quoted. The user (or the assistant) runs it. The roadmap for live integration is shipping a separate cr-mcp binary and declaring it as a Zed context_server — which is the right shape for that editor’s model and not the kind of thing to hack around.

All three extensions build in CI on every push and ship as artifacts on every release: .vsix for VS Code, .zip for JetBrains, .wasm for Zed. The Rust binary release is independent of the extension builds — a slow gradlew buildPlugin cannot block the Homebrew formula update.

Speed Is a Feature, Not a Vanity Metric

The current numbers: ~280k lines of code per second on a single thread for parse plus rules, ~550k LOC/sec parallel, with cold startup around 6ms.

That’s not bench-bragging. It’s the threshold at which the tool can run in a pre-commit hook without anyone disabling it. Pre-commit hooks live or die on perceived latency. A 200ms hook is invisible. A 2-second hook gets a --no-verify shortcut three days after installation. Once one person does that, the team contract is broken.
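A back-of-the-envelope calculation shows what those numbers buy. The Python sketch below uses the single-thread throughput and cold-start figures quoted above and assumes, as a simplification, that startup is paid once and everything else is parse plus rules; diff resolution and disk I/O are ignored, so real headroom is smaller. The function name is hypothetical.

```python
def lines_within_budget(budget_ms: int, startup_ms: int = 6,
                        lines_per_second: int = 280_000) -> int:
    """Lines that fit in the budget after paying cold startup once."""
    working_ms = max(budget_ms - startup_ms, 0)
    return working_ms * lines_per_second // 1000


# (200 - 6) ms * 280 lines/ms = 54,320 lines inside an "invisible" hook.
print(lines_within_budget(200))
```

Since a typical pre-commit run touches a handful of files, roughly 54,000 lines of single-threaded headroom is far more than the default diff-scoped check needs.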

Tree-sitter does most of the heavy lifting on parsing. rayon handles file-level parallelism. git2 resolves diffs without shelling out to git per file. ariadne renders the diagnostics. Nothing exotic — just the parts of the Rust ecosystem that are good at being fast by default.

Built for Agents Too

The --format json flag emits one diagnostic per line, each a self-contained JSON object (the JSON Lines convention). That’s not a gesture toward “API support.” It’s the format that AI coding agents actually consume well: streaming, parseable line-by-line, no top-level array to wait on. The schema is stable across patch releases within a CalVer minor.

The intended agent loop looks like:

cr check --format json | agent-review-tool

The agent reads each finding, decides whether it’s worth acting on, and either edits the code, dismisses the finding with cr ack <id>, or — and this is the part the comments substrate unlocks — leaves a cr comment add explaining its reasoning to the next reviewer (human or agent). The conversation persists on git notes, syncs with the next push, and is visible to whoever opens the file in their editor.
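On the consuming side, the loop can be as small as the Python sketch below, which reads one JSON object per line and filters by rule. The field name "rule" and the rule ids used with it are hypothetical, since the actual schema is not reproduced in this post; only the one-object-per-line framing comes from the tool.

```python
import json
import sys
from typing import Iterable, Iterator


def iter_findings(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one finding per non-empty line without buffering the stream."""
    for raw in lines:
        raw = raw.strip()
        if raw:
            yield json.loads(raw)


def triage(lines: Iterable[str],
           ignored_rules: frozenset[str] = frozenset()) -> list[dict]:
    """Collect findings to act on; "rule" is a hypothetical field name."""
    return [f for f in iter_findings(lines) if f.get("rule") not in ignored_rules]


if __name__ == "__main__":
    # Usage: cr check --format json | python triage.py
    for finding in triage(sys.stdin):
        print(finding)
```

Because each line parses on its own, the agent can start reacting to the first finding while cr is still scanning the rest of the diff.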

The same binary that runs in your pre-commit hook is the one that feeds the agent. There’s no separate “API surface” to maintain — the JSON shape is the contract, and cr explain <rule-id> is the documentation the agent reads when it wants to know what a finding means.

This pairs naturally with the pre-PR agent peer review pattern — a reviewer-agent that runs after the author-agent, before any human is paged. cr gives that reviewer-agent a structured set of facts to react to instead of asking it to re-discover the same fifteen findings from scratch, plus a place to write its conclusions that doesn’t disappear when the conversation ends.

The repository’s AGENTS.md (with a Claude Code-specific CLAUDE.md companion) documents the exact prompts and patterns for wiring this into an agent harness.

What It Is Not

A few things casual-review deliberately is not, because saying yes to all of them would turn it into the server-backed thing it’s trying to replace:

Not a security scanner. The hardcoded-secrets rule is a heuristic, not a SAST tool. If you need CodeQL or Semgrep semantics, run those.
Not a formatter. Use rustfmt, prettier, black, gofmt. Formatting is a solved problem; the solutions are good; do not add another opinion.
Not a coverage tool. Coverage requires a test runner; cr doesn’t run your code.
Not a hosted review service. There is no web UI, no inbox, no notification system. Findings and comments live on git notes; the IDE is the inbox; git fetch is the notification. If you need GitHub PR semantics — required reviewers, blocking checks tied to identity, audit logs — keep using GitHub. cr is the layer underneath.

Status

Still early, but past the “does it work end-to-end” question. As of this week:

Phase 1 + 2 — rules engine, diff-aware scoping, JSON renderer, all 15 built-in rules. Done.
Phase 3 — findings persistence on git notes, cr publish / show / ack, threaded dismissals, cr fetch / push over refs/notes/casual-review/findings. Done.
Phase 4 — collaborative comments (cr comment add / list / reply / resolve / reanchor), staleness via SHA-256 anchors, ancestor projection. Done.
Editor extensions — VS Code, JetBrains, Zed all building in CI and shipping per release. Done.

What’s still open: hardening against very large repos, more language coverage, and figuring out what a cr-mcp binary should look like for editors (and agents) that prefer a long-lived context server over per-call CLI invocations.

CalVer (YYYY.M.D) is the versioning scheme, which makes “is this build current?” a one-glance question. Install via Homebrew, Cargo, or cargo install --git to build from source. Dual-licensed MIT or Apache-2.0.

If you try it and a rule fires on something it shouldn’t, or a rule you’d expect to fire stays quiet, open an issue. The rule set is small precisely so that adding a new one is a deliberate decision rather than a backlog grooming exercise.

Why This, Now

I’ve been arguing for a while that the pull request as the unit of review is straining under the weight of agent-generated code, and that the same graph that helps agents write code is what reviewers need too. casual-review started as the smallest, lowest-leverage piece of that argument: a linter that catches the boring fifteen things before they reach a reviewer at all.

The phase 3 and 4 work made it something more interesting — a demonstration that the whole of code review (findings, dismissals, comments, threads, sync, multi-IDE surfaces) can be built without any of the SaaS infrastructure the category insists on. Git already does the hard parts. Notes are the right abstraction for ephemeral, append-only annotations on commits. Editors only need a few hundred lines of glue each to participate. The “review server” is, it turns out, mostly accidental complexity.

It still doesn’t try to solve graph-level review. It doesn’t try to be a continuous state monitor. The bigger pieces — the graph, the agent-on-agent review pipeline, the spec-as-review-unit move — are downstream of getting this part right. If cr check isn’t fast enough to live in a pre-commit hook, and cr comment isn’t simple enough to live on git notes, nothing further up the stack matters; the team will route around it. So I started here.