Skip to content

Automated Review Friction

Sentry runs three automated reviewers on its PRs: sentry[bot] (bug prediction), sentry-warden[bot] (correctness/security checking), and cursor[bot] (Cursor’s agentic review). Combined, they post 23.7% of all substantive review comments in our high-friction sample, and 68.3% of high-friction PRs have bot review activity.

These bots are not noise. They find real bugs — Pydantic type errors, missing exception handlers, key mismatches, security holes. Every finding is something a developer must read, evaluate, and address.

But here is the central insight of this section:

The same bug categories repeat across different PRs. When the same agentic-review pattern fires on PR after PR, that pattern should be promoted from “expensive LLM review on every commit” to “cheap deterministic check on every commit.”

Agentic review is the right tool for finding a recurring pattern the first few times. It is the wrong tool for enforcing it once the pattern is known.

MetricValue
Total substantive bot review comments139
Total substantive human review comments448
Bot share of substantive review23.7%
PRs with bot review activity41 / 60
Share of high-friction PRs with bots68.3%
Average bot comment length1,565 characters
BotSubstantive CommentsRole
sentry[bot]81Internal bug prediction reviewer
sentry-warden[bot]50Internal correctness/security checker
cursor[bot]4Cursor agentic review (most filtered as templates)
github-advanced-security[bot]3GitHub security advisor
getsentry-bot1Internal automation

sentry[bot] and sentry-warden[bot] together account for 94% of substantive bot reviews.

When we read the actual bot findings, they cluster into a small number of recurring categories. The clusters below were identified by reading the 139 substantive bot comments in our sample. Each cluster includes the pattern, examples, and the kind of deterministic check that would replace the agentic review for that pattern.

Pattern 1: Missing exception handling on model fetches

Section titled “Pattern 1: Missing exception handling on model fetches”

QuerySubscription.DoesNotExist not handled in get_metric_issue_contextMetricIssueContext.from_group_event() calls QuerySubscription.objects.get(id=...) without handling DoesNotExist. If the subscription is deleted between alert creation and notification rendering, this will crash. — sentry-warden[bot] on #111674

get_from_cache raises ValueError because DetectorGroup has no cache_fields defined — The base BaseManager has empty cache_fields by default, so get_from_cache will raise. — sentry-warden[bot] on #111714

If project_id is None in trigger_coding_agent_launch, Project.objects.get_from_cache fails silentlysentry[bot] on #111691

Replace with: A custom Ruff/Semgrep rule that flags Model.objects.get(...) and Model.objects.get_from_cache(...) calls inside functions that don’t have a corresponding except Model.DoesNotExist: (or wrapping try). For get_from_cache, also check that the model defines cache_fields. This is a 30-line rule that catches the pattern forever.

Pattern 2: .filter().first() with unreachable except DoesNotExist

Section titled “Pattern 2: .filter().first() with unreachable except DoesNotExist”

AttributeError on None when Group is not found — The function uses .filter().first() which returns None when no record exists, but the except Group.DoesNotExist handler is unreachable because .filter().first() never raises exceptions. — sentry-warden[bot] on #111724

Replace with: A trivial AST lint: if a try block contains .filter(...).first() and the except clause catches *.DoesNotExist, flag as unreachable. This pattern is mechanical — no LLM needed.

Pattern 3: Direct dict access on external API responses

Section titled “Pattern 3: Direct dict access on external API responses”

KeyError when GitHub API response lacks 'installations' key — Line 908 uses direct dict access installed_orgs["installations"] on the GitHub API response. If the API returns an error or malformed response, this will raise KeyError. — sentry-warden[bot] on #111728

Replace with: An ADR that requires external API responses to be parsed through TypedDicts or Pydantic models with explicit field validation. A lint rule can flag direct ["key"] access on variables that come from requests.get(...).json() or gh_client.* calls.

Pattern 4: Type system misuse in serializers

Section titled “Pattern 4: Type system misuse in serializers”

TypeError when saving defaultCodingAgent: str | None union type is not callable — The defaultCodingAgent option tuple uses str | None as the type parameter. The code calls type_(data[key]) to convert the value. However, str | None is a types.UnionType which cannot be called as a function. — sentry-warden[bot] on #111697

Pydantic validation error when coding_agent is 'seer' with integration_id — The code creates a SeerAutomationHandoffConfiguration with target='seer'. However, the Pydantic model’s target field is Literal['cursor_background_agent', 'claude_code_agent', ...] which excludes 'seer'. — sentry-warden[bot] on #111697

Replace with: Stricter mypy/pyright configuration in CI, plus a custom check that the type annotation passed to option registration is a concrete callable type, not a UnionType. The Pydantic Literal mismatch is caught by mypy if strict mode is enabled on the relevant module.

Pattern 5: Key/identifier mismatch between writer and reader

Section titled “Pattern 5: Key/identifier mismatch between writer and reader”

Organization option key mismatch causes setting to have no effect — The new option registers with key sentry:default_automated_run_stopping_point but the code that consumes this value reads from key sentry:default_stopping_point. Any value set via the API will be stored but never read. — sentry-warden[bot] on #111697

Replace with: This is one of the most valuable bot findings — and one of the easiest to convert to a check. Sentry already has a registry for organization options. A test fixture can assert that every key written through OrganizationOption.objects.set_value(...) corresponds to a registered key, and every read site (get_value(...)) uses a key that exists in the registry. Bonus: extract option keys into a typed enum so the type system catches the typo at edit time.

Pattern 6: Non-exhaustive if/elif blocks causing UnboundLocalError

Section titled “Pattern 6: Non-exhaustive if/elif blocks causing UnboundLocalError”

The if/elif block checking the data type is not exhaustive. A new subclass of BaseMetricAlertNotificationData would cause an UnboundLocalError because metric_issue_context would not be assigned.sentry[bot] on #111674

Replace with: Use Python 3.10+ structural pattern matching with explicit case _: exhaustiveness, OR use mypy’s assert_never() pattern for tagged unions. Both are catchable by static analysis once the convention is in place.

The python-litestar platform is missing from several product onboarding arrays in platformCategories.tsx, which will prevent its onboarding documentation from loading for Logs, Metrics, Profiling, and Replay. — sentry[bot] on #111522

The python-litestar platform is missing from the Python platforms set in monitorQuickStartGuide.tsx, causing it to be miscategorized in the Crons monitor form. — sentry[bot] on #111522

In a single PR (#111522), the bot found the same kind of bug four separate times — adding a platform to one list while forgetting the other lists.

Replace with: Refactor the platform registry into a single source of truth that derives all the companion lists from a typed registry. This is a one-time refactor that eliminates an entire bug class. Alternatively, a CODEOWNERS-style consistency check: when a PR modifies platformCategories.tsx, verify that any new platform identifier appears in all the relevant arrays.

The circuit breaker for Sentry App webhooks uses a per-app slug for its metrics key, creating high-cardinality data because a metrics_key override is not provided.sentry[bot] on #111723

Replace with: A lint rule that flags metrics.incr(name, tags={"slug": ...}) (or similar high-cardinality patterns) without an explicit metrics_key override or an allowlist comment. This is a known anti-pattern with a known fix.

Pattern 9: State not reset in catch blocks

Section titled “Pattern 9: State not reset in catch blocks”

The form’s saving state is not reset in the catch block on submission failure, leaving the submit button permanently busy and preventing retries.sentry[bot] on #111490

Replace with: An ESLint rule (or React custom hook convention) that requires try/finally for any state setter inside an async submission handler — setSaving(true) in try, setSaving(false) in finally. The convention pre-empts the error path bug entirely.

Pattern 10: Truthiness checks removed → None saved as explicit value

Section titled “Pattern 10: Truthiness checks removed → None saved as explicit value”

The removal of the truthiness check for stopping_point causes None to be explicitly saved as a project option, overriding the intended default value of "code_changes".sentry[bot] on #111697

Replace with: An ADR (and a lint rule) that says option setters must reject None unless the option’s schema explicitly allows it. Combined with Pattern 4 (typed option registry), this becomes a static check.

PatternPRs in Sample with MatchBot(s) ReportingCost of Deterministic Check
Missing DoesNotExist handling3+warden, sentry-botLow — Ruff/Semgrep rule
.filter().first() + unreachable except1+wardenTrivial — AST lint
Direct dict access on API responses2+wardenMedium — TypedDict adoption + lint
UnionType not callable1wardenLow — mypy strict mode
Pydantic Literal mismatch1wardenTrivial — mypy strict mode
Option key mismatch (writer/reader)1wardenLow — typed option registry
Non-exhaustive if/elif1sentry-botMediumassert_never convention
Companion list miss (platform registry)1 (4 instances)sentry-botMedium — refactor to single source
High-cardinality metrics tag1sentry-botLow — lint rule
Form state not reset on error1sentry-botTrivial — ESLint rule
None saved as explicit value1sentry-botLow — schema validation

These 10 patterns alone account for the majority of substantive bot findings in our sample. Each has a deterministic-check replacement that costs vastly less than agentic review per-PR.

Agentic review costs roughly:

  • LLM inference: A 1,500-character bot review comment implies hundreds of thousands of tokens of input (the diff + context) and meaningful output. At current model pricing, this is on the order of tens of cents per finding.
  • Developer attention: 1,565-char findings take 30-60 seconds to read and evaluate. Multiplied across 139 findings in 60 PRs, that’s roughly 70-140 minutes of cumulative developer attention in our sample alone. Extrapolated to Sentry’s full PR volume, the time investment is substantial.
  • Latency: Agentic review runs after the PR is opened. A bug found post-open is more expensive than a bug caught at edit time by a type checker or lint rule.

A deterministic check costs:

  • CI runtime: Microseconds to milliseconds per file.
  • Developer attention at edit time: Lint errors surface in the IDE before the PR is even opened. Catching the bug at edit time is roughly 100x cheaper than catching it in PR review.
  • Zero per-PR LLM inference cost.

The economic and ergonomic case is clear: every bot finding that recurs is a candidate for promotion to a deterministic check.

Agentic review still has a legitimate role:

  1. First-time pattern discovery — finding novel bug shapes that don’t match any known rule yet
  2. Cross-file reasoning — bugs that span multiple files in ways that AST-level rules can’t easily capture
  3. Semantic correctness — “this code does the wrong thing” findings that require understanding intent
  4. Security review for novel attack surfaces — where an attacker’s mental model is needed

The goal is not to eliminate bot review. The goal is to continuously promote findings from agentic review to deterministic rules, so that the bot’s attention is freed up for the genuinely novel cases.

This insight reshapes ADR 6 — instead of a “triage protocol” for bot findings, the right ADR is a promotion protocol: a workflow for converting recurring bot findings into lint rules, type checks, or schema validations.

It also strengthens several other ADRs:

  • ADR 2 (API Contract Evolution) — many bot findings (option key mismatch, None propagation, Pydantic Literal mismatches) are catchable if the API contract is typed and registry-driven
  • ADR 3 (Test Evidence Matrix) — bot test findings (assertion mismatches, missing edge cases) become test scaffold templates
  • ADR 4 (Frontend Conventions) — the form-state-not-reset bot finding becomes a React convention, not an ad-hoc review comment
  • Sample bias: This analysis is based on the top 60 friction PRs, not all 500. Bot activity in low-friction PRs may differ.
  • Pattern counts are minimums: The “PRs in sample with match” column shows occurrences in our 60-PR deep sample. Across Sentry’s full PR volume, each pattern likely recurs much more frequently. A formal recurrence analysis would require pattern-tagging the bot findings systematically.
  • Substantive vs. noise filter: We exclude template comments (BUGBOT_REVIEW summaries, BACKEND_TEST_FAILURES) but count substantive line-level bot reviews.
  • Bots are evolving: Sentry’s internal bots (sentry[bot], sentry-warden[bot]) are relatively new. As more recurring patterns are promoted to deterministic checks, the bots’ findings should shift toward genuinely novel cases.