
Sentry PR Review Friction

This study measures where code review friction concentrates in getsentry/sentry and identifies which Architecture Decision Records (ADRs) could reduce repeated discussion cycles. Rather than stopping at aggregate metrics, we read the actual comment threads of high-friction PRs, classify recurring discussion themes with supporting evidence, and propose ADRs grounded in real examples.

| Parameter | Value |
|---|---|
| Repository | getsentry/sentry |
| Window | 90 days (ending April 2026) |
| Merged PRs analyzed | 500 |
| Closed-unmerged PRs analyzed | 500 |
| Open PRs sampled | 251 |
| Deep comment analysis | 60 high-friction PRs (50 merged + 10 closed-unmerged) |
| Total comments analyzed | 965 (604 non-bot) |

  1. Median time-to-merge (TTM) is 4.98 hours, but P90 reaches 70.54 hours, about 14x the median. Large PRs (≥10 files changed or ≥400 lines of churn) have a median TTM of 22.52 hours, versus 1.66 hours for tiny PRs.

  2. PR size is the strongest friction predictor. Large PRs land in the high-friction quartile 57.4% of the time; tiny PRs land there only 9.8% of the time. Feature PRs (feat) have the highest friction rate at 38.6%, more than double that of fix PRs (17.6%).

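  3. The nine most frequent discussion themes in high-friction PRs, identified from 604 non-bot comments across 60 PRs. Each percentage is the share of the 60 PRs in which the theme appears; a single PR can raise several themes, so the shares sum to more than 100%: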

    • API design and defaults — 38.3% of high-friction PRs
    • Test evidence and coverage — 38.3%
    • Component patterns and styling — 35.0%
    • State management and data flow — 35.0%
    • Code documentation — 31.7%
    • Type safety and error handling — 30.0%
    • Follow-up and scope creep — 25.0%
    • Security and permissions — 20.0%
    • Naming and consistency — 11.7%
  4. Automated reviewers find real bugs, but the same bug categories recur across PRs. Bot reviewers (sentry[bot], sentry-warden[bot], cursor[bot]) account for 23.7% of substantive review comments and appear in 68.3% of high-friction PRs. Their findings fall into at least 10 recurring patterns (missing DoesNotExist handlers, .filter().first() with an unreachable except, direct dict access on API responses, option key mismatches, companion list misses, etc.) that should be promoted from expensive agentic review to cheap deterministic checks (Ruff/Semgrep rules, mypy strict mode, typed registries). In one PR (#111522), the same pattern was flagged 4 times in a single review pass; a single lint rule would have caught all four occurrences and would keep catching the pattern in every future PR.

  5. Abandoned PRs signal unresolved decision ambiguity. Twelve closed-unmerged PRs (2.4% of the 500 analyzed) had ≥10 discussion items. Compared with merged high-friction PRs, these abandoned PRs stayed open about twice as long at the median (time to close vs. time to merge) and had 2.5x as many review events at the median.

  6. Of the 251 sampled open PRs, 92 have been stale for 14+ days, 33 of them for 30+ days. The most-discussed stale PR has 55 review events and has been open for over 760 hours (about 32 days).
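Findings 1 and 2 rest on a few simple computations: median and P90 time-to-merge, a size rule for "large" PRs, and membership in the top friction quartile. The sketch below shows one way to compute them. It is an illustration, not the study's code: the `PR` record, its field names, and the `friction_score` field are hypothetical (the friction score and the "tiny" threshold are not defined here), and percentiles use linear interpolation, which may differ slightly from the convention the study used.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class PR:
    """Hypothetical PR record; field names are illustrative, not the study's schema."""
    number: int
    created_at: datetime
    merged_at: datetime
    files_changed: int
    additions: int
    deletions: int
    friction_score: float  # hypothetical composite friction score


def percentile(values: list[float], p: float) -> float:
    """Percentile with linear interpolation between closest ranks (numpy's default)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty data")
    k = (len(xs) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def ttm_hours(pr: PR) -> float:
    """Time-to-merge in hours."""
    return (pr.merged_at - pr.created_at).total_seconds() / 3600


def is_large(pr: PR) -> bool:
    """Size rule from finding 1: >=10 files changed or >=400 lines of churn."""
    return pr.files_changed >= 10 or pr.additions + pr.deletions >= 400


def summarize(prs: list[PR]) -> dict[str, float]:
    ttms = [ttm_hours(p) for p in prs]
    med, p90 = median(ttms), percentile(ttms, 90)
    # High-friction quartile: friction score at or above the 75th percentile
    cutoff = percentile([p.friction_score for p in prs], 75)
    large = [p for p in prs if is_large(p)]
    large_high = sum(1 for p in large if p.friction_score >= cutoff)
    return {
        "median_ttm_h": med,
        "p90_ttm_h": p90,
        "p90_to_median": p90 / med,
        "large_high_friction_rate": large_high / len(large) if large else 0.0,
    }
```

Applied to the study's reported figures, the P90-to-median ratio is 70.54 / 4.98 ≈ 14.2, which is where the "about 14x" in finding 1 comes from.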
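Finding 3 reports each theme as a share of the 60 high-friction PRs. A minimal sketch of that tally, assuming each PR has already been labeled with a set of themes (how the study assigned labels is not described here, and the theme names and PR numbers below are hypothetical):

```python
from collections import Counter


def theme_shares(pr_themes: dict[int, set[str]]) -> dict[str, float]:
    """Fraction of PRs in which each theme appears at least once.

    Labels are multi-valued: one PR can carry several themes, so the
    shares across themes can sum to more than 1.0.
    """
    if not pr_themes:
        return {}
    counts = Counter()
    for themes in pr_themes.values():
        counts.update(set(themes))  # count a theme at most once per PR
    total = len(pr_themes)
    return {theme: n / total for theme, n in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical labels for three PRs (illustrative only)
    demo = {
        101: {"api_design", "tests"},
        102: {"api_design"},
        103: {"naming"},
    }
    for theme, share in theme_shares(demo).items():
        print(f"{theme}: {share:.1%}")
```

Counting per PR rather than per comment keeps one very long thread from dominating a theme. With a denominator of 60, a share of 38.3% corresponds to 23 PRs and 35.0% to 21 PRs.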
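Finding 4 argues for turning recurring bot findings into deterministic checks. To illustrate how small such a check can be, here is a minimal sketch built on Python's standard `ast` module for one of the listed patterns: `.filter().first()` inside a `try` whose `except` catches `DoesNotExist`. In Django, `QuerySet.first()` returns `None` when nothing matches instead of raising, so that handler never runs. The function names are hypothetical; this is neither an existing Ruff or Semgrep rule nor code from the study.

```python
import ast
import sys


def _catches_does_not_exist(handler: ast.ExceptHandler) -> bool:
    """True if the handler catches `X.DoesNotExist` or bare `DoesNotExist`."""
    if handler.type is None:  # bare `except:` is a different smell; ignore here
        return False
    types = handler.type.elts if isinstance(handler.type, ast.Tuple) else [handler.type]
    return any(
        (isinstance(t, ast.Attribute) and t.attr == "DoesNotExist")
        or (isinstance(t, ast.Name) and t.id == "DoesNotExist")
        for t in types
    )


def _called_methods(stmts: list[ast.stmt]) -> set[str]:
    """Names of all `obj.method(...)` calls inside the given statements."""
    names = set()
    for stmt in stmts:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
                names.add(node.func.attr)
    return names


def find_unreachable_does_not_exist(source: str) -> list[tuple[int, str]]:
    """Flag `except ...DoesNotExist` handlers whose try body only uses `.first()`.

    Heuristic: if the body calls `.first()` but never `.get()`, nothing in it
    is expected to raise DoesNotExist, so the handler is likely dead code.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Try):
            continue
        called = _called_methods(node.body)
        if "first" not in called or "get" in called:
            continue
        for handler in node.handlers:
            if _catches_does_not_exist(handler):
                findings.append(
                    (handler.lineno, ".first() returns None; this DoesNotExist handler is unreachable")
                )
    return findings


if __name__ == "__main__":
    # Usage: python <this_script>.py file1.py file2.py ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for lineno, msg in find_unreachable_does_not_exist(fh.read()):
                print(f"{path}:{lineno}: {msg}")
```

The heuristic is deliberately narrow: a helper that raises `DoesNotExist` itself, or an unrelated `.first()` method, would slip past or trip it. That trade-off is acceptable for a lint rule whose job is to catch the common case cheaply and leave the edge cases to human or agentic review.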

All data and scripts are published for audit:

```sh
# Reproduce the full pipeline
python analyze_sentry_prs.py collect --repo getsentry/sentry --days 90 --limit 500
python analyze_sentry_prs.py analyze
python analyze_sentry_prs.py report
```