Sentry PR Review Friction
Abstract
This study measures where code review friction concentrates in getsentry/sentry and identifies which Architecture Decision Records (ADRs) could reduce repeated discussion cycles. Rather than stopping at aggregate metrics, we read actual comment threads from high-friction PRs, classify recurring discussion themes with supporting evidence, and propose ADRs grounded in real examples.
| Parameter | Value |
|---|---|
| Repository | getsentry/sentry |
| Window | 90 days (ending April 2026) |
| Merged PRs analyzed | 500 |
| Closed-unmerged PRs analyzed | 500 |
| Open PRs sampled | 251 |
| Deep comment analysis | 60 high-friction PRs (50 merged + 10 closed-unmerged) |
| Total comments analyzed | 965 (604 non-bot) |
Key Findings
- Median time-to-merge (TTM) is 4.98 hours, but P90 reaches 70.54 hours, roughly 14x the median. Large PRs (≥10 files or ≥400 lines of churn) have a median TTM of 22.52 hours, versus 1.66 hours for tiny PRs.
- PR size is the strongest friction predictor. Large PRs land in the high-friction quartile 57.4% of the time; tiny PRs only 9.8% of the time. Feature PRs (`feat`) have the highest friction rate at 38.6%, more than double that of fix PRs (17.6%).
- The top nine discussion themes in high-friction PRs, identified from 604 non-bot comments across 60 PRs (each percentage is the share of high-friction PRs in which the theme appears):
  - API design and defaults: 38.3%
  - Test evidence and coverage: 38.3%
  - Component patterns and styling: 35.0%
  - State management and data flow: 35.0%
  - Code documentation: 31.7%
  - Type safety and error handling: 30.0%
  - Follow-up and scope creep: 25.0%
  - Security and permissions: 20.0%
  - Naming and consistency: 11.7%
- Automated reviewers find real bugs, but the same bug categories repeat across PRs. Bot reviewers (`sentry[bot]`, `sentry-warden[bot]`, `cursor[bot]`) account for 23.7% of substantive review comments and appear in 68.3% of high-friction PRs. Reading the actual findings reveals at least 10 recurring patterns (missing `DoesNotExist` handlers, `.filter().first()` with an unreachable `except`, direct dict access on API responses, option key mismatches, companion list misses, etc.) that should be promoted from expensive agentic review to cheap deterministic checks (Ruff/Semgrep rules, mypy strict mode, typed registries). One PR (#111522) had the same pattern flagged 4 times in a single review pass: exactly the case where one lint rule pays off on every future PR.
- Abandoned PRs signal unresolved decision ambiguity. 12 closed-unmerged PRs (2.4%) had ≥10 discussion items. Compared with merged high-friction PRs, these abandoned PRs had 2x the median TTM and 2.5x the median review events.
- 92 open PRs have been stale for 14+ days, 33 of them for 30+ days. The most-discussed stale PR has 55 review events and has been open for over 760 hours (about 32 days).
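The size and percentile findings above reduce to a few simple computations. The sketch below is illustrative only and is not the logic of `analyze_sentry_prs.py`: it assumes each PR is a dict with hypothetical `files`, `churn`, `hours_to_merge`, and `comments` keys, uses nearest-rank percentiles, treats a caller-supplied function (for example, comment count) as the friction score, and invents a tiny-PR cutoff (≤2 files and ≤50 lines of churn) because the exact cutoff is not stated here.

```python
import math
from statistics import median

# Hypothetical PR record: {"files": int, "churn": int, "hours_to_merge": float, "comments": int}


def is_large(pr):
    # Study definition: >=10 files changed OR >=400 lines of churn
    return pr["files"] >= 10 or pr["churn"] >= 400


def is_tiny(pr):
    # Assumed cutoff for illustration only; not the study's definition
    return pr["files"] <= 2 and pr["churn"] <= 50


def nearest_rank(values, pct):
    # Nearest-rank percentile: the smallest observed value with at least
    # pct% of the data at or below it
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(prs):
    # Median and P90 time-to-merge, plus the P90/median multiplier
    ttm = [p["hours_to_merge"] for p in prs]
    p50 = median(ttm)
    p90 = nearest_rank(ttm, 90)
    return {"median": p50, "p90": p90, "p90_over_median": p90 / p50}


def high_friction_rate(prs, score, in_segment):
    # Share of a segment's PRs whose friction score is at or above the
    # 75th percentile across all PRs (the high-friction quartile)
    cutoff = nearest_rank([score(p) for p in prs], 75)
    segment = [p for p in prs if in_segment(p)]
    if not segment:
        return 0.0
    return sum(score(p) >= cutoff for p in segment) / len(segment)


def stale_counts(idle_days):
    # Open PRs idle 14+ days, and the subset of those idle 30+ days
    return sum(d >= 14 for d in idle_days), sum(d >= 30 for d in idle_days)
```

For example, `high_friction_rate(prs, lambda p: p["comments"], is_large)` gives the share of large PRs in the top friction quartile, the kind of quantity reported above as 57.4%. Nearest-rank percentiles avoid interpolation, so the reported P90 is always an observed TTM value; ties at the cutoff can push a quartile slightly above 25%.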
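The `.filter().first()` finding is a good example of a bot comment that a deterministic check could replace. In Django, `QuerySet.first()` returns `None` when no row matches and does not raise `DoesNotExist`, so an `except Model.DoesNotExist` guarding only a `.first()` call never fires. Below is a minimal sketch using Python's standard `ast` module; it is not an existing Ruff or Semgrep rule, the function names are hypothetical, and the heuristic (flag when the `try` body calls `.first()` but not `.get()`) can misfire when some helper inside the `try` raises `DoesNotExist` itself, and stays silent whenever any `.get()` call (including `dict.get`) appears in the block.

```python
import ast


def _catches_does_not_exist(handler):
    # Handles both `except X.DoesNotExist` and `except (A, X.DoesNotExist)`
    t = handler.type
    types = t.elts if isinstance(t, ast.Tuple) else [t]
    return any(isinstance(e, ast.Attribute) and e.attr == "DoesNotExist" for e in types)


def _called_methods(statements):
    # Names of every attribute-style call (obj.name(...)) inside the statements
    names = set()
    for stmt in statements:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
                names.add(node.func.attr)
    return names


def find_unreachable_does_not_exist(source):
    """Return line numbers of `except ...DoesNotExist` handlers that likely never fire."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Try):
            continue
        methods = _called_methods(node.body)
        # Heuristic: .first() is called and .get() is not, so nothing obvious raises
        if "first" in methods and "get" not in methods:
            findings.extend(
                h.lineno
                for h in node.handlers
                if h.type is not None and _catches_does_not_exist(h)
            )
    return findings
```

In practice the same check would more likely ship as a Semgrep rule or a custom flake8 plugin run in CI, so the pattern is reported before any reviewer, human or bot, reads the diff.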
Study Structure
- Methodology: Data collection, tools, sample sizes, and limitations
- Baseline Metrics: Aggregate metrics, percentiles, and size segmentation
- Friction Map: Domain and area friction breakdown
- Discussion Themes: Evidence-backed theme analysis from real comment threads
- Automated Review Friction: Bot reviewers as a first-class friction source
- Abandoned PRs: Patterns in closed-unmerged and stale PRs
- ADR Proposals: Detailed proposals grounded in evidence
Reproducibility
All data and scripts are published for audit:
- Study folder: `studies/sentry-pr-review-friction`
- Method script: `analyze_sentry_prs.py`
- Data artifacts: `output/`
```bash
# Reproduce the full pipeline
python analyze_sentry_prs.py collect --repo getsentry/sentry --days 90 --limit 500
python analyze_sentry_prs.py analyze
python analyze_sentry_prs.py report
```