Fugu-MT Paper Translation (Abstract): Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency

Paper overview: Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency

arxiv url: http://arxiv.org/abs/2605.30208v1
Date: Thu, 28 May 2026 16:44:07 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:56.550195
Title: Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency
Title (reference translation): Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency
Authors: Chris Adams, Arjun Singh Banga, Parveen Bansal, Souvik Bhattacharya, Rujin Cao, Pedro Canahuati, Nate Cook, Brian Ellis, Prabhakar Goyal, Gurinder Grewal, Tianyu He, Matt Labunka, Alex Manners, David Molnar, Ging Cee Ng, Vishal Parekh, Jiefu Pei, Frederic Sagnes, James Saindon, Will Shackleton, Sid Sidhu, Gursharan Singh, Karthik Chengayan Sridhar, Matt Steiner, Pratibha Udmalpet, Sean Xia, Stacey Yan, Audris Mockus, Peter Rigby, Nachiappan Nagappan
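Abstract summary: At Meta, significant lines of code per human-landed diff grew by 105.9% year over year. Per-developer diff volume rose 51%, with agentic AI responsible for over 80% of that growth. We ask three questions that progress from feasibility through calibration to impact.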
Reference score (independently computed attention score): 10.379265985245537
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: AI-assisted coding tools have altered software production. At Meta, significant lines of code per human-landed diff grew by 105.9% year over year and per-developer diff volume rose 51%, with agentic AI responsible for over 80% of that growth. Meanwhile, the share of diffs receiving timely review has declined, exposing a widening gap between code supply and reviewer bandwidth. We ask three questions that progress from feasibility through calibration to impact: (1) can risk-stratified automation operate at scale across diverse organizations, (2) how does tuning the risk threshold affect the trade-off between automation yield and safety, and (3) to what extent does automated review reduce end-to-end latency for AI-generated changes? We deployed RADAR (Risk Aware Diff Auto Review), a multi-stage funnel that classifies each diff by authorship and source type, applies eligibility gates, static heuristics, a machine-learned Diff Risk Score, LLM-based Automated Code Review, and deterministic validation before landing qualifying changes. We evaluate RADAR through telemetry covering 535K+ RADAR-reviewed diffs, observational before-after comparisons for policy changes, and difference-in-differences analysis of efficiency outcomes. RADAR has reviewed 535K+ diffs and landed 331K+. Relaxing the Diff Risk Score threshold from the 25th to the 50th percentile increased the approve rate to 60.31%. The revert rate for RADAR-reviewed diffs is 1/3 that of non-RADAR diffs, and the Production Incident rate is 1/50 that of non-RADAR diffs. RADAR reduces median time to close by over 330% and median diff review wall time by 35%. Risk-aware layered automation can materially reduce review bottlenecks created by AI-driven code growth without compromising production safety.
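The multi-stage funnel described in the abstract can be pictured as an ordered list of checks, where a diff lands automatically only if every stage passes. The following is a minimal sketch under stated assumptions, not Meta's implementation: the `Diff` fields, the stage names, the label strings, and the rule that a diff passes the risk gate when its score is at or below the threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Diff:
    # All fields are hypothetical stand-ins for signals named in the abstract.
    author_type: str        # authorship class, e.g. "human" or "agent"
    source_type: str        # source class, e.g. "codemod" or "config"
    eligible: bool          # outcome of the eligibility gates
    heuristic_flags: int    # number of static heuristics that fired
    risk_score: float       # Diff Risk Score; assumed higher means riskier
    llm_approved: bool      # outcome of LLM-based Automated Code Review
    validation_passed: bool # outcome of deterministic validation


Stage = Tuple[str, Callable[[Diff], bool]]


def build_funnel(risk_threshold: float) -> List[Stage]:
    """Return the ordered stages; each check is True when the diff passes."""
    return [
        ("eligibility", lambda d: d.eligible),
        ("static_heuristics", lambda d: d.heuristic_flags == 0),
        ("diff_risk_score", lambda d: d.risk_score <= risk_threshold),
        ("llm_review", lambda d: d.llm_approved),
        ("deterministic_validation", lambda d: d.validation_passed),
    ]


def review(diff: Diff, funnel: List[Stage]) -> Optional[str]:
    """Return the first stage that rejects the diff, or None if it may land."""
    for name, check in funnel:
        if not check(diff):
            return name
    return None


if __name__ == "__main__":
    funnel = build_funnel(risk_threshold=0.5)
    low_risk = Diff("agent", "codemod", True, 0, 0.2, True, True)
    high_risk = Diff("agent", "codemod", True, 0, 0.9, True, True)
    print(review(low_risk, funnel))   # None: every stage passed
    print(review(high_risk, funnel))  # diff_risk_score
```

Under this sketch, relaxing the threshold amounts to passing a larger `risk_threshold` to `build_funnel`, which lets more diffs through the risk gate while the later stages still apply.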
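Abstract (reference translation): AI-assisted coding tools have changed how software is produced. At Meta, significant lines of code per human-landed diff grew 105.9% year over year, and per-developer diff volume rose 51%, with agentic AI accounting for more than 80% of that growth. Meanwhile, the share of diffs that receive a timely review has fallen, widening the gap between code supply and reviewer bandwidth. We ask three questions that move from feasibility through calibration to impact: (1) can risk-stratified automation operate at scale across diverse organizations, (2) how does tuning the risk threshold affect the trade-off between automation yield and safety, and (3) to what extent does automated review reduce end-to-end latency for AI-generated changes? We deployed RADAR (Risk Aware Diff Auto Review), a multi-stage funnel that classifies each diff by authorship and source type, then applies eligibility gates, static heuristics, a machine-learned Diff Risk Score, LLM-based Automated Code Review, and deterministic validation before landing qualifying changes. We evaluate RADAR using telemetry covering 535K+ RADAR-reviewed diffs, observational before-after comparisons for policy changes, and difference-in-differences analysis of efficiency outcomes. RADAR has reviewed 535K+ diffs and landed 331K+. Relaxing the Diff Risk Score threshold from the 25th to the 50th percentile increased the approve rate to 60.31%. The revert rate for RADAR-reviewed diffs is 1/3 that of non-RADAR diffs, and the Production Incident rate is 1/50 that of non-RADAR diffs. RADAR reduces median time to close by over 330% and median diff review wall time by 35%. Risk-aware layered automation can materially reduce the review bottlenecks created by AI-driven code growth without compromising production safety.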
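The abstract mentions difference-in-differences analysis of efficiency outcomes. As a reminder of what that estimator computes in its simplest two-group, two-period form, here is a minimal sketch. The function name, the use of group means, and the sample numbers are illustrative assumptions; they are not data or methods taken from the paper, which may use a regression-based or median-based variant.

```python
from statistics import mean
from typing import Sequence


def diff_in_diff(
    treated_pre: Sequence[float],
    treated_post: Sequence[float],
    control_pre: Sequence[float],
    control_post: Sequence[float],
) -> float:
    """Two-by-two difference-in-differences on group means.

    Returns (change in treated group) minus (change in control group).
    A negative value means the treated group's outcome fell by more than
    the control group's over the same period.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


if __name__ == "__main__":
    # Made-up review times in hours, for illustration only.
    estimate = diff_in_diff(
        treated_pre=[10.0, 12.0, 14.0],
        treated_post=[6.0, 8.0, 10.0],
        control_pre=[10.0, 11.0, 12.0],
        control_post=[9.0, 10.0, 11.0],
    )
    print(estimate)  # -3.0
```

In the made-up example the treated group improves by 4 hours and the control group by 1 hour, so the estimate attributes a 3-hour reduction to the treatment, assuming both groups would otherwise have followed parallel trends.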

Related papers