Fugu-MT Paper Translation (Abstract): Recipes for Calibration Checks in Safety-Critical Applications

Paper abstract: Recipes for Calibration Checks in Safety-Critical Applications

arxiv url: http://arxiv.org/abs/2604.26479v1
Date: Wed, 29 Apr 2026 09:43:55 GMT
Status: translation complete
System update date: 2026-04-30 15:59:36.338572
Title: Recipes for Calibration Checks in Safety-Critical Applications
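Title (reference translation): Preparation of Calibration Checks in Safety-Critical Applications
Authors: Romeo Valentin
Abstract summary: We propose a framework for calibration checks of safety-critical prediction systems. The checks produce a single accept/reject decision for data collected from a forecaster. We demonstrate the applicability of the framework on two complementary example problems, weather forecasting and robot pose estimation.
Reference score (independently computed attention measure): 0.0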
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Safety-critical prediction systems, such as autonomous vehicles, weather forecasters, and medical monitors, commonly rely on probabilistic forecasters. These forecasters make predictions about possible future outcomes, and their quality and robustness need to be validated and certified. Often, only accuracy, the mean of the predictions, is evaluated against true outcomes. However, for safety-critical scenarios and decision making under uncertainty, the full distributional properties of the forecasts should be checked: do the observed prediction errors actually follow the forecasted probability distributions? To this end, we introduce a framework for calibration checks: statistical tests that validate distributional properties of forecasts when measured over many samples. To support ease of use in real-world operations, these checks produce a single accept/reject decision for data collected from a forecaster. This contrasts with typical calibration calculations, which produce one or multiple continuous calibration scores and require expertise to implement in a validation workflow. We further support operationalization by introducing modifications to calibration testing that (a) reject only overconfident predictions, allowing for pessimistic or cautious predictions in safety-critical settings, and (b) tolerate small, operationally acceptable deviations even for large numbers of validation samples. We organize the calibration checking process into a modular pipeline comprising four steps: (i) the data model, (ii) the chosen metric, (iii) the hypothesis formulation, and (iv) the testing procedure. Each step consists of independently swappable components, thereby supporting a large variety of possible use cases and trade-offs. We demonstrate the applicability of the framework on two complementary example problems, weather forecasting and robot pose estimation.
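Abstract (reference translation): Safety-critical prediction systems such as autonomous vehicles, weather forecasters, and medical monitors commonly rely on probabilistic forecasters. These forecasters predict future outcomes, and their quality and robustness need to be validated. Often, only accuracy, the mean of the predictions, is evaluated against the true outcomes. However, for safety-critical scenarios and decision making under uncertainty, the full distributional properties of the forecasts need to be checked: do the observed prediction errors actually follow the forecasted probability distributions? To this end, we propose a framework for calibration checks that statistically validate the distributional properties of forecasts measured over many samples. To support ease of use in real-world operations, these checks produce a single accept/reject decision for data collected from a forecaster. This contrasts with typical calibration calculations, which produce one or more continuous calibration scores and require expertise to implement in a validation workflow. We further support operationalization by introducing modifications to calibration testing that (a) reject only overconfident predictions, allowing pessimistic or cautious predictions in safety-critical settings, and (b) tolerate small, operationally acceptable deviations even for large numbers of validation samples. We organize the calibration checking process into a modular pipeline comprising four steps: (i) the data model, (ii) the chosen metric, (iii) the hypothesis formulation, and (iv) the testing procedure. Each step consists of independently swappable components, supporting a wide variety of use cases and trade-offs. We demonstrate the applicability of the framework on two complementary example problems, weather forecasting and robot pose estimation.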
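
The four pipeline steps named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify any component, so every choice here is an assumption made for illustration. The data model is a Gaussian forecast with a predicted standard deviation, the metric is empirical coverage of a central prediction interval, the hypothesis is one-sided with a tolerance margin, and the testing procedure is an exact binomial test. The names `coverage_check`, `binom_cdf`, `level`, `tolerance`, and `alpha` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1 - p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def coverage_check(errors, sigmas, level=0.9, tolerance=0.02, alpha=0.05):
    """Return True (accept) unless the forecasts are significantly overconfident.

    errors: observed prediction errors (outcome minus predicted mean).
    sigmas: predicted standard deviations, one per error (assumed Gaussian).
    """
    if not errors or len(errors) != len(sigmas):
        raise ValueError("errors and sigmas must be non-empty and of equal length")
    p0 = level - tolerance  # lowest coverage still considered acceptable
    if not 0.0 < p0 < 1.0:
        raise ValueError("level - tolerance must lie strictly between 0 and 1")

    # (i) Data model: a Gaussian forecast implies a central interval of +/- z * sigma.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # (ii) Metric: how many observed errors fall inside their predicted interval.
    hits = sum(abs(e) <= z * s for e, s in zip(errors, sigmas))
    # (iii) Hypothesis: H0 says true coverage >= p0. Only too-low coverage
    #       (overconfidence) counts against H0; overly wide intervals never do.
    # (iv) Test: exact one-sided binomial p-value, evaluated at the boundary p0.
    p_value = binom_cdf(hits, len(errors), p0)
    return p_value >= alpha


if __name__ == "__main__":
    random.seed(0)
    n = 500
    sigmas = [1.0] * n
    scenarios = {"calibrated": 1.0, "overconfident": 2.0, "pessimistic": 0.5}
    for name, true_sigma in scenarios.items():
        errors = [random.gauss(0.0, true_sigma) for _ in range(n)]
        print(name, "accept" if coverage_check(errors, sigmas) else "reject")
```

In this sketch the single boolean return value plays the role of the accept/reject decision, the one-sided hypothesis means a forecaster whose intervals are too wide is never rejected, and `tolerance` keeps a small coverage shortfall from triggering a rejection purely because the sample count is large.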

List of related papers