Fugu-MT Paper Translation (Abstract): From Prediction to Practice: A Task-Aware Evaluation Framework for Blood Glucose Forecasting

Paper overview: From Prediction to Practice: A Task-Aware Evaluation Framework for Blood Glucose Forecasting

arxiv url: http://arxiv.org/abs/2605.00645v1
Date: Fri, 01 May 2026 13:26:28 GMT
Status: Translation complete
Last updated in system: 2026-05-04 17:43:28.965561
Title: From Prediction to Practice: A Task-Aware Evaluation Framework for Blood Glucose Forecasting
Title (reference translation): From Prediction to Practice: A Task-Aware Evaluation Framework for Blood Glucose Forecasting
Authors: Alireza Namazi, Heman Shakeri
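Abstract summary: This paper proposes a task-aware evaluation framework for blood glucose forecasting built around two downstream uses: hypoglycemia early warning and insulin dosing decision support. For early warning, the authors evaluate on real data from three clinical cohorts using event-level recall and false alarms per patient-day. Models that look strong on real-data forecasting often fail to predict the direction, magnitude, or ranking of intervention effects, and choose poor insulin doses when evaluated under a clinically motivated cost.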
Reference score (independently computed attention score): 0.1104960878651584
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Clinical time-series forecasting is increasingly studied for decision support, yet standard aggregate metrics can obscure whether a model is actually useful for the task it is meant to serve. In safety-critical settings, low average error can coexist with dangerous failures in exactly the high-risk regimes that matter most. We present a task-aware evaluation framework for blood glucose forecasting built around two downstream uses: hypoglycemia early warning and insulin dosing decision support. For early warning, we evaluate on real data from three clinical cohorts using event-level recall and false alarms per patient-day, metrics that reflect operational alarm burden rather than aggregate accuracy. We show that models appearing acceptable overall, with recall above 0.9 on the full test set, can fail badly in the post-bolus slice, where insulin-on-board is elevated and missed warnings carry the greatest clinical consequences. Standard forecasting evaluation, however, does not test whether a model can reason about the effects of actions, a requirement for supporting insulin dosing decisions. We therefore add a second, interventional arm using the FDA-accepted UVA/Padova simulator, where we evaluate whether forecasters can predict glucose responses to altered insulin plans in paired factual/counterfactual scenarios. We show that models that look strong on real-data forecasting often fail to predict the direction, magnitude, or ranking of intervention effects, and choose poor insulin doses when evaluated under a clinically motivated cost. Taken together, the two arms reveal a consistent gap between forecasting accuracy and task-relevant usefulness. We release the benchmark, the standardized preprocessing pipeline for public cohorts, and the simulator-based interventional dataset as a reproducible toolkit.
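Abstract (reference translation): Clinical time-series forecasting is increasingly studied for decision support, but standard aggregate metrics can hide whether a model is actually useful for the task it is meant to serve. In safety-critical settings, a low average error can coexist with dangerous failures in exactly the high-risk situations that matter most. This paper proposes a task-aware evaluation framework for blood glucose forecasting built around two downstream uses: hypoglycemia early warning and insulin dosing decision support. For early warning, real data from three clinical cohorts are evaluated using event-level recall and false alarms per patient-day, metrics that reflect operational alarm burden rather than aggregate accuracy. Models that appear acceptable overall, with recall above 0.9 on the full test set, can fail badly in the post-bolus slice, where insulin-on-board is elevated and missed warnings carry the greatest clinical consequences. Standard forecasting evaluation, however, does not test whether a model can reason about the effects of actions, which is a requirement for supporting insulin dosing decisions. A second, interventional arm is therefore added using the FDA-accepted UVA/Padova simulator, evaluating whether forecasters can predict glucose responses to altered insulin plans in paired factual/counterfactual scenarios. Models that look strong on real-data forecasting often fail to predict the direction, magnitude, or ranking of intervention effects, and choose poor insulin doses when evaluated under a clinically motivated cost. Taken together, the two arms reveal a consistent gap between forecasting accuracy and task-relevant usefulness. The benchmark, the standardized preprocessing pipeline for public cohorts, and the simulator-based interventional dataset are released as a reproducible toolkit.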
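
Illustrative sketch (not from the paper): the abstract names event-level recall and false alarms per patient-day as the early-warning metrics but does not define them here. The Python below shows one plausible way to compute them. The matching rule (an event counts as detected if any alarm fires within `lead_window_min` minutes before its onset), the function name `event_level_metrics`, and all parameter names are assumptions made for illustration; the authors' actual definitions may differ.

```python
from typing import Sequence, Tuple

MINUTES_PER_DAY = 1440.0


def event_level_metrics(
    event_onsets: Sequence[float],
    alarm_times: Sequence[float],
    lead_window_min: float,
    observed_minutes: float,
) -> Tuple[float, float]:
    """Return (event-level recall, false alarms per patient-day).

    All times are minutes since the start of the recording.
    ASSUMPTION: an alarm matches an event if it fires inside
    [onset - lead_window_min, onset]; unmatched alarms are false alarms.
    """
    if observed_minutes <= 0:
        raise ValueError("observed_minutes must be positive")

    def matches(alarm: float, onset: float) -> bool:
        return onset - lead_window_min <= alarm <= onset

    # An event is detected once, no matter how many alarms precede it.
    detected = sum(
        any(matches(a, e) for a in alarm_times) for e in event_onsets
    )
    # An alarm is false if it falls in no event's warning window.
    false_alarms = sum(
        not any(matches(a, e) for e in event_onsets) for a in alarm_times
    )

    recall = detected / len(event_onsets) if event_onsets else float("nan")
    patient_days = observed_minutes / MINUTES_PER_DAY
    return recall, false_alarms / patient_days


if __name__ == "__main__":
    # Two hypoglycemia onsets and three alarms over two days of data.
    recall, fa_per_day = event_level_metrics(
        event_onsets=[300, 900],
        alarm_times=[280, 600, 1200],
        lead_window_min=60,
        observed_minutes=2880,
    )
    print(recall, fa_per_day)
```

Under these assumptions, restricting `event_onsets` and `alarm_times` to a subset such as the post-bolus period would give the slice-level view the abstract contrasts with full-test-set recall.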

Related papers