Fugu-MT Paper Translation (Summary): RealBench: Benchmarking Data-Driven Numerical Weather Forecasting Under Operational Conditions and Extreme Event Challenges

Paper summary: RealBench: Benchmarking Data-Driven Numerical Weather Forecasting Under Operational Conditions and Extreme Event Challenges

arxiv url: http://arxiv.org/abs/2605.24945v1
Date: Sun, 24 May 2026 08:46:17 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:18.530934
Title: RealBench: Benchmarking Data-Driven Numerical Weather Forecasting Under Operational Conditions and Extreme Event Challenges
Authors: Ruize Li, Zhibin Wen, Tao Han, Hao Chen, Fenghua Ling, Wei Zhang, Song Guo, Lei Bai,
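Abstract summary: This paper introduces RealBench, a next-generation benchmark for AI weather forecasting. RealBench features a strictly out-of-distribution test set spanning 2025, designed to eliminate data leakage and capture recent atmospheric regimes. It integrates multiple data sources, including low-latency operational analysis and a large-scale global in-situ observation dataset comprising over 10,000 stations.
Reference score (independently calculated attention score): 31.389267895745252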
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Accurate evaluation of weather forecasting models is critical for their reliable deployment in real-world applications. However, existing benchmarks predominantly rely on reanalysis products such as ERA5, which are generated through delayed data assimilation and do not reflect the constraints of real-time operational forecasting, thereby resulting in a systematic mismatch between benchmark performance and real-world forecasting. In this work, we introduce RealBench, a next-generation benchmark for AI weather forecasting that emphasizes realistic evaluation under operational conditions. RealBench features a strictly out-of-distribution test set spanning 2025 to eliminate data leakage and capture recent atmospheric regimes. It integrates multiple data sources, including low-latency operational analysis and a large-scale global in-situ observation dataset comprising over 10,000 stations, enabling direct evaluation against real atmospheric measurements. Beyond standard global metrics, RealBench provides a comprehensive evaluation framework for high-impact extreme events, including heatwaves, cold surges, and tropical cyclones, using event-specific metrics that better reflect real-world forecasting priorities. The evaluation results reveal substantial discrepancies between reanalysis-based metrics and real-world performance, particularly concerning extreme events. By highlighting the limitations of existing benchmarks, this work establishes a more faithful and operationally relevant evaluation paradigm, providing a rigorous foundation for advancing next-generation AI weather forecasting systems. The benchmark implementation is available at: https://github.com/lixruize-del/NWP-Benchmark.
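The abstract describes three kinds of scoring: standard global metrics on a grid, direct evaluation against in-situ station measurements, and event-specific metrics for extremes such as heatwaves. The Python sketch below illustrates generic versions of these ideas; it is not taken from the RealBench repository, and every function name in it is hypothetical. The specific techniques are assumptions chosen for illustration: cos(latitude)-weighted RMSE (the convention popularized by WeatherBench) for the gridded metric, bilinear interpolation to match grid points to stations, and a threshold-exceedance critical success index (CSI) as a stand-in for an event-specific score. RealBench's actual metrics may differ.

```python
# Illustrative sketch only: generic verification metrics in the spirit of the
# abstract, NOT code from the RealBench repository. All names are hypothetical.
import bisect
import math
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]  # grid[i][j]: latitude index i, longitude index j


def lat_weighted_rmse(forecast: Grid, reference: Grid, lats: Sequence[float]) -> float:
    """Gridded RMSE with cos(latitude) weights normalised to mean 1 (WeatherBench-style)."""
    weights = [math.cos(math.radians(lat)) for lat in lats]
    mean_w = sum(weights) / len(weights)
    total, count = 0.0, 0
    for i, (f_row, r_row) in enumerate(zip(forecast, reference)):
        for f, r in zip(f_row, r_row):
            total += (weights[i] / mean_w) * (f - r) ** 2
            count += 1
    return math.sqrt(total / count)


def _bracket(axis: Sequence[float], x: float) -> Tuple[int, float]:
    """Return the lower cell index and fractional offset of x on an increasing axis."""
    if not axis[0] <= x <= axis[-1]:
        raise ValueError(f"{x} lies outside the grid axis")
    k = min(bisect.bisect_right(axis, x) - 1, len(axis) - 2)
    return k, (x - axis[k]) / (axis[k + 1] - axis[k])


def bilinear(grid: Grid, lats: Sequence[float], lons: Sequence[float],
             lat: float, lon: float) -> float:
    """Bilinearly interpolate a gridded field to a point.

    Assumes increasing lats/lons and ignores longitude wrap-around.
    """
    i, ty = _bracket(lats, lat)
    j, tx = _bracket(lons, lon)
    lower = grid[i][j] * (1 - tx) + grid[i][j + 1] * tx
    upper = grid[i + 1][j] * (1 - tx) + grid[i + 1][j + 1] * tx
    return lower * (1 - ty) + upper * ty


def station_rmse(grid: Grid, lats: Sequence[float], lons: Sequence[float],
                 stations: List[Tuple[float, float, float]]) -> float:
    """RMSE of the forecast against in-situ observations given as (lat, lon, value)."""
    sq = [(bilinear(grid, lats, lons, la, lo) - obs) ** 2 for la, lo, obs in stations]
    return math.sqrt(sum(sq) / len(sq))


def critical_success_index(forecast: Sequence[float], observed: Sequence[float],
                           threshold: float) -> float:
    """CSI = hits / (hits + misses + false alarms) for threshold-exceedance events."""
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        f_event, o_event = f >= threshold, o >= threshold
        if f_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif f_event:
            false_alarms += 1
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")


if __name__ == "__main__":
    lats, lons = [0.0, 1.0], [0.0, 1.0]
    field = [[0.0, 1.0], [2.0, 3.0]]
    print(bilinear(field, lats, lons, 0.5, 0.5))  # 1.5
    print(station_rmse(field, lats, lons, [(0.5, 0.5, 1.5), (0.0, 0.0, 1.0)]))
    # Hypothetical daily maximum temperatures (deg C) at four stations, threshold 35 deg C
    print(critical_success_index([36, 31, 38, 30], [37, 36, 33, 29], 35.0))
```

The 35 °C heatwave threshold and the choice of CSI are illustrative only; the benchmark's own event-specific metrics for heatwaves, cold surges, and tropical cyclones should be taken from the repository linked in the abstract. Many reanalysis files store latitude in descending order, so the axis would need to be reversed before using the `bilinear` helper as written.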

Related papers