Fugu-MT Paper Translation (Abstract): TempusBench: An Evaluation Framework for Time-Series Forecasting

Paper overview: TempusBench: An Evaluation Framework for Time-Series Forecasting

arxiv url: http://arxiv.org/abs/2604.11529v2
Date: Thu, 16 Apr 2026 16:57:53 GMT
Status: Translation complete
Last updated in system: 2026-04-17 16:09:14.164508
Title: TempusBench: An Evaluation Framework for Time-Series Forecasting
Title (reference translation): TempusBench: An Evaluation Framework for Time-Series Forecasting
Authors: Denizalp Goktas, Gerardo Riaño-Briceño, Alif Abdullah, Aryan Nair, Chenkai Shen, Beatriz de Lucio, Alexandra Magnusson, Farhan Mashrur, Ahmed Abdulla, Shawrna Sen, Mahitha Thippireddy, Gregory Schwartz, Amy Greenwald
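Abstract summary: We introduce TempusBench, an open-source evaluation framework for time-series foundation models (TSFMs). We see at least four major issues impeding progress on the development of such a framework. The code is available on GitHub: https://github.com/Smlcrm/TempusBench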
Reference score (independently computed attention score): 36.738682337273104
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Foundation models have transformed natural language processing and computer vision, and a rapidly growing literature on time-series foundation models (TSFMs) seeks to replicate this success in forecasting. While recent open-source models demonstrate the promise of TSFMs, the field lacks a comprehensive and community-accepted model evaluation framework. We see at least four major issues impeding progress on the development of such a framework. First, existing evaluation frameworks comprise benchmark forecasting tasks derived from often outdated datasets (e.g., M3), many of which lack clear metadata and overlap with the corpora used to pre-train TSFMs. Second, these frameworks evaluate models along a narrowly defined set of benchmark forecasting tasks, such as forecast horizon length or domain, but overlook core statistical properties such as non-stationarity and seasonality. Third, domain-specific models (e.g., XGBoost) are often compared unfairly, as existing frameworks do not enforce a systematic and consistent hyperparameter tuning convention for all models. Fourth, visualization tools for interpreting comparative performance are lacking. To address these issues, we introduce TempusBench, an open-source evaluation framework for TSFMs. TempusBench consists of 1) new datasets which are not included in existing TSFM pretraining corpora, 2) a set of novel benchmark tasks that go beyond existing ones, 3) a model evaluation pipeline with a standardized hyperparameter tuning protocol, and 4) a tensorboard-based visualization interface. We provide access to our code on GitHub: https://github.com/Smlcrm/TempusBench and maintain a live leaderboard at https://benchmark.smlcrm.com/.
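Abstract (reference translation): Foundation models have transformed natural language processing and computer vision, and the literature on time-series foundation models (TSFMs) seeks to replicate this success in forecasting. Recent open-source models demonstrate the promise of TSFMs, but the field lacks a comprehensive, community-accepted model evaluation framework. We see at least four major issues impeding progress on the development of such a framework. First, existing evaluation frameworks consist of benchmark forecasting tasks derived from often outdated datasets (e.g., M3), many of which lack clear metadata and overlap with the corpora used to pre-train TSFMs. Second, these frameworks evaluate models along a narrowly defined set of benchmark forecasting tasks, such as forecast horizon length or domain, but overlook core statistical properties such as non-stationarity and seasonality. Third, domain-specific models (e.g., XGBoost) are often compared unfairly, because existing frameworks do not enforce a systematic and consistent hyperparameter tuning convention for all models. Fourth, visualization tools for interpreting comparative performance are lacking. To address these issues, we introduce TempusBench, an open-source evaluation framework for TSFMs. TempusBench consists of 1) new datasets that are not included in existing TSFM pretraining corpora, 2) a set of novel benchmark tasks that go beyond existing ones, 3) a model evaluation pipeline with a standardized hyperparameter tuning protocol, and 4) a tensorboard-based visualization interface. We provide access to our code on GitHub: https://github.com/Smlcrm/TempusBench and maintain a live leaderboard at https://benchmark.smlcrm.com/.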

Related papers