Fugu-MT Paper Translation (Summary): VTBench: A Multimodal Framework for Time-Series Classification with Chart-Based Representations

Paper summary: VTBench: A Multimodal Framework for Time-Series Classification with Chart-Based Representations

arxiv url: http://arxiv.org/abs/2604.27259v1
Date: Wed, 29 Apr 2026 23:17:33 GMT
Status: Translation complete
System update date: 2026-05-01 16:31:53.836535
Title: VTBench: A Multimodal Framework for Time-Series Classification with Chart-Based Representations
Title (reference translation): VTBench: A Multimodal Framework for Time-Series Classification Using Chart-Based Representations
Authors: Madhumitha Venkatesan, Xuyang Chen, Dongyu Liu
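Abstract summary: VTBench is a framework that re-examines time-series classification (TSC) through multimodal fusion of raw sequences and chart-based visualizations. The authors develop a modular architecture supporting multiple fusion strategies, including single-chart visual-numerical fusion, multi-chart visual fusion, and full multimodal fusion with raw inputs.
Reference score (attention score computed independently by this site): 11.42837813008733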
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Time-series classification (TSC) has advanced significantly with deep learning, yet most models rely solely on raw numerical inputs, overlooking alternative representations. While texture-based encodings such as Gramian Angular Fields (GAF) and Recurrence Plots (RP) convert time series into 2D images, they often require heavy preprocessing and yield less intuitive representations. In contrast, chart-based visualizations offer more interpretable alternatives and show promise in specific domains; however, their effectiveness remains underexplored, with limited systematic evaluation across chart types, visual encoding choices, and datasets. In this work, we introduce VTBench, a systematic and extensible framework that re-examines TSC through multimodal fusion of raw sequences and chart-based visualizations. VTBench generates lightweight, human-interpretable plots -- line, area, bar, and scatter, providing complementary views of the same signal. We develop a modular architecture supporting multiple fusion strategies, including single-chart visual-numerical fusion, multi-chart visual fusion, and full multimodal fusion with raw inputs. Through experiments across 31 UCR datasets, we show that: (1) chart-only models are competitive in selected settings, particularly on smaller datasets; (2) combining multiple chart types can improve accuracy by capturing complementary visual cues; and (3) multimodal models improve or maintain performance when visual features provide non-redundant information, but may degrade accuracy when they introduce redundancy. We further distill practical guidelines for selecting chart types, fusion strategies, and configurations. VTBench establishes a unified foundation for interpretable and effective multimodal time-series classification.
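Abstract (reference translation): Time-series classification (TSC) has advanced significantly with deep learning, but most models rely only on raw numerical inputs and overlook alternative representations. Texture-based encodings such as Gramian Angular Fields (GAF) and Recurrence Plots (RP) convert time series into 2D images, but they often require heavy preprocessing and yield less intuitive representations. In contrast, chart-based visualizations offer more interpretable alternatives and show promise in specific domains. In this paper, we introduce VTBench, a systematic and extensible framework that re-examines TSC through multimodal fusion of raw sequences and chart-based visualizations. VTBench generates lightweight, human-interpretable plots (line, area, bar, and scatter) that provide complementary views of the same signal. We develop a modular architecture supporting multiple fusion strategies, including single-chart visual-numerical fusion, multi-chart visual fusion, and full multimodal fusion with raw inputs. The experiments show that (1) chart-only models are competitive in selected settings, particularly on smaller datasets; (2) combining multiple chart types can improve accuracy by capturing complementary visual cues; and (3) multimodal models improve or maintain performance when visual features provide non-redundant information. We further distill practical guidelines for selecting chart types, fusion strategies, and configurations. VTBench establishes a unified foundation for interpretable and effective multimodal time-series classification.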
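
The abstract describes rendering one signal as line, area, bar, and scatter charts and pairing those images with the raw sequence. The following is a minimal sketch of that idea, not the VTBench implementation: the function names (`render_chart`, `build_multimodal_input`), the binary pixel grid, the image height, and the two-columns-per-time-step layout are all assumptions made for illustration. A real pipeline would more likely use a plotting library and feed the images to a vision encoder.

```python
from typing import Dict, List, Sequence

Grid = List[List[int]]  # rows of 0/1 pixels; row 0 is the top of the image
CHART_KINDS = ("line", "area", "bar", "scatter")  # chart types named in the abstract


def _to_rows(series: Sequence[float], height: int) -> List[int]:
    """Map each value to a pixel row (0 = top for the maximum, height - 1 = bottom)."""
    if not series:
        raise ValueError("series must not be empty")
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # constant series: avoid division by zero
    return [round((hi - v) / span * (height - 1)) for v in series]


def render_chart(series: Sequence[float], kind: str, height: int = 8) -> Grid:
    """Rasterize a series into a binary image (hypothetical helper).

    Time step t is drawn in column 2 * t; the odd columns in between hold the
    connecting stroke (line), the interpolated fill (area), or stay empty (bar, scatter).
    """
    rows = _to_rows(series, height)
    width = 2 * len(rows) - 1
    grid = [[0] * width for _ in range(height)]

    def fill(col: int, top: int, bottom: int) -> None:
        for y in range(top, bottom + 1):
            grid[y][col] = 1

    for t, r in enumerate(rows):
        x = 2 * t
        nxt = rows[t + 1] if t + 1 < len(rows) else None
        if kind == "scatter":
            fill(x, r, r)
        elif kind == "bar":
            fill(x, r, height - 1)
        elif kind == "line":
            fill(x, r, r)
            if nxt is not None:
                fill(x + 1, min(r, nxt), max(r, nxt))  # connect neighbouring points
        elif kind == "area":
            fill(x, r, height - 1)
            if nxt is not None:
                fill(x + 1, (r + nxt) // 2, height - 1)  # fill under the midpoint
        else:
            raise ValueError(f"unknown chart kind: {kind}")
    return grid


def build_multimodal_input(series: Sequence[float], height: int = 8) -> Dict[str, object]:
    """Bundle the raw sequence with one image per chart type (assumed input layout)."""
    return {
        "raw": list(series),
        "charts": {k: render_chart(series, k, height) for k in CHART_KINDS},
    }


# Usage: render a four-point series and print its line chart as ASCII art.
sample = build_multimodal_input([0.0, 1.0, 0.5, 2.0], height=5)
for row in sample["charts"]["line"]:
    print("".join("#" if p else "." for p in row))
```

In this sketch, the `raw` entry would go to a numerical branch and each chart image to a visual branch. How the branches are combined (for example, concatenating features before a classifier) is left open, since the abstract does not specify the fusion operator.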

Related papers