Fugu-MT Paper Translation (Summary): MLLM4TS: Leveraging Vision and Multimodal Language Models for General Time-Series Analysis

Paper summary: MLLM4TS: Leveraging Vision and Multimodal Language Models for General Time-Series Analysis

arxiv url: http://arxiv.org/abs/2510.07513v1
Date: Wed, 08 Oct 2025 20:22:39 GMT
Status: Translation complete
Last updated in system: 2025-10-10 17:54:14.715957
Title: MLLM4TS: Leveraging Vision and Multimodal Language Models for General Time-Series Analysis
Title (reference translation): MLLM4TS: Vision and Multimodal Language Models for General Time-Series Analysis
Authors: Qinghua Liu, Sam Heshmati, Zheda Mai, Zubin Abraham, John Paparrizos, Liu Ren
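Abstract summary: MLLM4TS is a novel framework that leverages multimodal large language models for time-series analysis. Each time-series channel is rendered as a horizontally stacked, color-coded line plot in a single composite image. A temporal-aware visual patch alignment strategy aligns visual patches with their corresponding time segments.
Reference score (independently computed attention score): 35.17244645389017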
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Effective analysis of time series data presents significant challenges due to the complex temporal dependencies and cross-channel interactions in multivariate data. Inspired by the way human analysts visually inspect time series to uncover hidden patterns, we ask: can incorporating visual representations enhance automated time-series analysis? Recent advances in multimodal large language models have demonstrated impressive generalization and visual understanding capability, yet their application to time series remains constrained by the modality gap between continuous numerical data and discrete natural language. To bridge this gap, we introduce MLLM4TS, a novel framework that leverages multimodal large language models for general time-series analysis by integrating a dedicated vision branch. Each time-series channel is rendered as a horizontally stacked color-coded line plot in one composite image to capture spatial dependencies across channels, and a temporal-aware visual patch alignment strategy then aligns visual patches with their corresponding time segments. MLLM4TS fuses fine-grained temporal details from the numerical data with global contextual information derived from the visual representation, providing a unified foundation for multimodal time-series analysis. Extensive experiments on standard benchmarks demonstrate the effectiveness of MLLM4TS across both predictive tasks (e.g., classification) and generative tasks (e.g., anomaly detection and forecasting). These results underscore the potential of integrating visual modalities with pretrained language models to achieve robust and generalizable time-series analysis.
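Abstract (reference translation): Effective analysis of time-series data poses significant challenges because of the complex temporal dependencies and cross-channel interactions in multivariate data. Inspired by the way human analysts visually inspect time series to uncover hidden patterns, we ask whether incorporating visual representations can enhance automated time-series analysis. Recent advances in multimodal large language models have shown impressive generalization and visual understanding capabilities, but their application to time series remains limited by the modality gap between continuous numerical data and discrete natural language. To bridge this gap, we introduce MLLM4TS, a new framework that uses multimodal large language models for general time-series analysis by integrating a dedicated vision branch. Each time-series channel is rendered as a horizontally stacked, color-coded line plot in a single composite image to capture spatial dependencies across channels, and a temporal-aware visual patch alignment strategy aligns visual patches with their corresponding time segments. MLLM4TS fuses fine-grained temporal details from the numerical data with global contextual information derived from the visual representation, providing a unified foundation for multimodal time-series analysis. Extensive experiments on standard benchmarks show the effectiveness of MLLM4TS across both predictive tasks (e.g., classification) and generative tasks (e.g., anomaly detection and forecasting). These results highlight the potential of integrating visual modalities with pretrained language models to achieve robust and generalizable time-series analysis.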
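
Illustrative sketch (not the authors' code): the abstract describes rendering every channel into one composite image and aligning visual patches with time segments. The pure-Python example below shows one way this could look under stated assumptions: each channel is drawn in its own horizontal band over a shared time axis, a channel's "color" is simply an integer code, and each column of patches is mapped to the half-open range of time steps it covers. The function names `render_composite` and `patch_time_segments`, the band height, the image width, and the patch size are all hypothetical and do not come from the paper.

```python
import math


def render_composite(series, band_height=16, width=64):
    """Rasterize channels into one image (list of pixel rows).

    Assumption: channel c is drawn in its own horizontal band and
    "colored" with the integer code c + 1; 0 is background.
    """
    length = len(series[0])
    image = [[0] * width for _ in range(len(series) * band_height)]
    for c, channel in enumerate(series):
        lo, hi = min(channel), max(channel)
        span = (hi - lo) or 1.0  # avoid division by zero for flat channels
        for x in range(width):
            # Map the pixel column to a time index on the shared time axis.
            t = min(length - 1, x * length // width)
            # Normalize the value to [0, 1] and place it inside the band
            # (row 0 of each band is the top, so larger values sit higher).
            frac = (channel[t] - lo) / span
            y = c * band_height + (band_height - 1) - round(frac * (band_height - 1))
            image[y][x] = c + 1
    return image


def patch_time_segments(length, width, patch_size):
    """Return, for each column of patches, the half-open range
    [start, end) of time steps that the patch column covers."""
    segments = []
    for p in range(width // patch_size):
        start = p * patch_size * length // width
        end = (p + 1) * patch_size * length // width
        segments.append((start, end))
    return segments


# Usage: three synthetic sine channels of 96 time steps each.
series = [[math.sin(0.1 * t + c) for t in range(96)] for c in range(3)]
image = render_composite(series, band_height=16, width=64)
print(len(image), len(image[0]))         # 48 64
print(patch_time_segments(96, 64, 16))   # [(0, 24), (24, 48), (48, 72), (72, 96)]
```

In this sketch, a patch in column p of any band can be paired with the numerical values in its time segment, which is one plausible reading of "temporal-aware" alignment; the actual alignment strategy in the paper may differ.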


Related papers