Fugu-MT Paper Translation (Abstract): Beyond Tokenization: Direct Timestep Embedding and Contrastive Alignment for Time-Series Question Answering

Paper overview: Beyond Tokenization: Direct Timestep Embedding and Contrastive Alignment for Time-Series Question Answering

arxiv url: http://arxiv.org/abs/2606.18986v1
Date: Wed, 17 Jun 2026 12:07:23 GMT
Status: Translation complete
Last updated in system: 2026-06-18 17:16:51.153267
Title: Beyond Tokenization: Direct Timestep Embedding and Contrastive Alignment for Time-Series Question Answering
Authors: Yafeng Wu, Huu Hiep Nguyen, Thin Nguyen, Hung Le
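Abstract summary: Recent advances in large language models (LLMs) have given rise to time-series question answering (TSQA), which formulates time-series analysis as natural-language question answering. To address the tokenization bottleneck that arises when raw numerical series are fed directly into LLMs, the authors propose CADE, a novel framework for TSQA built upon two key components: direct timestep embedding and semantic alignment.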
Reference score (independently computed attention score): 13.074536659496362
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in large language models (LLMs) have given rise to time-series question answering (TSQA), which formulates time-series analysis as natural-language question answering. However, directly feeding raw numerical series into LLMs suffers from a tokenization bottleneck: Byte Pair Encoding fragments continuous values into unstable tokens whose embeddings lack meaningful metric structure, resulting in the loss of magnitude, scale, and trend information. Prior methods use patch-based encoders that split the series into fixed windows, locking in one granularity that breaks patterns and hides exact timesteps, through a separate module that rarely transfers across datasets with different lengths or sampling rates. To address this challenge, we propose CADE (Contrastive Alignment with Direct Embedding), a novel framework for TSQA built upon two key components: direct timestep embedding and semantic alignment. The proposed framework maps each timestep directly into the LLM embedding space through a point-wise linear encoder and MLP projector, preserving exact index-level access while eliminating the need for patching and padding. To further bridge the semantic gap between time-series and language representations, we introduce a novel one-directional supervised contrastive loss that aligns time-series embeddings with frozen class-name text anchors. Experimental results on the public Time-MQA benchmark demonstrate that our framework consistently improves performance across six TSQA tasks, outperforming both open-source and proprietary LLM baselines.
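The direct timestep embedding described in the abstract can be illustrated with a minimal sketch. It assumes a univariate series, randomly initialized weights, a GELU activation, and toy dimensions; the class name `DirectTimestepEmbedder`, the helper names, and all sizes are hypothetical and not taken from the paper. The only point shown is that a point-wise encoder followed by an MLP projector yields exactly one embedding per timestep, so no patching or padding is required.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix (illustrative initialization, not the paper's)."""
    bound = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


def linear(weights, bias, x):
    """Affine map y = W x + b for a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def gelu(v):
    """Tanh approximation of GELU (the activation choice is an assumption)."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


class DirectTimestepEmbedder:
    """Hypothetical point-wise linear encoder followed by a two-layer MLP projector."""

    def __init__(self, hidden_dim=8, llm_dim=16, seed=0):
        rng = random.Random(seed)
        # Point-wise linear encoder: one scalar timestep -> hidden_dim features.
        self.enc_w = make_matrix(hidden_dim, 1, rng)
        self.enc_b = [0.0] * hidden_dim
        # MLP projector: hidden_dim -> llm_dim -> llm_dim (assumed LLM embedding width).
        self.p1_w = make_matrix(llm_dim, hidden_dim, rng)
        self.p1_b = [0.0] * llm_dim
        self.p2_w = make_matrix(llm_dim, llm_dim, rng)
        self.p2_b = [0.0] * llm_dim

    def embed_step(self, value):
        """Map a single timestep value to one vector of size llm_dim."""
        h = linear(self.enc_w, self.enc_b, [value])
        z = [gelu(v) for v in linear(self.p1_w, self.p1_b, h)]
        return linear(self.p2_w, self.p2_b, z)

    def embed_series(self, series):
        """One embedding per timestep; output length equals input length."""
        return [self.embed_step(v) for v in series]


embedder = DirectTimestepEmbedder()
short = embedder.embed_series([0.1, 0.5, -0.3])
longer = embedder.embed_series([0.1, 0.5, -0.3, 0.9, 1.2])
print(len(short), len(longer), len(short[0]))
```

Because every timestep is embedded independently, series of different lengths pass through the same module unchanged, and the embedding at index `i` always corresponds to the value at index `i`. Input normalization and training are outside the scope of this sketch.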
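One plausible reading of the one-directional supervised contrastive loss is sketched below. The abstract does not give the formula, so everything here is an assumption: cosine similarity, a softmax over class anchors with temperature `temperature`, and a loss computed only in the series-to-text direction, with the text anchors treated as fixed constants. The function name `one_directional_contrastive_loss` and the toy vectors are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def one_directional_contrastive_loss(series_embs, labels, text_anchors, temperature=0.1):
    """Mean cross-entropy of each series embedding against frozen class anchors.

    series_embs  : list of vectors, one pooled embedding per time series
    labels       : list of int class indices, one per series
    text_anchors : list of vectors, one frozen text embedding per class name
    Only the series -> text direction is scored; anchors receive no update.
    """
    anchors = [l2_normalize(a) for a in text_anchors]
    total = 0.0
    for emb, label in zip(series_embs, labels):
        z = l2_normalize(emb)
        # Cosine similarity to every class anchor, scaled by the temperature.
        logits = [sum(p * q for p, q in zip(z, anchor)) / temperature for anchor in anchors]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[label]
    return total / len(series_embs)


# Toy data: two classes with orthogonal anchors (illustrative values only).
anchors = [[1.0, 0.0], [0.0, 1.0]]
embs = [[2.0, 0.1], [0.1, 3.0]]
aligned = one_directional_contrastive_loss(embs, [0, 1], anchors)
swapped = one_directional_contrastive_loss(embs, [1, 0], anchors)
print(aligned < swapped)
```

Under these assumptions the loss is small when each series embedding points toward the anchor of its own class and grows when it points toward another class, which is the alignment behavior the abstract describes.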


Related papers