Fugu-MT Paper Translation (Abstract): Uncertainty-DTW for Sequences and Visual Tokens

Paper overview: Uncertainty-DTW for Sequences and Visual Tokens

arxiv url: http://arxiv.org/abs/2605.25110v1
Date: Sun, 24 May 2026 14:49:43 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:18.83267
Title: Uncertainty-DTW for Sequences and Visual Tokens
Title (reference translation): Uncertainty DTW for Sequences and Visual Tokens
Authors: Lei Wang, Syuan-Hao Li, Yongsheng Gao, Piotr Koniusz
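Abstract summary: This work introduces uncertainty-aware alignment, a probabilistic framework that models correspondences with uncertainty and performs structured matching along alignment paths. The framework is generalized from temporal sequences to tokenized visual representations, enabling structured matching over sets of visual tokens. The findings establish uncertainty-aware alignment as a general, robust, and interpretable framework for learning from structured data.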
Reference score (independently computed attention score): 43.798398689900075
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Aligning structured data is a fundamental problem in computer vision and machine learning, underlying tasks such as time series analysis, human action recognition, and visual representation learning. Existing alignment methods, including Dynamic Time Warping (DTW) and its differentiable variants, rely on deterministic similarity measures and are therefore sensitive to heterogeneous and noisy features. In this work, we introduce uncertainty-aware alignment, a probabilistic framework that models pairwise correspondences with heteroscedastic uncertainty and performs structured matching along alignment paths. Our formulation, uncertainty-DTW (uDTW), assigns each correspondence a Normal distribution and parametrizes each alignment path by a Maximum Likelihood Estimate objective consisting of (i) a precision-weighted matching term that suppresses unreliable features, and (ii) a log-variance regularization that prevents degenerate solutions. This yields a probabilistic alignment mechanism that is robust to noise and interpretable, as uncertainty directly reflects the reliability of matches. We further generalize this framework from temporal sequences to tokenized visual representations, enabling structured matching over sets of visual tokens. The learned uncertainty can be interpreted as a reverse-attention: semantically relevant regions exhibit low uncertainty and dominate the alignment, while ambiguous/noisy regions have high uncertainty. This provides a connection between alignment, attention, and uncertainty modeling. We evaluate the proposed framework across diverse domains. The results demonstrate consistent improvements over state-of-the-art methods and show that learned uncertainty correlates with semantic importance. These findings establish uncertainty-aware alignment as a general, robust, and interpretable framework for learning from structured data.
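Abstract (reference translation): Aligning structured data is a fundamental problem in computer vision and machine learning that underlies tasks such as time series analysis, human action recognition, and visual representation learning. Existing alignment methods, including Dynamic Time Warping (DTW) and its differentiable variants, rely on deterministic similarity measures and are therefore sensitive to heterogeneous and noisy features. This work introduces uncertainty-aware alignment, a probabilistic framework that models pairwise correspondences with uncertainty and performs structured matching along alignment paths. The formulation, uncertainty-DTW (uDTW), assigns a Normal distribution to each correspondence and parametrizes each alignment path by a Maximum Likelihood Estimate objective consisting of (i) a precision-weighted matching term that suppresses unreliable features, and (ii) a log-variance regularization that prevents degenerate solutions. The resulting probabilistic alignment mechanism is robust to noise and interpretable, because uncertainty directly reflects the reliability of matches. The framework is further generalized from temporal sequences to tokenized visual representations, enabling structured matching over sets of visual tokens. The learned uncertainty can be interpreted as a reverse-attention: semantically relevant regions exhibit low uncertainty and dominate the alignment, while ambiguous or noisy regions have high uncertainty. This provides a connection between alignment, attention, and uncertainty modeling. The proposed framework is evaluated across diverse domains. The results show consistent improvements over state-of-the-art methods and show that the learned uncertainty correlates with semantic importance. These findings establish uncertainty-aware alignment as a general, robust, and interpretable framework for learning from structured data.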
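
As an illustration of the objective described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a per-pair cost of the form d_ij^2 / sigma_ij^2 + beta * log(sigma_ij^2), which is the shape a Normal negative log-likelihood takes up to constants, and accumulates it with a plain hard-min DTW recursion chosen here for brevity. The function name `udtw_cost`, the weight `beta`, and passing the variances `sigma2` in as a precomputed array are all assumptions made for this example; the abstract does not specify how the uncertainty is produced or how the path is relaxed.

```python
import numpy as np


def udtw_cost(x, y, sigma2, beta=1.0):
    """Hypothetical uncertainty-weighted DTW cost (illustrative sketch only).

    x:      (n, d) array, first sequence or token set
    y:      (m, d) array, second sequence or token set
    sigma2: (n, m) array of positive variances, one per candidate pair
    beta:   assumed weight on the log-variance regularizer
    """
    n, m = len(x), len(y)
    # Squared Euclidean distance between every pair (i, j).
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # Precision-weighted matching term plus log-variance penalty.
    cost = sq_dist / sigma2 + beta * np.log(sigma2)

    # Standard DTW dynamic program over the accumulated cost.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    return acc[n, m]


# With unit variances the log term vanishes and this reduces to ordinary DTW.
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.0], [2.0]])
plain = udtw_cost(x, y, np.ones((3, 2)))
```

In this sketch a large variance shrinks the matching term for a pair, so an unreliable pair contributes less to the path cost, while the log-variance term keeps the variances from growing without bound.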
