Fugu-MT paper translation (abstract): Stochastic Clock Attention for Aligning Continuous and Ordered Sequences

Paper summary: Stochastic Clock Attention for Aligning Continuous and Ordered Sequences

arxiv url: http://arxiv.org/abs/2509.14678v1
Date: Thu, 18 Sep 2025 07:18:34 GMT
Status: translation complete
Last updated in system: 2025-09-19 17:26:53.102203
Title: Stochastic Clock Attention for Aligning Continuous and Ordered Sequences
Title (reference translation): Stochastic clock attention for continuous and ordered sequences
Authors: Hyungjoon Soh, Junghyo Jo
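Abstract summary: We formulate an attention mechanism for continuous and ordered sequences that functions as an alignment model. In a Transformer text-to-speech testbed, this construction produces more stable alignments and improved robustness to global time-scaling.
Reference score (independently computed attention level): 1.2418532541734193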
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We formulate an attention mechanism for continuous and ordered sequences that explicitly functions as an alignment model, which serves as the core of many sequence-to-sequence tasks. Standard scaled dot-product attention relies on positional encodings and masks but does not enforce continuity or monotonicity, which are crucial for frame-synchronous targets. We propose learned nonnegative \emph{clocks} to source and target and model attention as the meeting probability of these clocks; a path-integral derivation yields a closed-form, Gaussian-like scoring rule with an intrinsic bias toward causal, smooth, near-diagonal alignments, without external positional regularizers. The framework supports two complementary regimes: normalized clocks for parallel decoding when a global length is available, and unnormalized clocks for autoregressive decoding -- both nearly-parameter-free, drop-in replacements. In a Transformer text-to-speech testbed, this construction produces more stable alignments and improved robustness to global time-scaling while matching or improving accuracy over scaled dot-product baselines. We hypothesize applicability to other continuous targets, including video and temporal signal modeling.
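Abstract (reference translation): We formulate an attention mechanism for continuous and ordered sequences that explicitly functions as an alignment model, the core of many sequence-to-sequence tasks. Standard dot-product attention relies on positional encodings and masks but does not enforce the continuity or monotonicity that matter for frame-synchronous targets. A path-integral derivation yields a closed-form, Gaussian-like scoring rule with an intrinsic bias toward causal, smooth, near-diagonal alignments, without external positional regularizers. The framework supports two complementary regimes: normalized clocks for parallel decoding when a global length is available, and unnormalized clocks for autoregressive decoding; both are nearly parameter-free, drop-in replacements. In a Transformer text-to-speech testbed, this construction produces more stable alignments and improved robustness to global time-scaling while matching or improving accuracy over scaled dot-product baselines. We hypothesize applicability to other continuous targets, including video and temporal signal modeling.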
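
The abstract does not give the closed-form scoring rule, so the following is only a rough sketch of the general idea, not the authors' method. It assumes nonnegative clock rates obtained through a softplus, clocks built as cumulative sums, and a plain Gaussian kernel on the clock difference. The function names, the width `sigma`, and the random inputs are all hypothetical.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus; keeps every clock rate strictly positive.
    return np.logaddexp(0.0, x)

def clocks(raw_rates, normalize):
    # Cumulative sum of positive rates gives a strictly increasing clock.
    c = np.cumsum(softplus(raw_rates))
    # Normalized regime: the clock ends at 1 (needs the full length up front).
    # Unnormalized regime: the raw cumulative clock, usable step by step.
    return c / c[-1] if normalize else c

def clock_attention(src_raw, tgt_raw, sigma=0.1, normalize=True):
    # ASSUMPTION: a simple Gaussian kernel on the clock difference stands in
    # for the paper's closed-form rule, which the abstract does not state.
    s = clocks(src_raw, normalize)  # shape (S,), source clock
    t = clocks(tgt_raw, normalize)  # shape (T,), target clock
    scores = -((t[:, None] - s[None, :]) ** 2) / (2.0 * sigma ** 2)
    scores -= scores.max(axis=1, keepdims=True)  # stabilize the softmax
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)  # rows: target, columns: source

rng = np.random.default_rng(0)
attn = clock_attention(rng.normal(size=6), rng.normal(size=20))
peaks = attn.argmax(axis=1)  # most-attended source index per target frame
```

Because both clocks increase monotonically, the most-attended source index in this sketch never moves backward as the target index advances, which is the kind of near-diagonal, monotone alignment the abstract describes.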

Related papers