Fugu-MT paper translation (abstract): An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers

Paper abstract: An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers

arxiv url: http://arxiv.org/abs/2606.04752v2
Date: Mon, 08 Jun 2026 08:15:04 GMT
Status: translation complete
Last updated in system: 2026-06-09 14:42:04.894868
Title: An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers
Title (reference translation): An Empirical Audit of Input Encoders in Multi-Channel Signal Transformers
Authors: Ossi Lehtinen
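Abstract summary: Transformers that consume multi-channel scalar signals must embed $C$ simultaneous values into one $d_{\text{model}}$-dimensional vector per time step. We audit eight input encoders (a shared-scalar baseline, per-channel linear projections, an orthogonality regulariser, a nonlinear MLP, block-partitioned concatenation, channel-independent and channel-as-token architectures, and a projected positional encoding). The standard per-channel linear projection matches every alternative up to small, statistically real but practically modest differences.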
Reference score (site-computed attention score): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transformers consuming multi-channel scalar signals must embed $C$ simultaneous values into one $d_{\text{model}}$-dimensional vector per time step. We audit eight input encoders -- a shared-scalar baseline, per-channel linear projections, an orthogonality regulariser, a nonlinear MLP, block-partitioned concatenation, channel-independent and channel-as-token architectures, and a projected positional encoding -- on a synthetic benchmark where channel identity is informative and on ETTh1, scored by next-step negative log-likelihood. The headline is practical near-equivalence within a wide "top tier": the standard per-channel linear projection matches every alternative up to small, statistically real but practically modest differences. A direct geometric probe attributes this to a spontaneous orthogonalisation of the per-channel projections: they end up near-orthogonal with no explicit regulariser, letting the standard linear recover channel identity from the summed embedding. Two encoders lose decisively: the shared-scalar baseline collapses for information-theoretic reasons we make explicit, and the channel-independent PatchTST-spirit baseline overfits universally on the synthetic benchmark and underperforms on both. Paired tests resolve two small gaps: projecting the sinusoidal positional encoding through a learned linear layer edges the rest at small $C$ by extending this orthogonality to the positional subspace; a nonlinear MLP stem edges them at the largest $C$, with the gap shrinking under more training data. The practical recommendation: use the standard per-channel linear projection by default; reach for something more elaborate only when the task calls for it.
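Abstract (reference translation): Transformers that consume multi-channel scalar signals must embed $C$ simultaneous values into one $d_{\text{model}}$-dimensional vector per time step. We audit eight input encoders (a shared-scalar baseline, per-channel linear projections, an orthogonality regulariser, a nonlinear MLP, block-partitioned concatenation, channel-independent and channel-as-token architectures, and a projected positional encoding) on a synthetic benchmark where channel identity is informative and on ETTh1, scored by next-step negative log-likelihood. The headline is practical near-equivalence within a wide "top tier": the standard per-channel linear projection matches every alternative, up to small differences that are statistically real but practically modest. A direct geometric probe attributes this to a spontaneous orthogonalisation of the per-channel projections: they become near-orthogonal without any explicit regulariser, which lets the standard linear projection recover channel identity from the summed embedding. Two encoders lose decisively: the shared-scalar baseline collapses for information-theoretic reasons that we make explicit, and the channel-independent PatchTST-spirit baseline overfits universally on the synthetic benchmark and underperforms on both. Projecting the sinusoidal positional encoding through a learned linear layer edges out the rest at small $C$ by extending this orthogonality to the positional subspace; a nonlinear MLP stem edges them out at the largest $C$, with the gap shrinking under more training data. The practical recommendation is to use the standard per-channel linear projection by default.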
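Illustrative sketch (not from the paper): the abstract's geometric argument can be made concrete with a few lines of plain Python. The code below assumes the per-channel encoder has the form $e = \sum_c x_c w_c$ with one vector $w_c$ per channel, and that the shared-scalar baseline reuses a single vector $w$ for every channel, so that $e = (\sum_c x_c)\, w$. Both forms are assumptions made for illustration, since the abstract does not spell them out. Gram-Schmidt is used only as a stand-in for projections that have become orthogonal; the paper reports that this happens spontaneously during training. All function names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def per_channel_embed(x, W):
    """Assumed per-channel encoder: e = sum_c x[c] * W[c]."""
    d = len(W[0])
    return [sum(x[c] * W[c][k] for c in range(len(x))) for k in range(d)]


def shared_scalar_embed(x, w):
    """Assumed shared-scalar baseline: every channel uses the same vector w."""
    s = sum(x)
    return [s * wk for wk in w]


def max_abs_offdiag_cosine(W):
    """Probe: largest |cosine| between two distinct channel vectors."""
    norms = [math.sqrt(dot(w, w)) for w in W]
    worst = 0.0
    for i in range(len(W)):
        for j in range(i + 1, len(W)):
            worst = max(worst, abs(dot(W[i], W[j])) / (norms[i] * norms[j]))
    return worst


def gram_schmidt(W):
    """Orthonormalise the rows of W (stand-in for learned orthogonality)."""
    basis = []
    for w in W:
        v = list(w)
        for b in basis:
            coeff = dot(v, b)
            v = [vk - coeff * bk for vk, bk in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        basis.append([vk / norm for vk in v])
    return basis


rng = random.Random(0)
C, d_model = 4, 64  # illustrative sizes, not the paper's settings
W = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(C)])

x = [0.5, -1.0, 2.0, 0.25]
e = per_channel_embed(x, W)

# With orthonormal channel vectors, a dot product reads each value back
# out of the summed embedding.
recovered = [dot(e, W[c]) for c in range(C)]

# Same values assigned to different channels: the channel sum is unchanged,
# so the assumed shared-scalar embedding cannot tell the two inputs apart.
x_swapped = [2.0, 0.25, 0.5, -1.0]
w_shared = W[0]
same_shared = shared_scalar_embed(x, w_shared) == shared_scalar_embed(x_swapped, w_shared)
```

Under these assumptions, `recovered` matches `x` up to floating-point error, while `same_shared` is true: the shared-scalar form keeps only the channel sum, which is one way to read the information-theoretic collapse the abstract mentions.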
