Fugu-MT paper translation (abstract): An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers

Paper overview: An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers

arxiv url: http://arxiv.org/abs/2606.04752v1
Date: Wed, 03 Jun 2026 11:35:09 GMT
Status: translation complete
Last updated in system: 2026-06-04 20:44:18.710513
Title: An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers
Title (reference translation): An Empirical Audit of Input Encoders in Multi-Channel Signal Transformers
Authors: Ossi Lehtinen
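Abstract summary: Transformers consuming multi-channel scalar signals must embed $C$ simultaneous values into one $d_{\text{model}}$-dimensional vector per time step. We empirically evaluate eight input encoders, ranging from a shared-scalar baseline through per-channel linear projections and an orthogonality regulariser to a nonlinear MLP stem. Two encoders lose decisively: the shared-scalar baseline, which collapses for information-theoretic reasons, and the channel-independent PatchTST-spirit baseline.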
Reference score (independently computed attention score): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transformers consuming multi-channel scalar signals must embed $C$ simultaneous values into one $d_{\text{model}}$-dimensional vector per time step. We empirically audit eight input encoders -- spanning a shared-scalar baseline, per-channel linear projections, an orthogonality regulariser, a nonlinear MLP stem, block-partitioned concatenation, channel-independent and channel-as-token architectures, and a projected positional encoding -- on a synthetic benchmark designed to make channel identity informative and on ETTh1 as a real-data check, measured in next-step negative log-likelihood (NLL). The headline is one of practical near-equivalence within a wide "top tier": the standard per-channel linear projection (nn.Linear(C, $d_{\text{model}}$)) matches every alternative in that tier up to small, statistically real but practically modest, differences. Two encoders lose decisively: the shared-scalar baseline, which collapses for information-theoretic reasons we make explicit, and the channel-independent PatchTST-spirit baseline, which underperforms on both benchmarks and overfits universally on the synthetic one. Paired tests resolve two small gaps: projecting the sinusoidal positional encoding through a learned linear layer edges the rest at small $C$, with a direct geometric probe showing the mechanism is positional-channel orthogonalisation; a nonlinear MLP stem edges them at the largest $C$ we test, with the gap shrinking under more training data. The practical recommendation is to use nn.Linear(C, $d_{\text{model}}$) by default and reach for something more elaborate only when the task at hand gives a real reason to do so. Code and data to reproduce every experiment in this paper are available at https://github.com/OssiLehtinen/channel-encoder-audit
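Abstract (reference translation): Transformers that consume multi-channel scalar signals must embed $C$ simultaneous values into a single $d_{\text{model}}$-dimensional vector at every time step. We empirically audit eight input encoders: a shared-scalar baseline, per-channel linear projections, an orthogonality regulariser, a nonlinear MLP stem, block-partitioned concatenation, channel-independent and channel-as-token architectures, and a projected positional encoding. The audit uses a synthetic benchmark designed to make channel identity informative, with ETTh1 as a real-data check, and measures next-step negative log-likelihood (NLL). The headline result is practical near-equivalence within a wide "top tier": the standard per-channel linear projection (nn.Linear(C, $d_{\text{model}}$)) matches every alternative in that tier up to small differences that are statistically real but practically modest. Two encoders lose decisively: the shared-scalar baseline, which collapses for information-theoretic reasons, and the channel-independent PatchTST-spirit baseline, which underperforms on both benchmarks and overfits universally on the synthetic one. Paired tests resolve two small gaps. Projecting the sinusoidal positional encoding through a learned linear layer edges out the rest at small $C$, and a direct geometric probe shows that the mechanism is positional-channel orthogonalisation. A nonlinear MLP stem edges out the rest at the largest $C$ tested, with the gap shrinking under more training data. The practical recommendation is to use nn.Linear(C, $d_{\text{model}}$) by default and to reach for something more elaborate only when the task at hand gives a real reason to do so. Code and data to reproduce every experiment in the paper are available at https://github.com/OssiLehtinen/channel-encoder-audit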
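
As an illustration of the contrast the abstract draws between the default per-channel linear projection and the shared-scalar baseline, here is a minimal NumPy sketch. It is not taken from the paper's repository. The abstract does not say how the shared-scalar baseline is built, so the sketch assumes one plausible reading, in which every channel is scaled along the same direction `w` and the results are summed. The sizes, the random weights, and the function names `per_channel_linear` and `shared_scalar` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
C, d_model = 4, 8  # hypothetical sizes, chosen only for illustration

# Per-channel linear projection: one weight column per channel plus a bias.
# This is the same affine map nn.Linear(C, d_model) computes, with random
# (untrained) weights standing in for learned ones.
W = rng.normal(size=(d_model, C))
b = np.zeros(d_model)


def per_channel_linear(x):
    """Embed x of shape (..., C) into shape (..., d_model)."""
    return x @ W.T + b


# ASSUMED shared-scalar baseline: all channels reuse one direction w, so the
# embedding depends on the channel values only through their sum.
w = rng.normal(size=d_model)


def shared_scalar(x):
    """Embed x of shape (..., C) into shape (..., d_model) via a shared direction."""
    return x.sum(axis=-1, keepdims=True) * w


x = np.array([1.0, 2.0, 3.0, 4.0])
x_perm = x[::-1].copy()  # same values assigned to different channels

# Under the assumed baseline, swapping channels leaves the embedding unchanged.
same_shared = np.allclose(shared_scalar(x), shared_scalar(x_perm))
# The per-channel projection gives each channel its own column of W.
same_linear = np.allclose(per_channel_linear(x), per_channel_linear(x_perm))
```

Under this assumption, `same_shared` is true and `same_linear` is false: the shared map cannot tell which channel carried which value, while the per-channel projection can. That is one way to picture why a benchmark built to make channel identity informative would penalise such a baseline, although the paper's own argument may differ.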

Related papers