Fugu-MT Paper Translation (Abstract): ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence

Paper overview: ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence

arxiv url: http://arxiv.org/abs/2606.19538v2
Date: Fri, 19 Jun 2026 20:29:57 GMT
Status: Translation complete
Last updated in system: 2026-06-24 16:10:14.821052
Title: ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence
Title (reference translation): ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence
Authors: Ashim Dhor, Rasel Mondal, Pin-Yu Chen
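Abstract summary: The Integral Transform Network (ITNet) is a unified architecture built around a learnable kernel. We show that ITNet is a universal approximator of continuous operators. A single ITNet architecture with a shared operator matches or exceeds specialized baselines on ImageNet-1K, GLUE, ModelNet40, VQA v2, and NLVR2.
Reference score (independently computed attention metric): 45.88028371034407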
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Convolutional networks, recurrent networks, and transformers each encode different inductive biases -- locality, sequential memory, and content-dependent pairwise interaction -- and have remained mathematically distinct since their inception. We show that this fragmentation reflects not a fundamental diversity in how signals should be processed, but rather incomplete views of a single underlying mathematical object: a learnable integral transform. We introduce the Integral Transform Network (ITNet), a unified architecture built around a learnable kernel that depends jointly on positions and features. This kernel is implemented as a small neural network, specifically an MLP, that models pairwise interactions, enabling the model to adapt its behavior from data. We show that convolution, self-attention (including multi-head), and autoregressive recurrence (including LSTM, GRU, S4, and Mamba) arise as special cases under appropriate parameterizations, and that ITNet is a universal approximator of continuous operators. To make this practical, we develop tiled kernel fusion, importance-weighted Monte Carlo integration, and learned low-rank factorization, enabling efficient and scalable computation. A single ITNet architecture with a shared operator and lightweight modality-specific encoders matches or exceeds specialized baselines on ImageNet-1K, GLUE, ModelNet40, VQA v2, and NLVR2. The results demonstrate that a single learned interaction mechanism can recover the behavior of all three architectural families from data.
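Abstract (reference translation): Convolutional networks, recurrent networks, and transformers each encode a different inductive bias (locality, sequential memory, and content-dependent pairwise interaction) and have remained mathematically distinct since their inception. We show that this fragmentation reflects not a fundamental diversity in how signals should be processed, but incomplete views of a single mathematical object: a learnable integral transform. The Integral Transform Network (ITNet) is a unified architecture built around a learnable kernel that depends jointly on positions and features. The kernel is implemented as a small neural network, specifically an MLP, that models pairwise interactions and lets the model adapt its behavior from data. We show that convolution, self-attention (including multi-head), and autoregressive recurrence (including LSTM, GRU, S4, and Mamba) arise as special cases under appropriate parameterizations, and that ITNet is a universal approximator of continuous operators. To make this practical, we develop tiled kernel fusion, importance-weighted Monte Carlo integration, and learned low-rank factorization, which enable efficient and scalable computation. A single ITNet architecture with a shared operator and lightweight modality-specific encoders matches or exceeds specialized baselines on ImageNet-1K, GLUE, ModelNet40, VQA v2, and NLVR2. The results indicate that a single learned interaction mechanism can recover the behavior of all three architectural families from data.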
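
The claim that convolution and self-attention are special cases of one kernel-weighted sum can be illustrated with a small discretized example. The following is a minimal sketch and not the authors' implementation: the names `integral_transform`, `conv_kernel`, and `attention_kernel` are hypothetical, the kernels are fixed by hand rather than learned by an MLP as in the paper, and the discretization y_i = sum_j K(p_i, p_j, x_i, x_j) v_j is an assumption about the general form.

```python
import numpy as np

def integral_transform(kernel, positions, features, values, normalize=False):
    """Discretized transform: y_i = sum_j K(p_i, p_j, x_i, x_j) * v_j."""
    n = len(positions)
    weights = np.array([
        [kernel(positions[i], positions[j], features[i], features[j])
         for j in range(n)]
        for i in range(n)
    ])
    if normalize:
        # Row-normalize so each output is a weighted average (softmax-style).
        weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values

def conv_kernel(taps):
    """Kernel that depends only on the position offset p_j - p_i."""
    def k(p_i, p_j, x_i, x_j):
        return taps.get(int(p_j - p_i), 0.0)
    return k

def attention_kernel(W_q, W_k):
    """Kernel that depends only on features: exp(q_i . k_j / sqrt(d_k))."""
    d_k = W_q.shape[1]
    def k(p_i, p_j, x_i, x_j):
        return np.exp((x_i @ W_q) @ (x_j @ W_k) / np.sqrt(d_k))
    return k

rng = np.random.default_rng(0)
n, d = 6, 4
pos = np.arange(n)
X = rng.normal(size=(n, d))
signal = rng.normal(size=n)

# Case 1: an offset-only kernel reproduces a zero-padded 1-D convolution.
taps = {-1: 0.25, 0: 0.5, 1: 0.25}
y_conv = integral_transform(conv_kernel(taps), pos, X, signal)
ref_conv = np.convolve(signal, [0.25, 0.5, 0.25], mode="same")

# Case 2: a feature-only kernel with row normalization reproduces
# single-head softmax self-attention.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
y_attn = integral_transform(attention_kernel(W_q, W_k), pos, X, X @ W_v,
                            normalize=True)
scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(d)
A = np.exp(scores - scores.max(axis=1, keepdims=True))
ref_attn = (A / A.sum(axis=1, keepdims=True)) @ (X @ W_v)

print(np.allclose(y_conv, ref_conv), np.allclose(y_attn, ref_attn))
```

In this sketch the two cases differ only in which arguments the kernel reads: the convolution kernel ignores features and the attention kernel ignores positions. A kernel that reads both, as the abstract describes, could in principle interpolate between them.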
