Fugu-MT paper translation (abstract): Structured Multidimensional Representation Learning for Large Language Models

Paper overview: Structured Multidimensional Representation Learning for Large Language Models

arxiv url: http://arxiv.org/abs/2603.05727v1
Date: Thu, 05 Mar 2026 22:34:45 GMT
Status: translation complete
Last updated in system: 2026-03-09 13:17:44.676753
Title: Structured Multidimensional Representation Learning for Large Language Models
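Title (reference translation): Structured multidimensional representation learning for large language models
Authors: Alaa El Ichi, Khalide Jbilou, Mohamed El Guide, Franck Dufrenois
Abstract summary: Transformer architectures achieve state-of-the-art performance across a wide range of pattern recognition and natural language processing tasks. The paper introduces a structured spectral factorization of the embedding space based on the L-product for third-order tensors. It shows that the proposed L-Transformer is spectrally equivalent to p parallel Transformers operating on reduced-dimensional embeddings.
Reference score (independently computed attention score): 0.0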
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformer architectures achieve state-of-the-art performance across a wide range of pattern recognition and natural language processing tasks, but their scaling is accompanied by substantial parameter growth and redundancy in the embedding dimension. In this work, we introduce a structured spectral factorization of the embedding space based on the L-product for third-order tensors. By reshaping token representations into spectral tensor slices and performing attention and feed-forward operations in the transform domain, we obtain a Tensor Transformer architecture that decomposes the encoder into p independent spectral sub-transformers while preserving standard Transformer semantics. We prove that the proposed L-Transformer is spectrally equivalent to p parallel Transformers operating on reduced-dimensional embeddings, which reduces encoder parameters to approximately 1/p of the baseline (up to lower-order terms such as biases and normalization parameters) under a fixed total embedding size. When instantiated with a real-valued Discrete Cosine Transform (DCT), the method remains fully differentiable and compatible with existing training pipelines. Beyond compression, the spectral decomposition introduces an inductive bias over embedding frequencies, enabling slice-dependent frequency scaling that improves generalization. Experiments on IMDB and AG News show that the proposed model can substantially reduce encoder parameters (up to 75% for p=4) while maintaining competitive accuracy. On IMDB, the tensorized encoder matches or improves upon the standard baseline under compression, whereas on AG News at moderate width we observe a small accuracy decrease in exchange for a 4x reduction in encoder parameters; at BERT-base width (d=768), performance returns to parity.
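Abstract (reference translation): Transformer architectures deliver state-of-the-art performance across a broad range of pattern recognition and natural language processing tasks, but scaling them brings considerable parameter growth and redundancy in the embedding dimension. This work introduces a structured spectral factorization of the embedding space based on the L-product for third-order tensors. Converting token representations into spectral tensor slices and carrying out attention and feed-forward operations in the transform domain gives a Tensor Transformer architecture that splits the encoder into p independent spectral sub-transformers while keeping standard Transformer semantics. The proposed L-Transformer is shown to be spectrally equivalent to p parallel Transformers operating on reduced-dimensional embeddings, which cuts encoder parameters to roughly 1/p (up to lower-order terms such as biases and normalization parameters) under a fixed total embedding size. When instantiated with a real-valued Discrete Cosine Transform (DCT), the method is fully differentiable and compatible with existing training pipelines. Beyond compression, the spectral decomposition introduces an inductive bias over embedding frequencies and enables slice-dependent frequency scaling, which improves generalization. Experiments on IMDB and AG News show that the proposed model can substantially reduce encoder parameters (up to 75% for p=4) while maintaining competitive accuracy. On IMDB, the tensorized encoder matches or improves on the standard baseline under compression, whereas on AG News at moderate width a small drop in accuracy is observed in exchange for a 4x encoder reduction; at BERT-base width (d=768), performance returns to parity.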
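
The two quantitative ideas in the abstract, the DCT-based split of an embedding into p spectral slices and the roughly 1/p encoder parameter count, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The reshaping layout (consecutive groups of p entries treated as tubes), the feed-forward width of 4x the model width, and all function names are assumptions made only for illustration.

```python
import math


def dct_matrix(p):
    """Orthonormal DCT-II matrix of size p x p (row k = frequency k)."""
    rows = []
    for k in range(p):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / p)
        rows.append([scale * math.cos(math.pi * (2 * n + 1) * k / (2 * p))
                     for n in range(p)])
    return rows


def to_spectral_slices(x, p):
    """Split a length-d embedding into p spectral slices of length d // p.

    Assumed layout (hypothetical): x is read as m = d // p tubes of length p,
    the DCT is applied along each tube, and slice k collects frequency k.
    """
    assert len(x) % p == 0, "embedding size must be divisible by p"
    m = len(x) // p
    C = dct_matrix(p)
    tubes = [x[i * p:(i + 1) * p] for i in range(m)]
    return [[sum(C[k][n] * tube[n] for n in range(p)) for tube in tubes]
            for k in range(p)]


def from_spectral_slices(slices):
    """Inverse of to_spectral_slices (the orthonormal DCT inverts by transpose)."""
    p, m = len(slices), len(slices[0])
    C = dct_matrix(p)
    x = []
    for i in range(m):
        for n in range(p):
            x.append(sum(C[k][n] * slices[k][i] for k in range(p)))
    return x


def encoder_layer_params(width, ffn_width):
    """Weights and biases of one standard encoder layer.

    Counts Q, K, V and output projections, a two-layer feed-forward block,
    and two LayerNorms (scale and shift each).
    """
    attention = 4 * (width * width + width)
    feed_forward = width * ffn_width + ffn_width + ffn_width * width + width
    layer_norms = 2 * 2 * width
    return attention + feed_forward + layer_norms


def compression_ratio(d, p, ffn_mult=4):
    """Parameters of p sub-layers of width d // p over one layer of width d.

    Assumes each sub-transformer also shrinks its feed-forward width by p.
    """
    standard = encoder_layer_params(d, ffn_mult * d)
    sub_width = d // p
    tensorized = p * encoder_layer_params(sub_width, ffn_mult * sub_width)
    return tensorized / standard


if __name__ == "__main__":
    x = [float(i) for i in range(8)]
    slices = to_spectral_slices(x, p=4)      # 4 slices, each of length 2
    restored = from_spectral_slices(slices)
    print("round-trip error:", max(abs(a - b) for a, b in zip(x, restored)))
    print("ratio for d=768, p=4:", compression_ratio(768, 4))
```

Under these assumptions the quadratic weight terms shrink exactly by a factor of p, while biases and normalization parameters do not, so the ratio sits slightly above 1/p. That matches the abstract's "up to lower-order terms" qualifier. The DCT itself adds no learned parameters in this sketch.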

Related papers