Fugu-MT Paper Translation (Abstract): Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models

Paper overview: Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models

arxiv url: http://arxiv.org/abs/2512.16963v2
Date: Mon, 22 Dec 2025 05:58:51 GMT
Status: Translation complete
Last updated in system: 2025-12-23 14:49:56.314309
Title: Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models
Title (reference translation): Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models
Authors: Zhongpan Tang
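Abstract summary: Building on the premise that "Compression is Intelligence," this paper proposes a new architectural philosophy: Compression is Routing. It also offers a new perspective on "VRAM compression" for handling ultra-long contexts.
Reference score (attention score computed independently by this site): 0.0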
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current Large Language Models (LLMs) face three major challenges: context length limitations, high inference costs, and catastrophic forgetting during continual learning. While Mixture-of-Experts (MoE) architectures mitigate some of these conflicts, their routing mechanisms typically rely on explicitly trained auxiliary classifiers. This not only increases system complexity but also often lacks interpretability when handling mixed-domain inputs. Building upon the premise that "Compression is Intelligence," this paper proposes a novel architectural philosophy: Compression is Routing. We trained an 87M-parameter end-to-end Transformer Autoencoder, achieving a 64x sequence length compression (compressing 512 tokens into 8 latent vectors). Experimental results demonstrate that this compressor possesses extreme domain discriminative capability: it achieves a reconstruction accuracy of 99.47% on the in-domain (code) validation set; accuracy drops sharply to 47.76% on a semi-out-of-distribution domain (Wiki text); and further plummets to just 0.57% on a fully out-of-distribution domain (random sequences). This extreme and systematic performance discrepancy establishes the validity of reconstruction error as an Intrinsic Distribution Fingerprint. Based on this, we propose that expert modules can be automatically scheduled using reconstruction residuals directly, without the need for explicit gating networks. This mechanism offers excellent scalability. Furthermore, this architecture provides a new perspective on "VRAM compression" for handling ultra-long contexts. This report aims to verify the physical validity of this foundational architecture, offering a new research perspective for the next generation of scalable modular neural networks.
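Abstract (reference translation): Current Large Language Models (LLMs) face three major challenges: context length limits, high inference costs, and catastrophic forgetting during continual learning. Mixture-of-Experts (MoE) architectures mitigate some of these conflicts, but their routing mechanisms usually depend on explicitly trained auxiliary classifiers. This increases system complexity and often lacks interpretability when handling mixed-domain inputs. Building on the premise that "Compression is Intelligence," this paper proposes a new architectural philosophy: Compression is Routing. The author trained an 87M-parameter end-to-end Transformer Autoencoder that achieves 64x sequence length compression (512 tokens compressed into 8 latent vectors). Experiments show that this compressor discriminates strongly between domains: it reaches 99.47% reconstruction accuracy on the in-domain (code) validation set, drops sharply to 47.76% on a semi-out-of-distribution domain (Wiki text), and falls further to 0.57% on a fully out-of-distribution domain (random sequences). This extreme and systematic performance gap establishes the validity of reconstruction error as an Intrinsic Distribution Fingerprint. On this basis, the paper proposes that expert modules can be scheduled automatically using reconstruction residuals directly, with no explicit gating network. This mechanism offers excellent scalability. The architecture also provides a new perspective on "VRAM compression" for handling ultra-long contexts. The report aims to verify the physical validity of this foundational architecture and to offer a new research perspective for the next generation of scalable modular neural networks.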
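
The routing idea in the abstract (pick the expert whose compressor reconstructs the input best, with no separate gating network) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the vocabulary-based `make_toy_reconstructor` is a hypothetical stand-in for a trained Transformer autoencoder, and the names `route`, `reconstruction_accuracy`, and the `min_accuracy` threshold are assumptions introduced here for illustration only.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Tokens = List[str]
Reconstructor = Callable[[Tokens], Tokens]


def make_toy_reconstructor(known_vocab: Set[str]) -> Reconstructor:
    """Hypothetical stand-in for a trained autoencoder (compress, then decompress).

    Tokens inside the expert's domain vocabulary survive the round trip;
    everything else is lost and comes back as "<unk>".
    """
    def reconstruct(tokens: Tokens) -> Tokens:
        return [t if t in known_vocab else "<unk>" for t in tokens]
    return reconstruct


def reconstruction_accuracy(tokens: Tokens, reconstructed: Tokens) -> float:
    """Fraction of positions where the reconstruction matches the input."""
    if not tokens:
        return 0.0
    matches = sum(a == b for a, b in zip(tokens, reconstructed))
    return matches / len(tokens)


def route(
    tokens: Tokens,
    experts: Dict[str, Reconstructor],
    min_accuracy: float = 0.5,  # assumed rejection threshold, not from the paper
) -> Tuple[Optional[str], Dict[str, float]]:
    """Pick the expert with the highest reconstruction accuracy.

    Returns (None, scores) when no expert reconstructs the input well enough,
    which treats the input as out-of-distribution for every expert.
    """
    scores = {
        name: reconstruction_accuracy(tokens, rec(tokens))
        for name, rec in experts.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < min_accuracy:
        return None, scores
    return best, scores


# Two toy experts with hand-picked vocabularies (illustrative only).
experts = {
    "code": make_toy_reconstructor({"def", "return", "(", ")", ":", "x", "+", "1", "f"}),
    "prose": make_toy_reconstructor({"the", "cat", "sat", "on", "mat", "x"}),
}

code_choice, code_scores = route("def f ( x ) : return x + 1".split(), experts)
prose_choice, _ = route("the cat sat on the mat".split(), experts)
noise_choice, _ = route("qz vv 9k jj".split(), experts)

# Sequence-length compression ratio stated in the abstract: 512 tokens -> 8 latents.
compression_ratio = 512 // 8
```

In this sketch the routing signal is the reconstruction score itself, so adding an expert only means adding another entry to `experts`; no classifier has to be retrained. Whether that carries over to real autoencoders at scale is what the paper sets out to examine.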


Related papers