Fugu-MT Paper Translation (Abstract): TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba

Paper overview: TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba

arxiv url: http://arxiv.org/abs/2502.15130v1
Date: Fri, 21 Feb 2025 01:22:01 GMT
Status: Translation complete
Last updated in system: 2025-02-24 21:37:39.031853
Title: TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba
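Title (reference translation): TransMamba: Fast Universal Architecture Adaptation from Transformers to Mamba
Authors: Xiuwei Chen, Sihao Lin, Xiao Dong, Zisheng Chen, Meng Cao, Jianhua Han, Hang Xu, Xiaodan Liang
Abstract summary: This paper explores cross-architecture training that transfers the knowledge in existing Transformer models to the alternative Mamba architecture, an approach termed TransMamba. The proposed method employs a two-stage strategy to speed up the training of new Mamba models and to ensure effectiveness on both uni-modal and cross-modal tasks. For cross-modal learning, it proposes a cross-Mamba module that integrates language awareness into Mamba's visual features, enhancing the cross-modal interaction capabilities of the Mamba architecture.
Reference score (independently computed attention measure): 88.31117598044725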
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformers have been favored in both uni-modal and multi-modal foundation models for their flexible scalability in attention modules. Consequently, a number of pre-trained Transformer models, e.g., LLaVA, CLIP, and DeiT, are publicly available. Recent research has introduced subquadratic architectures like Mamba, which enable global awareness with linear complexity. Nevertheless, training specialized subquadratic architectures from scratch for certain tasks is both resource-intensive and time-consuming. Motivated by this, we explore cross-architecture training to transfer the ready knowledge in existing Transformer models to the alternative Mamba architecture, an approach termed TransMamba. Our approach employs a two-stage strategy to expedite the training of new Mamba models, ensuring effectiveness across uni-modal and cross-modal tasks. To address architecture disparities, we project the intermediate features into an aligned latent space before transferring knowledge. On top of that, a Weight Subcloning and Adaptive Bidirectional distillation method (WSAB) is introduced for knowledge transfer without limitations on varying layer counts. For cross-modal learning, we propose a cross-Mamba module that integrates language awareness into Mamba's visual features, enhancing the cross-modal interaction capabilities of the Mamba architecture. Despite using less than 75% of the training data typically required for training from scratch, TransMamba achieves substantially stronger performance across various network architectures and downstream tasks, including image classification, visual question answering, and text-video retrieval. The code will be publicly available.
Abstract (reference translation): Transformers are favored in both uni-modal and multi-modal foundation models because of the flexible scalability of their attention modules. As a result, pre-trained Transformer models (for example, LLaVA, CLIP, and DeiT) are publicly available. Recent research has introduced subquadratic architectures such as Mamba, which enable global awareness with linear complexity. Even so, training specialized subquadratic architectures from scratch for particular tasks is resource-intensive and time-consuming. Motivated by this, the paper explores cross-architecture training that transfers the knowledge in existing Transformer models to the alternative Mamba architecture, an approach termed TransMamba. The proposed method employs a two-stage strategy to speed up the training of new Mamba models and to ensure effectiveness on uni-modal and cross-modal tasks. To handle architectural differences, intermediate features are projected into an aligned latent space before knowledge is transferred. In addition, a Weight Subcloning and Adaptive Bidirectional distillation method (WSAB) is introduced to transfer knowledge without restrictions on differing layer counts. For cross-modal learning, a cross-Mamba module is proposed that integrates language awareness into Mamba's visual features, enhancing the cross-modal interaction capabilities of the Mamba architecture. Although it uses less than 75% of the training data typically required to train from scratch, TransMamba delivers substantially stronger performance across various network architectures and downstream tasks, including image classification, visual question answering, and text-video retrieval. The code will be made publicly available.
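
Illustrative sketch (not the authors' code): the abstract states that intermediate features are projected into an aligned latent space before knowledge is transferred, but it gives no formulas. The Python snippet below is a minimal example of that general idea under loudly stated assumptions: the student (Mamba) features are mapped by a linear projection into the teacher (Transformer) feature dimension, and the mismatch is scored with a mean cosine distance. The function names, the direction of the projection, the choice of cosine distance, and the toy numbers are all hypothetical and do not come from the paper.

```python
import math


def project(features, weight):
    """Linearly map one feature vector into the shared latent space.

    `weight` is a list of rows (hypothetical projection matrix);
    each row has len(features) entries.
    """
    return [sum(w * x for w, x in zip(row, features)) for row in weight]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def alignment_loss(student_tokens, teacher_tokens, weight):
    """Mean (1 - cosine similarity) between projected student tokens and teacher tokens.

    Assumption: one student token per teacher token, compared position by position.
    """
    if len(student_tokens) != len(teacher_tokens):
        raise ValueError("student and teacher must have the same number of tokens")
    losses = [
        1.0 - cosine_similarity(project(s, weight), t)
        for s, t in zip(student_tokens, teacher_tokens)
    ]
    return sum(losses) / len(losses)


# Toy data: 2-dimensional student features, 3-dimensional teacher features.
student = [[1.0, 0.0], [0.0, 2.0]]
teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # hypothetical 3x2 projection

loss = alignment_loss(student, teacher, weight)
print(loss)
```

In a real training setup the projection would be a learned layer and the loss would be minimized by gradient descent; here the projection is fixed by hand so that the projected student features point in the same directions as the teacher features.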

Related papers