Fugu-MT paper translation (abstract): Rotation Equivariant Mamba for Vision Tasks

Paper abstract: Rotation Equivariant Mamba for Vision Tasks

arxiv url: http://arxiv.org/abs/2603.09138v1
Date: Tue, 10 Mar 2026 03:22:33 GMT
Status: translation complete
Last updated in system: 2026-03-11 15:25:23.995714
Title: Rotation Equivariant Mamba for Vision Tasks
Title (reference translation): Rotation Equivariant Mamba for Vision Tasks
Authors: Zhongchen Zhao, Qi Xie, Keyu Huang, Lei Zhang, Deyu Meng, Zongben Xu
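Abstract summary: The paper introduces EQ-VMamba, the first rotation equivariant visual Mamba architecture for vision tasks. EQ-VMamba achieves superior or competitive performance compared to non-equivariant baselines.
Reference score (attention score computed by this site): 66.32081000860958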
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Rotation equivariance constitutes one of the most general and crucial structural priors for visual data, yet it remains notably absent from current Mamba-based vision architectures. Despite the success of Mamba in natural language processing and its growing adoption in computer vision, existing visual Mamba models fail to account for rotational symmetry in their design. This omission renders them inherently sensitive to image rotations, thereby constraining their robustness and cross-task generalization. To address this limitation, we propose to incorporate rotation symmetry, a universal and fundamental geometric prior in images, into Mamba-based architectures. Specifically, we introduce EQ-VMamba, the first rotation equivariant visual Mamba architecture for vision tasks. The core components of EQ-VMamba include a carefully designed rotation equivariant cross-scan strategy and group Mamba blocks. Moreover, we provide a rigorous theoretical analysis of the intrinsic equivariance error, demonstrating that the proposed architecture enforces end-to-end rotation equivariance throughout the network. Extensive experiments across multiple benchmarks (high-level image classification, mid-level semantic segmentation, and low-level image super-resolution) demonstrate that EQ-VMamba achieves superior or competitive performance compared to non-equivariant baselines, while requiring approximately 50% fewer parameters. These results indicate that embedding rotation equivariance not only effectively bolsters the robustness of visual Mamba models against rotation transformations, but also enhances overall performance with significantly improved parameter efficiency. Code is available at https://github.com/zhongchenzhao/EQ-VMamba.
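Abstract (reference translation): Rotation equivariance is one of the most general and important structural priors for visual data, yet it is notably absent from current Mamba-based vision architectures. Despite Mamba's success in natural language processing and its growing adoption in computer vision, existing visual Mamba models do not account for rotational symmetry in their design. This omission makes them inherently sensitive to image rotations, which limits their robustness and cross-task generalization. To address this limitation, we propose incorporating rotation symmetry, a universal and fundamental geometric prior in images, into Mamba-based architectures. Specifically, we introduce EQ-VMamba, the first rotation equivariant visual Mamba architecture for vision tasks. Its core components are a carefully designed rotation equivariant cross-scan strategy and group Mamba blocks. A rigorous theoretical analysis of the intrinsic equivariance error further shows that the proposed architecture enforces end-to-end rotation equivariance throughout the network. Extensive experiments across multiple benchmarks, covering high-level image classification, mid-level semantic segmentation, and low-level image super-resolution, show that EQ-VMamba achieves superior or competitive performance compared to non-equivariant baselines while requiring approximately 50% fewer parameters. These results suggest that embedding rotation equivariance not only improves the robustness of visual Mamba models to rotation transformations but also enhances overall performance with significantly better parameter efficiency. Code is available at https://github.com/zhongchenzhao/EQ-VMamba.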
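
As a rough illustration of what "rotation equivariance" means for a scan-based operator, the toy sketch below checks whether f(rotate(x)) equals rotate(f(x)) for 90-degree rotations. It is not the EQ-VMamba cross-scan or group Mamba block: the running-sum scan and the averaging over four rotations (a generic group-averaging trick) are stand-ins chosen for illustration, and every function name here (`rot90`, `row_scan`, `symmetrized_scan`, `equivariance_error`) is hypothetical.

```python
import random

def rot90(x, k=1):
    """Rotate a 2D list counterclockwise by k * 90 degrees (negative k allowed)."""
    for _ in range(k % 4):
        x = [list(row) for row in zip(*x)][::-1]
    return x

def row_scan(x):
    """Toy directional scan: running sum along each row, left to right."""
    out = []
    for row in x:
        total, acc = 0.0, []
        for v in row:
            total += v
            acc.append(total)
        out.append(acc)
    return out

def symmetrized_scan(x):
    """Average row_scan over the four 90-degree rotations (group averaging)."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for k in range(4):
        # Rotate the input, scan, then rotate the result back.
        y = rot90(row_scan(rot90(x, k)), -k)
        for i in range(h):
            for j in range(w):
                out[i][j] += y[i][j] / 4.0
    return out

def equivariance_error(f, x, k=1):
    """Max absolute difference between f(rot(x)) and rot(f(x))."""
    lhs, rhs = f(rot90(x, k)), rot90(f(x), k)
    return max(abs(a - b) for ra, rb in zip(lhs, rhs) for a, b in zip(ra, rb))

random.seed(0)
image = [[random.random() for _ in range(6)] for _ in range(6)]
print("plain scan error:      ", equivariance_error(row_scan, image))
print("symmetrized scan error:", equivariance_error(symmetrized_scan, image))
```

The plain left-to-right scan depends on direction, so its error is clearly nonzero, while the symmetrized version agrees up to floating-point rounding. The abstract describes building equivariance into the scan strategy and blocks themselves, which this averaging sketch does not attempt to reproduce.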

Related papers