Fugu-MT Paper Translation (Overview): MFil-Mamba: Multi-Filter Scanning for Spatial Redundancy-Aware Visual State Space Models

Paper overview: MFil-Mamba: Multi-Filter Scanning for Spatial Redundancy-Aware Visual State Space Models

arxiv url: http://arxiv.org/abs/2603.20074v1
Date: Fri, 20 Mar 2026 15:56:21 GMT
Status: Translation complete
Last updated in system: 2026-03-23 19:48:39.21829
Title: MFil-Mamba: Multi-Filter Scanning for Spatial Redundancy-Aware Visual State Space Models
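Title (reference translation): MFil-Mamba: Multi-Filter Scanning for Spatial Redundancy-Aware Visual State Space Models
Authors: Puskal Khadka, KC Santosh
Abstract summary: MFil-Mamba is a new visual state space architecture built on a multi-filter scanning backbone. MFil-Mamba outperforms existing state-of-the-art models on various benchmarks.
Reference score (independently computed attention score): 3.1409536087595953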
License: http://creativecommons.org/licenses/by/4.0/
Abstract: State Space Models (SSMs), especially the recent Mamba architecture, have achieved remarkable success in sequence modeling tasks. However, extending SSMs to computer vision remains challenging due to the non-sequential structure of visual data and its complex 2D spatial dependencies. Although several early studies have explored adapting selective SSMs for vision applications, most approaches primarily depend on employing various traversal strategies over the same input. This introduces redundancy and distorts the intricate spatial relationships within images. To address these challenges, we propose MFil-Mamba, a novel visual state space architecture built on a multi-filter scanning backbone. Unlike fixed multi-directional traversal methods, our design enables each scan to capture unique and contextually relevant spatial information while minimizing redundancy. Furthermore, we incorporate an adaptive weighting mechanism to effectively fuse outputs from multiple scans in addition to architectural enhancements. MFil-Mamba achieves superior performance over existing state-of-the-art models across various benchmarks that include image classification, object detection, instance segmentation, and semantic segmentation. For example, our tiny variant attains 83.2% top-1 accuracy on ImageNet-1K, 47.3% box AP and 42.7% mask AP on MS COCO, and 48.5% mIoU on the ADE20K dataset. Code and models are available at https://github.com/puskal-khadka/MFil-Mamba.
Abstract (reference translation): State space models (SSMs), and in particular the recent Mamba architecture, have achieved notable success in sequence modeling tasks. However, extending SSMs to computer vision remains difficult because visual data is non-sequential and has complex 2D spatial dependencies. Several early studies have investigated adapting selective SSMs for vision applications, but most approaches rely mainly on applying various traversal strategies to the same input. This introduces redundancy and distorts the complex spatial relationships within an image. To address these challenges, we propose MFil-Mamba, a new visual state space architecture built on a multi-filter scanning backbone. Unlike fixed multi-directional traversal methods, each scan can capture unique and contextually relevant spatial information while minimizing redundancy. In addition, we introduce an adaptive weighting mechanism that effectively fuses the outputs of multiple scans, along with architectural enhancements. MFil-Mamba outperforms existing state-of-the-art models on various benchmarks, including image classification, object detection, instance segmentation, and semantic segmentation. For example, the tiny variant attains 83.2% top-1 accuracy on ImageNet-1K, 47.3% box AP and 42.7% mask AP on MS COCO, and 48.5% mIoU on ADE20K. Code and models are available at https://github.com/puskal-khadka/MFil-Mamba.
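The abstract does not describe how MFil-Mamba's filters or its adaptive weighting are actually implemented, so the following is only a minimal, hypothetical Python sketch of the two ideas it contrasts: fixed multi-directional traversal of one 2D input (here four orders: row-major, column-major, and their reverses) and a softmax-weighted fusion of the per-scan outputs. `toy_scan` is a fixed linear recurrence standing in for an SSM, not Mamba's selective scan, and the function names and the fusion rule are assumptions for illustration, not the authors' code.

```python
import math
from typing import List, Sequence


def scan_orders(h: int, w: int) -> List[List[int]]:
    """Four fixed traversal orders over an h x w grid, as flattened indices:
    row-major, column-major, and their reverses (a fixed multi-directional baseline)."""
    row_major = [r * w + c for r in range(h) for c in range(w)]
    col_major = [r * w + c for c in range(w) for r in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def toy_scan(seq: Sequence[float], decay: float = 0.5) -> List[float]:
    """Hypothetical stand-in for an SSM: h_t = decay * h_{t-1} + x_t (not selective)."""
    h = 0.0
    out = []
    for x in seq:
        h = decay * h + x
        out.append(h)
    return out


def multi_scan(x_flat: Sequence[float], orders: Sequence[Sequence[int]],
               decay: float = 0.5) -> List[List[float]]:
    """Run toy_scan along each order, then scatter results back to spatial positions."""
    outputs = []
    for order in orders:
        y = toy_scan([x_flat[i] for i in order], decay)
        back = [0.0] * len(x_flat)
        for pos, idx in enumerate(order):
            back[idx] = y[pos]  # undo the permutation so all scans align spatially
        outputs.append(back)
    return outputs


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: positive weights that sum to one."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scans(scan_outputs: Sequence[Sequence[float]],
               logits: Sequence[float]) -> List[float]:
    """Hypothetical adaptive fusion: per-position weighted sum of K aligned scan outputs."""
    if len(scan_outputs) != len(logits):
        raise ValueError("need exactly one logit per scan")
    weights = softmax(logits)
    n = len(scan_outputs[0])
    return [sum(wt * out[i] for wt, out in zip(weights, scan_outputs)) for i in range(n)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]  # a 2 x 2 "image", flattened row-major
    outs = multi_scan(x, scan_orders(2, 2))
    print(fuse_scans(outs, [0.0, 0.0, 0.0, 0.0]))  # equal logits: plain average of the 4 scans
```

In a trained model the fusion logits would presumably be learned or predicted from the input; here they are passed in as fixed numbers. The softmax keeps the weights positive and summing to one, so the fused output stays on the same scale as each individual scan.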

Related Papers