Fugu-MT paper translation (abstract): Beyond Mamba: Enhancing State-space Models with Deformable Dilated Convolutions for Multi-scale Traffic Object Detection

Paper abstract: Beyond Mamba: Enhancing State-space Models with Deformable Dilated Convolutions for Multi-scale Traffic Object Detection

arxiv url: http://arxiv.org/abs/2604.08038v1
Date: Thu, 09 Apr 2026 09:43:00 GMT
Status: Translation complete
Last updated in system: 2026-04-10 18:34:05.847987
Title: Beyond Mamba: Enhancing State-space Models with Deformable Dilated Convolutions for Multi-scale Traffic Object Detection
Title (reference translation): Beyond Mamba: Enhancing State-space Models with Deformable Dilated Convolutions for Multi-scale Traffic Object Detection
Authors: Jun Li, Yingying Shi, Zhixuan Ruan, Nan Guo, Jianhua Xu
Abstract summary: This study proposes a Mamba with Deformable Dilated Convolutions Network (MDDCNet). In MDDCNet, a well-designed hybrid backbone with successive Multi-Scale Deformable Dilated Convolution (MSDDC) blocks and Mamba blocks enables hierarchical feature representation from local details to global semantics. A Channel-Enhanced Feed-Forward Network (CE-FFN) is developed to overcome the limited channel interaction capability of conventional feed-forward networks.
Reference score (independently computed attention score): 6.929321171294922
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In a real-world traffic scenario, varying-scale objects are usually distributed in a cluttered background, which poses great challenges to accurate detection. Although current Mamba-based methods can efficiently model long-range dependencies, they still struggle to capture small objects with abundant local details, which hinders joint modeling of local structures and global semantics. Moreover, state-space models exhibit limited hierarchical feature representation and weak cross-scale interaction due to flat sequential modeling and insufficient spatial inductive biases, leading to sub-optimal performance in complex scenes. To address these issues, we propose a Mamba with Deformable Dilated Convolutions Network (MDDCNet) for accurate traffic object detection in this study. In MDDCNet, a well-designed hybrid backbone with successive Multi-Scale Deformable Dilated Convolution (MSDDC) blocks and Mamba blocks enables hierarchical feature representation from local details to global semantics. Meanwhile, a Channel-Enhanced Feed-Forward Network (CE-FFN) is further devised to overcome the limited channel interaction capability of conventional feed-forward networks, whilst a Mamba-based Attention-Aggregating Feature Pyramid Network (A^2FPN) is constructed to achieve enhanced multi-scale feature fusion and interaction. Extensive experimental results on public benchmark and real-world datasets demonstrate the superiority of our method over various advanced detectors. The code is available at https://github.com/Bettermea/MDDCNet.
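Abstract (reference translation): In real-world traffic scenarios, objects of varying scales are usually distributed across cluttered backgrounds, which poses major challenges for accurate detection. Current Mamba-based methods can efficiently model long-range dependencies, but they struggle to capture small objects rich in local detail, which hinders joint modeling of local structures and global semantics. Moreover, because of flat sequential modeling and insufficient spatial inductive biases, state-space models exhibit limited hierarchical feature representation and weak cross-scale interaction, resulting in sub-optimal performance in complex scenes. To address these issues, we propose a Mamba with Deformable Dilated Convolutions Network (MDDCNet). In MDDCNet, a well-designed hybrid backbone of successive Multi-Scale Deformable Dilated Convolution (MSDDC) blocks and Mamba blocks enables hierarchical feature representation from local details to global semantics. Meanwhile, a Channel-Enhanced Feed-Forward Network (CE-FFN) overcomes the limited channel interaction capability of conventional feed-forward networks, and a Mamba-based Attention-Aggregating Feature Pyramid Network (A^2FPN) is constructed to achieve enhanced multi-scale feature fusion and interaction. Extensive experimental results on public benchmark and real-world datasets confirm the superiority of the method over various advanced detectors. The code is available at https://github.com/Bettermea/MDDCNet.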
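
The abstract names multi-scale dilated convolutions as the component that supplies local detail at several scales. The sketch below is only an illustration of that general idea, not the authors' MSDDC block: it uses a 1D signal, fixed dilation rates, and a single kernel shared across branches (all assumptions made here for brevity), and it omits the learned sampling offsets that make a convolution deformable. The function names are hypothetical.

```python
from typing import List, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1D dilated cross-correlation with zero padding."""
    center = (len(weights) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            # Taps are spaced `dilation` samples apart around position i.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def multi_scale_branch(x: Sequence[float], weights: Sequence[float],
                       dilations: Sequence[int]) -> List[float]:
    """Sum parallel branches that differ only in dilation rate (assumed fusion rule)."""
    outputs = [dilated_conv1d(x, weights, d) for d in dilations]
    return [sum(vals) for vals in zip(*outputs)]


# A unit impulse makes the widened receptive field visible in the output.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
kernel = [1.0, 1.0, 1.0]
fused = multi_scale_branch(signal, kernel, dilations=(1, 2, 3))
print(fused)
```

With a 3-tap kernel, dilation rates 1, 2, and 3 cover spans of 3, 5, and 7 samples, so the fused response to the impulse spreads over seven positions while each branch still uses only three weights.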

Related papers