MDDFNet: Mamba-based Dynamic Dual Fusion Network for Traffic Sign Detection
- URL: http://arxiv.org/abs/2505.05491v1
- Date: Fri, 02 May 2025 14:53:25 GMT
- Title: MDDFNet: Mamba-based Dynamic Dual Fusion Network for Traffic Sign Detection
- Authors: TianYi Yu,
- Abstract summary: We propose a novel object detection network, Mamba-based Dynamic Dual Fusion Network (MDDFNet) for traffic sign detection.<n>The network integrates a dynamic dual fusion module and a Mamba-based backbone to simultaneously tackle the aforementioned issues.<n>Extensive experiments conducted on the TT100K (Tsinghua-Tencent 100K) datasets demonstrate that MDDFNet outperforms other state-of-the-art detectors.
- Score: 0.081585306387285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Detection of small objects, especially traffic signs, is a critical sub-task in object detection and autonomous driving. Despite signficant progress in previous research, two main challenges remain. First, the issue of feature extraction being too singular. Second, the detection process struggles to efectively handle objects of varying sizes or scales. These problems are also prevalent in general object detection tasks. To address these challenges, we propose a novel object detection network, Mamba-based Dynamic Dual Fusion Network (MDDFNet), for traffic sign detection. The network integrates a dynamic dual fusion module and a Mamba-based backbone to simultaneously tackle the aforementioned issues. Specifically, the dynamic dual fusion module utilizes multiple branches to consolidate various spatial and semantic information, thus enhancing feature diversity. The Mamba-based backbone leverages global feature fusion and local feature interaction, combining features in an adaptive manner to generate unique classification characteristics. Extensive experiments conducted on the TT100K (Tsinghua-Tencent 100K) datasets demonstrate that MDDFNet outperforms other state-of-the-art detectors, maintaining real-time processing capabilities of single-stage models while achieving superior performance. This confirms the efectiveness of MDDFNet in detecting small traffic signs.
Related papers
- UAVD-Mamba: Deformable Token Fusion Vision Mamba for Multimodal UAV Detection [5.89878250955707]
UAVD-Mamba is a multimodal UAV object detection framework based on Mamba architectures.<n>To improve geometric adaptability, we propose the Deformable Token Mamba Block (DTMB)<n>To optimize the multimodal feature complementarity, we design two separate DTMBs for the RGB and infrared (IR) modalities.
arXiv Detail & Related papers (2025-07-01T15:21:27Z) - DINO-CoDT: Multi-class Collaborative Detection and Tracking with Vision Foundation Models [11.34839442803445]
We propose a multi-class collaborative detection and tracking framework tailored for diverse road users.<n>We first present a detector with a global spatial attention fusion (GSAF) module, enhancing multi-scale feature learning for objects of varying sizes.<n>Next, we introduce a tracklet RE-IDentification (REID) module that leverages visual semantics with a vision foundation model to effectively reduce ID SWitch (IDSW) errors.
arXiv Detail & Related papers (2025-06-09T02:49:10Z) - SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection [73.49799596304418]
This paper introduces a new task called Multi-Modal datasets and Multi-Task Object Detection (M2Det) for remote sensing.<n>It is designed to accurately detect horizontal or oriented objects from any sensor modality.<n>This task poses challenges due to 1) the trade-offs involved in managing multi-modal modelling and 2) the complexities of multi-task optimization.
arXiv Detail & Related papers (2024-12-30T02:47:51Z) - COMO: Cross-Mamba Interaction and Offset-Guided Fusion for Multimodal Object Detection [9.913133285133998]
Single-modal object detection tasks often experience performance degradation when encountering diverse scenarios.<n> multimodal object detection tasks can offer more comprehensive information about object features by integrating data from various modalities.<n>In this paper, we propose a novel approach called the CrOss-Mamba interaction and Offset-guided fusion framework.
arXiv Detail & Related papers (2024-12-24T01:14:48Z) - EMDFNet: Efficient Multi-scale and Diverse Feature Network for Traffic Sign Detection [11.525603303355268]
The detection of small objects, particularly traffic signs, is a critical subtask within object detection and autonomous driving.
Motivated by these challenges, we propose a novel object detection network named Efficient Multi-scale and Diverse Feature Network (EMDFNet)
EMDFNet integrates an Augmented Shortcut Module and an Efficient Hybrid to address the aforementioned issues simultaneously.
arXiv Detail & Related papers (2024-08-26T11:26:27Z) - DMM: Disparity-guided Multispectral Mamba for Oriented Object Detection in Remote Sensing [8.530409994516619]
Multispectral oriented object detection faces challenges due to both inter-modal and intra-modal discrepancies.
We propose Disparity-guided Multispectral Mamba (DMM), a framework comprised of a Disparity-guided Cross-modal Fusion Mamba (DCFM) module, a Multi-scale Target-aware Attention (MTA) module, and a Target-Prior Aware (TPA) auxiliary task.
arXiv Detail & Related papers (2024-07-11T02:09:59Z) - AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection [23.91870504363899]
Double-stream networks in multispectral detection employ two separate feature extraction branches for multi-modal data.
This has hindered the widespread employment of multispectral pedestrian detection in embedded devices for autonomous systems.
We introduce the Adaptive Modal Fusion Distillation (AMFD) framework, which can fully utilize the original modal features of the teacher network.
arXiv Detail & Related papers (2024-05-21T17:17:17Z) - Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z) - PSNet: Parallel Symmetric Network for Video Salient Object Detection [85.94443548452729]
We propose a VSOD network with up and down parallel symmetry, named PSNet.
Two parallel branches with different dominant modalities are set to achieve complete video saliency decoding.
arXiv Detail & Related papers (2022-10-12T04:11:48Z) - Rethinking the Detection Head Configuration for Traffic Object Detection [11.526701794026641]
We propose a lightweight traffic object detection network based on matching between detection head and object distribution.
The proposed model achieves more competitive performance than other models on BDD100K dataset and our proposed ETFOD-v2 dataset.
arXiv Detail & Related papers (2022-10-08T02:23:57Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - The Devil is in the Task: Exploiting Reciprocal Appearance-Localization
Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving.
We introduce a Dynamic Feature Reflecting Network, named DFR-Net.
We rank 1st among all the monocular 3D object detectors in the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z) - A Unified Object Motion and Affinity Model for Online Multi-Object
Tracking [127.5229859255719]
We propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
arXiv Detail & Related papers (2020-03-25T09:36:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.