FSMODNet: A Closer Look at Few-Shot Detection in Multispectral Data
- URL: http://arxiv.org/abs/2509.20905v1
- Date: Thu, 25 Sep 2025 08:45:05 GMT
- Title: FSMODNet: A Closer Look at Few-Shot Detection in Multispectral Data
- Authors: Manuel Nkegoum, Minh-Tan Pham, Élisa Fromont, Bruno Avignon, Sébastien Lefèvre
- Abstract summary: Few-shot multispectral object detection (FSMOD) addresses the challenge of detecting objects across visible and thermal modalities with minimal data. We introduce a framework named "FSMODNet" that leverages cross-modality feature integration to improve detection performance even with limited labels.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot multispectral object detection (FSMOD) addresses the challenge of detecting objects across visible and thermal modalities with minimal annotated data. In this paper, we explore this complex task and introduce a framework named "FSMODNet" that leverages cross-modality feature integration to improve detection performance even with limited labels. By effectively combining the unique strengths of visible and thermal imagery using deformable attention, the proposed method demonstrates robust adaptability in complex illumination and environmental conditions. Experimental results on two public datasets show effective object detection performance in challenging low-data regimes, outperforming several baselines we established from state-of-the-art models. All code, models, and experimental data splits can be found at https://anonymous.4open.science/r/Test-B48D.
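The abstract describes fusing visible and thermal features with deformable attention. As a rough illustration of that idea (not the paper's actual implementation), the NumPy sketch below has each location in the visible feature map attend to a small set of bilinearly sampled thermal locations at predicted offsets, weighted by softmax attention; all shapes, offset/logit inputs, and the residual-add are illustrative assumptions.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample feat (H, W, C) at fractional coords (y, x)."""
    H, W, _ = feat.shape
    y = np.clip(y, 0, H - 1)
    x = np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def deformable_cross_fusion(vis, thr, offsets, logits):
    """For each visible-feature location, attend to K thermal samples
    taken at learned offsets, then add the aggregate back residually.
    vis, thr: (H, W, C); offsets: (H, W, K, 2); logits: (H, W, K)."""
    H, W, C = vis.shape
    K = offsets.shape[2]
    out = vis.copy()
    for i in range(H):
        for j in range(W):
            w = softmax(logits[i, j])       # attention weights over K samples
            agg = np.zeros(C)
            for k in range(K):
                dy, dx = offsets[i, j, k]   # sampling offset into thermal map
                agg += w[k] * bilinear_sample(thr, i + dy, j + dx)
            out[i, j] += agg                # residual cross-modal update
    return out

rng = np.random.default_rng(0)
H, W, C, K = 4, 4, 8, 3
vis = rng.standard_normal((H, W, C))
thr = rng.standard_normal((H, W, C))
offsets = rng.standard_normal((H, W, K, 2))  # in a real model, predicted from features
logits = rng.standard_normal((H, W, K))      # likewise predicted, not random
fused = deformable_cross_fusion(vis, thr, offsets, logits)
print(fused.shape)
```

In an actual network the offsets and logits would be regressed from the query features by small linear layers; here they are random placeholders to keep the sketch self-contained.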
Related papers
- Zero-shot HOI Detection with MLLM-based Detector-agnostic Interaction Recognition [71.5328300638085]
Zero-shot Human-object interaction (HOI) detection aims to locate humans and objects in images and recognize their interactions. Existing methods, including two-stage methods, tightly couple interaction recognition (IR) with a specific detector. We propose a decoupled framework that separates object detection from IR and leverages multi-modal large language models (MLLMs) for zero-shot IR.
arXiv Detail & Related papers (2026-02-16T19:01:31Z) - MODA: The First Challenging Benchmark for Multispectral Object Detection in Aerial Images [26.48439423478357]
We introduce the first large-scale dataset for Multispectral Object Detection in Aerial images (MODA). This dataset comprises 14,041 multispectral images (MSIs) and 330,191 annotations across diverse, challenging scenarios. We also propose OSSDet, a framework that integrates spectral and spatial information with object-aware cues.
arXiv Detail & Related papers (2025-12-10T10:07:06Z) - HiddenObject: Modality-Agnostic Fusion for Multimodal Hidden Object Detection [2.482043295743758]
HiddenObject is a fusion framework that integrates RGB, thermal, and depth data using a Mamba-based fusion mechanism. Our method captures complementary signals across modalities, enabling enhanced detection of obscured or camouflaged targets.
arXiv Detail & Related papers (2025-08-28T18:09:22Z) - CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection [54.85000884785013]
Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types, and the scarcity of training data. We propose CLIPfusion, a method that leverages both discriminative and generative foundation models. We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection.
arXiv Detail & Related papers (2025-06-13T13:30:15Z) - WS-DETR: Robust Water Surface Object Detection through Vision-Radar Fusion with Detection Transformer [4.768265044725289]
Water surface object detection faces challenges from blurred edges and diverse object scales. Existing approaches suffer from cross-modal feature conflicts, which negatively affect model robustness. We propose a robust vision-radar fusion model, WS-DETR, which achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2025-04-10T04:16:46Z) - Efficient Multimodal 3D Object Detector via Instance-Level Contrastive Distillation [17.634678949648208]
We introduce a fast yet effective multimodal 3D object detector, incorporating our proposed Instance-level Contrastive Distillation (ICD) framework and Cross Linear Attention Fusion Module (CLFM). Our 3D object detector outperforms state-of-the-art (SOTA) methods while achieving superior efficiency.
arXiv Detail & Related papers (2025-03-17T08:26:11Z) - Efficient Detection Framework Adaptation for Edge Computing: A Plug-and-play Neural Network Toolbox Enabling Edge Deployment [59.61554561979589]
Edge computing has emerged as a key paradigm for deploying deep learning-based object detection in time-sensitive scenarios. Existing edge detection methods face challenges: difficulty balancing detection precision with lightweight models, limited adaptability, and insufficient real-world validation. We propose the Edge Detection Toolbox (ED-TOOLBOX), which utilizes generalizable plug-and-play components to adapt object detection models for edge environments.
arXiv Detail & Related papers (2024-12-24T07:28:10Z) - PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model [76.95536611263356]
PolSAR data presents unique challenges due to its rich and complex characteristics. Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used. Most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively. We propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy.
arXiv Detail & Related papers (2024-12-17T09:59:53Z) - Optimizing Multispectral Object Detection: A Bag of Tricks and Comprehensive Benchmarks [49.84182981950623]
Multispectral object detection, utilizing RGB and TIR (thermal infrared) modalities, is widely recognized as a challenging task. It requires not only the effective extraction of features from both modalities and robust fusion strategies, but also the ability to address issues such as spectral discrepancies. We introduce an efficient and easily deployable multispectral object detection framework that can seamlessly optimize high-performing single-modality models.
arXiv Detail & Related papers (2024-11-27T12:18:39Z) - SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection [79.23689506129733]
We establish a new benchmark dataset and an open-source method for large-scale SAR object detection. Our dataset, SARDet-100K, is a result of intense surveying, collecting, and standardizing 10 existing SAR detection datasets. To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created.
arXiv Detail & Related papers (2024-03-11T09:20:40Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (mAP) of approximately 45.7%, a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - Multispectral Fusion for Object Detection with Cyclic Fuse-and-Refine Blocks [3.6488662460683794]
We propose a new halfway feature fusion method for neural networks that leverages the complementary/consistency balance existing in multispectral features.
We evaluate the effectiveness of our fusion method on two challenging multispectral datasets for object detection.
arXiv Detail & Related papers (2020-09-26T18:39:05Z)
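The "cyclic fuse-and-refine" idea above can be illustrated with a toy NumPy loop (a stand-in, not the paper's network): each cycle fuses the two modality feature maps by averaging, then refines each modality toward the fused map with a gated residual step, so repeated cycles trade complementarity for consistency. The averaging fusion, the gating coefficient `alpha`, and the cycle count are all illustrative assumptions.

```python
import numpy as np

def cyclic_fuse_and_refine(f_rgb, f_thermal, n_cycles=2, alpha=0.5):
    """Toy fuse-and-refine loop over two (H, W, C) feature maps:
    fuse by averaging, then nudge each modality toward the fused map."""
    for _ in range(n_cycles):
        fused = 0.5 * (f_rgb + f_thermal)                    # fuse step
        f_rgb = f_rgb + alpha * (fused - f_rgb)              # refine RGB branch
        f_thermal = f_thermal + alpha * (fused - f_thermal)  # refine thermal branch
    return fused, f_rgb, f_thermal

rng = np.random.default_rng(1)
a = rng.standard_normal((4, 4, 8))
b = rng.standard_normal((4, 4, 8))
fused, ra, rb = cyclic_fuse_and_refine(a, b, n_cycles=3)
print(np.abs(ra - rb).max())  # inter-modality gap shrinks by (1 - alpha) per cycle
```

Because each refine step scales the inter-modality difference by (1 - alpha) while the per-location sum of the two branches is preserved, the branches converge toward their common mean; a learned network replaces the fixed average and gate with trainable fusion and refinement blocks.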
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.