FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification
- URL: http://arxiv.org/abs/2503.09873v1
- Date: Wed, 12 Mar 2025 22:12:35 GMT
- Title: FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification
- Authors: Shoaib Meraj Sami, Md Mahedi Hasan, Nasser M. Nasrabadi, Raghuveer Rao
- Abstract summary: We decompose, align, and fuse multiple image sensor data for target classification. We propose a shared unified discrete token (UDT) space between sensors to reduce the domain and granularity gaps. We achieve superior classification performance compared to single-modality classifiers.
- Score: 10.878168590232852
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In automatic target recognition (ATR) systems, sensors may fail to capture discriminative, fine-grained detail features due to environmental conditions, noise created by CMOS chips, occlusion, parallax, and sensor misalignment. Therefore, multi-sensor image fusion is an effective choice to overcome these constraints. However, multi-modal image sensors are heterogeneous and have domain and granularity gaps. In addition, the multi-sensor images can be misaligned due to intricate background clutter, fluctuating illumination conditions, and uncontrolled sensor settings. In this paper, to overcome these issues, we decompose, align, and fuse multiple image sensor data for target classification. We extract the domain-specific and domain-invariant features from each sensor's data. We propose a shared unified discrete token (UDT) space between sensors to reduce the domain and granularity gaps. Additionally, we develop an alignment module to overcome the misalignment between sensors and emphasize the discriminative representation of the UDT space. In the alignment module, we introduce sparsity constraints to provide a better cross-modal representation of the UDT space and robustness against various sensor settings. We achieve superior classification performance compared to single-modality classifiers and several state-of-the-art multi-modal fusion algorithms on four multi-sensor ATR datasets.
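To make the shared-token idea concrete, below is a minimal PyTorch sketch of two sensor branches quantized against a single shared codebook, standing in for the unified discrete token (UDT) space described in the abstract. The module names, feature dimensions, straight-through quantizer, and mean-pooled fusion head are illustrative assumptions, not the authors' FDCT implementation (which additionally performs frequency-aware decomposition and sparsity-constrained alignment, not sketched here).

```python
import torch
import torch.nn as nn


class SharedTokenQuantizer(nn.Module):
    """Quantize per-sensor patch features against one codebook shared by all sensors."""

    def __init__(self, num_tokens: int = 512, dim: int = 256):
        super().__init__()
        self.codebook = nn.Embedding(num_tokens, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, patches, dim) -> nearest shared codebook entry per patch.
        flat = feats.reshape(-1, feats.size(-1))                    # (B*P, dim)
        idx = torch.cdist(flat, self.codebook.weight).argmin(-1)    # (B*P,)
        tokens = self.codebook(idx).view_as(feats)                  # (B, P, dim)
        # Straight-through estimator so gradients still reach the sensor encoders.
        return feats + (tokens - feats).detach()


class TwoSensorClassifier(nn.Module):
    """Toy two-branch classifier that fuses both sensors in the shared token space."""

    def __init__(self, dim: int = 256, num_classes: int = 10):
        super().__init__()
        self.enc_a = nn.Linear(dim, dim)   # stand-in for a sensor-A encoder
        self.enc_b = nn.Linear(dim, dim)   # stand-in for a sensor-B encoder
        self.quantizer = SharedTokenQuantizer(dim=dim)
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        # Both branches share one codebook, shrinking the domain/granularity gap.
        t_a = self.quantizer(self.enc_a(x_a))
        t_b = self.quantizer(self.enc_b(x_b))
        fused = torch.cat([t_a.mean(dim=1), t_b.mean(dim=1)], dim=-1)
        return self.head(fused)


if __name__ == "__main__":
    model = TwoSensorClassifier()
    logits = model(torch.randn(4, 16, 256), torch.randn(4, 16, 256))
    print(logits.shape)  # torch.Size([4, 10])
```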
Related papers
- MSSIDD: A Benchmark for Multi-Sensor Denoising [55.41612200877861]
We introduce a new benchmark, the Multi-Sensor SIDD dataset, which is the first raw-domain dataset designed to evaluate the sensor transferability of denoising models.
We propose a sensor consistency training framework that enables denoising models to learn the sensor-invariant features.
arXiv Detail & Related papers (2024-11-18T13:32:59Z) - Adaptive Domain Learning for Cross-domain Image Denoising [57.4030317607274]
We present a novel adaptive domain learning (ADL) scheme for cross-domain image denoising.
We use existing data from different sensors (source domain) plus a small amount of data from the new sensor (target domain).
The ADL training scheme automatically removes the data in the source domain that are harmful to fine-tuning a model for the target domain.
Also, we introduce a modulation module that incorporates sensor-specific information (sensor type and ISO) to better interpret the input data for image denoising.
arXiv Detail & Related papers (2024-11-03T08:08:26Z) - CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes [56.52618054240197]
We propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes. Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token. Our model significantly improves robustness and accuracy, especially in adverse-condition scenarios.
arXiv Detail & Related papers (2024-10-14T17:56:20Z) - SenPa-MAE: Sensor Parameter Aware Masked Autoencoder for Multi-Satellite Self-Supervised Pretraining [1.4528189330418977]
SenPa-MAE encodes the sensor parameters of an observed multispectral signal into the image embeddings.
SenPa-MAE can be pre-trained on imagery of different satellites with non-matching spectral or geometrical sensor characteristics.
arXiv Detail & Related papers (2024-08-20T16:53:30Z) - Bridging Remote Sensors with Multisensor Geospatial Foundation Models [15.289711240431107]
msGFM is a multisensor geospatial foundation model that unifies data from four key sensor modalities.
For data originating from identical geolocations, our model employs an innovative cross-sensor pretraining approach.
msGFM has demonstrated enhanced proficiency in a range of both single-sensor and multisensor downstream tasks.
arXiv Detail & Related papers (2024-04-01T17:30:56Z) - LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for Place Recognition [11.206532393178385]
We present a novel neural network named LCPR for robust multimodal place recognition.
Our method can effectively utilize multi-view camera and LiDAR data to improve the place recognition performance.
arXiv Detail & Related papers (2023-11-06T15:39:48Z) - Log-Likelihood Score Level Fusion for Improved Cross-Sensor Smartphone Periocular Recognition [52.15994166413364]
We fuse several comparators to improve periocular recognition performance when images from different smartphones are compared.
We use a probabilistic fusion framework based on linear logistic regression, in which fused scores tend to be log-likelihood ratios.
Our framework also provides an elegant and simple solution to handle signals from different devices, since same-sensor and cross-sensor score distributions are aligned and mapped to a common probabilistic domain (a toy sketch of this fusion approach appears at the end of this list).
arXiv Detail & Related papers (2023-11-02T13:43:44Z) - Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment [59.831917206058435]
Domain adaptive detection aims to improve the generalization of detectors on the target domain.
Recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning.
We introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning.
arXiv Detail & Related papers (2023-01-01T08:38:07Z) - HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection [0.0]
We propose HRFuser, a modular architecture for multi-modal 2D object detection.
It fuses multiple sensors in a multi-resolution fashion and scales to an arbitrary number of input modalities.
We demonstrate through experiments on nuScenes and the adverse-conditions DENSE dataset that our model effectively leverages complementary features from additional modalities.
arXiv Detail & Related papers (2022-06-30T09:40:05Z) - Decoupled-and-Coupled Networks: Self-Supervised Hyperspectral Image Super-Resolution with Subpixel Fusion [67.35540259040806]
We propose a subpixel-level HS super-resolution framework by devising a novel decoupled-and-coupled network, called DC-Net.
As the name suggests, DC-Net first decouples the input into common (or cross-sensor) and sensor-specific components.
We append a self-supervised learning module behind the CSU net that guarantees material consistency, enhancing the detailed appearance of the restored HS product.
arXiv Detail & Related papers (2022-05-07T23:40:36Z) - Deep Soft Procrustes for Markerless Volumetric Sensor Alignment [81.13055566952221]
In this work, we improve markerless data-driven correspondence estimation to achieve more robust multi-sensor spatial alignment.
We incorporate geometric constraints in an end-to-end manner into a typical segmentation-based model and bridge the intermediate dense classification task with the targeted pose estimation task.
Our model is experimentally shown to achieve similar results with marker-based methods and outperform the markerless ones, while also being robust to the pose variations of the calibration structure.
arXiv Detail & Related papers (2020-03-23T10:51:32Z)
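As referenced in the Log-Likelihood Score Level Fusion entry above, here is a toy sketch of score-level fusion via linear logistic regression, where the learned linear combination of comparator scores behaves like a calibrated log-likelihood ratio. The scikit-learn usage, the synthetic scores, and the two-comparator setup are illustrative assumptions, not that paper's actual pipeline.

```python
# Toy sketch: fuse scores from two comparators with linear logistic regression.
# With balanced classes, the learned linear score w0 + w1*s1 + w2*s2 approximates
# a log-likelihood ratio (genuine vs. impostor). All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
genuine = rng.normal(loc=[1.0, 0.8], scale=0.4, size=(200, 2))   # comparator scores, genuine pairs
impostor = rng.normal(loc=[0.0, 0.1], scale=0.4, size=(200, 2))  # comparator scores, impostor pairs
X = np.vstack([genuine, impostor])
y = np.concatenate([np.ones(200), np.zeros(200)])

fuser = LogisticRegression().fit(X, y)
fused_llr = fuser.decision_function(X)   # calibrated fused scores (log-odds)
print(fused_llr[:3])
```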
This list is automatically generated from the titles and abstracts of the papers in this site.