Revisiting Misalignment in Multispectral Pedestrian Detection: A Language-Driven Approach for Cross-modal Alignment Fusion
- URL: http://arxiv.org/abs/2411.17995v1
- Date: Wed, 27 Nov 2024 02:24:51 GMT
- Title: Revisiting Misalignment in Multispectral Pedestrian Detection: A Language-Driven Approach for Cross-modal Alignment Fusion
- Authors: Taeheon Kim, Sangyun Chung, Youngjoon Yu, Yong Man Ro,
- Abstract summary: This paper introduces a new framework for multispectral pedestrian detection designed specifically to handle heavily misaligned datasets.
By leveraging Large-scale Vision-Language Models (LVLM) for cross-modal semantic alignment, our approach seeks to enhance detection accuracy.
- Score: 43.29589667431712
- License:
- Abstract: Multispectral pedestrian detection is a crucial component in various critical applications. However, a significant challenge arises due to the misalignment between these modalities, particularly under real-world conditions where data often appear heavily misaligned. Conventional methods developed on well-aligned or minimally misaligned datasets fail to address these discrepancies adequately. This paper introduces a new framework for multispectral pedestrian detection designed specifically to handle heavily misaligned datasets without the need for costly and complex traditional pre-processing calibration. By leveraging Large-scale Vision-Language Models (LVLM) for cross-modal semantic alignment, our approach seeks to enhance detection accuracy by aligning semantic information across the RGB and thermal domains. This method not only simplifies the operational requirements but also extends the practical usability of multispectral detection technologies in practical applications.
Related papers
- Training-free Anomaly Event Detection via LLM-guided Symbolic Pattern Discovery [70.75963253876628]
Anomaly event detection plays a crucial role in various real-world applications.
We present a training-free framework that integrates open-set object detection with symbolic regression.
arXiv Detail & Related papers (2025-02-09T10:30:54Z) - What Really Matters for Learning-based LiDAR-Camera Calibration [50.2608502974106]
This paper revisits the development of learning-based LiDAR-Camera calibration.
We identify the critical limitations of regression-based methods with the widely used data generation pipeline.
We also investigate how the input data format and preprocessing operations impact network performance.
arXiv Detail & Related papers (2025-01-28T14:12:32Z) - Optimizing Multispectral Object Detection: A Bag of Tricks and Comprehensive Benchmarks [49.84182981950623]
Multispectral object detection, utilizing RGB and TIR (thermal infrared) modalities, is widely recognized as a challenging task.
It requires not only the effective extraction of features from both modalities and robust fusion strategies, but also the ability to address issues such as spectral discrepancies.
We introduce an efficient and easily deployable multispectral object detection framework that can seamlessly optimize high-performing single-modality models.
arXiv Detail & Related papers (2024-11-27T12:18:39Z) - Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection [57.646582245834324]
We propose a simple yet effective deepfake detector called LSDA.
It is based on a idea: representations with a wider variety of forgeries should be able to learn a more generalizable decision boundary.
We show that our proposed method is surprisingly effective and transcends state-of-the-art detectors across several widely used benchmarks.
arXiv Detail & Related papers (2023-11-19T09:41:10Z) - Comparing AutoML and Deep Learning Methods for Condition Monitoring
using Realistic Validation Scenarios [0.0]
This study extensively compares conventional machine learning methods and deep learning for condition monitoring tasks using an AutoML toolbox.
Experiments reveal consistent high accuracy in random K-fold cross-validation scenarios across all tested models.
No clear winner emerges, indicating the presence of domain shift in real-world scenarios.
arXiv Detail & Related papers (2023-08-28T14:57:29Z) - A Dimensional Structure based Knowledge Distillation Method for
Cross-Modal Learning [15.544134849816528]
We discover the correlation between feature discriminability and dimensional structure (DS) by analyzing and observing features extracted from simple and hard tasks.
We propose a novel cross-modal knowledge distillation (CMKD) method for better supervised cross-modal learning (CML) performance.
The proposed method enforces output features to be channel-wise independent and intermediate ones to be uniformly distributed, thereby learning semantically irrelevant features from the hard task to boost its accuracy.
arXiv Detail & Related papers (2023-06-28T07:29:26Z) - Unsupervised Anomaly Detection via Nonlinear Manifold Learning [0.0]
Anomalies are samples that significantly deviate from the rest of the data and their detection plays a major role in building machine learning models.
We introduce a robust, efficient, and interpretable methodology based on nonlinear manifold learning to detect anomalies in unsupervised settings.
arXiv Detail & Related papers (2023-06-15T18:48:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.