Robots Autonomously Detecting People: A Multimodal Deep Contrastive
Learning Method Robust to Intraclass Variations
- URL: http://arxiv.org/abs/2203.00187v2
- Date: Tue, 13 Feb 2024 20:07:59 GMT
- Title: Robots Autonomously Detecting People: A Multimodal Deep Contrastive
Learning Method Robust to Intraclass Variations
- Authors: Angus Fung, Beno Benhabib, Goldie Nejat
- Abstract summary: We present a novel multimodal person detection architecture to address the mobile robot problem of person detection under intraclass variations.
We present a two-stage training approach using 1) a unique pretraining method we define as Temporal Invariant Multimodal Contrastive Learning (TimCLR), and 2) a Multimodal Faster R-CNN (MFRCNN) detector.
- Score: 6.798578739481274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robotic detection of people in crowded and/or cluttered human-centered
environments including hospitals, long-term care, stores and airports is
challenging as people can become occluded by other people or objects, and
deform due to variations in clothing or pose. There can also be loss of
discriminative visual features due to poor lighting. In this paper, we present
a novel multimodal person detection architecture to address the mobile robot
problem of person detection under intraclass variations. We present a two-stage
training approach using 1) a unique pretraining method we define as Temporal
Invariant Multimodal Contrastive Learning (TimCLR), and 2) a Multimodal Faster
R-CNN (MFRCNN) detector. TimCLR learns person representations that are
invariant under intraclass variations through unsupervised learning. Our
approach is unique in that it generates image pairs from natural variations
within multimodal image sequences, in addition to synthetic data augmentation,
and contrasts crossmodal features to transfer invariances between different
modalities. These pretrained features are used by the MFRCNN detector for
finetuning and person detection from RGB-D images. Extensive experiments
validate the performance of our DL architecture in both human-centered crowded
and cluttered environments. Results show that our method outperforms existing
unimodal and multimodal person detection approaches in detection accuracy
when detecting people with body occlusions and pose deformations under
different lighting conditions.
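The two-stage procedure above can be illustrated with a short sketch. The following PyTorch code is only a minimal illustration of the pretraining stage, assuming that positive pairs are formed from temporally nearby RGB and depth frames of the same person and contrasted with an NT-Xent loss; the encoder choices, projection sizes, and pairing strategy are assumptions, not the authors' exact TimCLR implementation, and the MFRCNN finetuning stage is omitted.
```python
# Minimal sketch of multimodal temporal contrastive pretraining.
# All module names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class Encoder(nn.Module):
    """ResNet backbone + projection head, one per modality."""
    def __init__(self, in_channels=3, proj_dim=128):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)
        backbone.fc = nn.Identity()
        self.backbone = backbone
        self.proj = nn.Sequential(nn.Linear(512, 256), nn.ReLU(),
                                  nn.Linear(256, proj_dim))

    def forward(self, x):
        return F.normalize(self.proj(self.backbone(x)), dim=1)

def nt_xent(z_a, z_b, temperature=0.1):
    """Standard NT-Xent loss: matching indices in z_a and z_b are positives."""
    z = torch.cat([z_a, z_b], dim=0)                  # (2N, D)
    sim = z @ z.t() / temperature                     # cosine similarities
    sim.fill_diagonal_(float('-inf'))                 # drop self-similarity
    n = z_a.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets.to(z.device))

rgb_enc, depth_enc = Encoder(3), Encoder(1)
optimizer = torch.optim.Adam(list(rgb_enc.parameters()) +
                             list(depth_enc.parameters()), lr=3e-4)

# rgb_t / rgb_t1 (and depth_t / depth_t1) are frames of the same person at
# nearby time steps, so natural pose, occlusion, and illumination changes act
# as the "augmentation" between the two views; synthetic augmentation can be
# applied on top.
def pretrain_step(rgb_t, rgb_t1, depth_t, depth_t1):
    z_rgb_t, z_rgb_t1 = rgb_enc(rgb_t), rgb_enc(rgb_t1)
    z_dep_t, z_dep_t1 = depth_enc(depth_t), depth_enc(depth_t1)
    loss = (nt_xent(z_rgb_t, z_rgb_t1)        # temporal pair, RGB
            + nt_xent(z_dep_t, z_dep_t1)      # temporal pair, depth
            + nt_xent(z_rgb_t, z_dep_t))      # crossmodal pair, same time step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
In the second stage described above, the pretrained backbones would be transferred into a Faster R-CNN style detector and finetuned on labeled RGB-D images for person detection; that stage is not sketched here.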
Related papers
- StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples.
It is effective in both white-box and black-box settings.
arXiv Detail & Related papers (2024-08-11T01:22:29Z) - RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection [60.960988614701414]
RIGID is a training-free and model-agnostic method for robust AI-generated image detection.
RIGID significantly outperforms existing training-based and training-free detectors.
arXiv Detail & Related papers (2024-05-30T14:49:54Z) - Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
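As a rough illustration of the residual-adapter idea, the sketch below attaches small trainable bottleneck modules to a frozen pretrained visual encoder at several feature levels; a torchvision ResNet stands in for the CLIP image encoder purely to keep the example self-contained, and the adapter sizes and insertion points are assumptions rather than the paper's configuration.
```python
# Minimal multi-level residual-adapter sketch on a frozen backbone.
import torch
import torch.nn as nn
import torchvision

class ResidualAdapter(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Conv2d(dim, bottleneck, kernel_size=1)
        self.up = nn.Conv2d(bottleneck, dim, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))   # residual refinement

class MultiLevelAdaptedEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
        for p in backbone.parameters():
            p.requires_grad = False                   # backbone stays frozen
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                     backbone.layer3, backbone.layer4])
        self.adapters = nn.ModuleList([ResidualAdapter(d)
                                       for d in (256, 512, 1024, 2048)])

    def forward(self, x):
        feats = []
        x = self.stem(x)
        for stage, adapter in zip(self.stages, self.adapters):
            x = adapter(stage(x))    # adapted feature at each level
            feats.append(x)
        return feats                 # multi-level features for anomaly scoring
```
Only the adapter parameters would be optimized during adaptation; the comparison against text prompts used by the paper is not covered by this sketch.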
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - LDTrack: Dynamic People Tracking by Service Robots using Diffusion Models [6.049096929667388]
This paper introduces a novel deep learning architecture, the Latent Diffusion Track (LDTrack), which uses conditional latent diffusion models to track multiple dynamic people under intraclass variations.
Extensive experiments demonstrate the effectiveness of LDTrack over other state-of-the-art tracking methods in cluttered and crowded human-centered environments under intraclass variations.
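The entry above names conditional latent diffusion but not its internals, so the following is only a generic sketch of one conditional DDPM-style reverse (denoising) step on a latent state vector; the network, the conditioning scheme, and the noise schedule are all assumptions and should not be read as LDTrack's architecture.
```python
# Generic conditional latent-diffusion reverse step (not LDTrack itself).
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Predicts the noise added to a latent, given timestep and condition."""
    def __init__(self, latent_dim=64, cond_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, latent_dim))

    def forward(self, z_t, t, cond):
        t_feat = t.float().unsqueeze(-1) / 1000.0     # crude timestep encoding
        return self.net(torch.cat([z_t, cond, t_feat], dim=-1))

@torch.no_grad()
def reverse_step(model, z_t, t, cond, betas):
    """One reverse diffusion step: z_t -> z_{t-1} (DDPM sampling rule)."""
    beta = betas[t]
    alpha = 1.0 - beta
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]
    eps = model(z_t, torch.full((z_t.size(0),), t), cond)
    mean = (z_t - beta / torch.sqrt(1.0 - alpha_bar) * eps) / torch.sqrt(alpha)
    noise = torch.randn_like(z_t) if t > 0 else torch.zeros_like(z_t)
    return mean + torch.sqrt(beta) * noise
```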
arXiv Detail & Related papers (2024-02-13T20:16:31Z) - On Sensitivity and Robustness of Normalization Schemes to Input
Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input.
DL models are sensitive to varying artifacts because they cause shifts in the input data distribution between the training and testing phases.
We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to inject robustness into model performance against varying image artifacts.
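A minimal sketch of this normalization swap, assuming a standard torchvision backbone: every BatchNorm2d layer is replaced with GroupNorm so that normalization no longer depends on batch statistics that drift when input artifacts change; the group count is an illustrative choice, not a value taken from the paper.
```python
# Replace BatchNorm2d with GroupNorm throughout a model.
import torch.nn as nn
import torchvision

def batchnorm_to_groupnorm(module, groups=8):
    """Recursively swap every BatchNorm2d for a GroupNorm layer."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, nn.GroupNorm(groups, child.num_features))
        else:
            batchnorm_to_groupnorm(child, groups)
    return module

model = batchnorm_to_groupnorm(torchvision.models.resnet18(weights=None))
```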
arXiv Detail & Related papers (2023-06-23T03:09:03Z) - DCdetector: Dual Attention Contrastive Representation Learning for Time
Series Anomaly Detection [26.042898544127503]
Time series anomaly detection is critical for a wide range of applications.
It aims to identify deviant samples from the normal sample distribution in time series.
We propose DCdetector, a multi-scale dual attention contrastive representation learning model.
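As a hedged illustration of the dual-branch contrastive idea, the sketch below encodes two views of the same time-series window (point-level and patch-averaged) with two small branches and uses their representation discrepancy as the anomaly score; this is a generic stand-in, not DCdetector's actual dual-attention architecture.
```python
# Generic dual-branch representation-discrepancy sketch for time series.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    def __init__(self, in_dim, hidden=64, out_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.GELU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, x):                    # x: (batch, window, in_dim)
        return F.normalize(self.net(x), dim=-1)

point_branch, patch_branch = Branch(in_dim=1), Branch(in_dim=1)

def discrepancy(p, q):
    """Symmetric KL between softmax-normalized representations."""
    p, q = F.log_softmax(p, dim=-1), F.log_softmax(q, dim=-1)
    return (F.kl_div(p, q, log_target=True, reduction='none').sum(-1) +
            F.kl_div(q, p, log_target=True, reduction='none').sum(-1))

def anomaly_score(x, patch_size=5):
    """x: (batch, window, 1); window length assumed divisible by patch_size."""
    patches = x.unfold(1, patch_size, patch_size).mean(-1)      # coarse view
    patch_view = patches.repeat_interleave(patch_size, dim=1)   # back to window
    return discrepancy(point_branch(x), patch_branch(patch_view)).mean(-1)
```
During training on normal data the discrepancy would be minimized so that the two branches agree, leaving anomalous windows to produce large scores at test time.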
arXiv Detail & Related papers (2023-06-17T13:40:15Z) - Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z) - Improving Deep Facial Phenotyping for Ultra-rare Disorder Verification
Using Model Ensembles [52.77024349608834]
We analyze the influence of replacing a DCNN with a state-of-the-art face recognition approach, iResNet with ArcFace.
Our proposed ensemble model achieves state-of-the-art performance on both seen and unseen disorders.
arXiv Detail & Related papers (2022-11-12T23:28:54Z) - Margin-Aware Intra-Class Novelty Identification for Medical Images [2.647674705784439]
We propose a hybrid model - Transformation-based Embedding learning for Novelty Detection (TEND)
With a pre-trained autoencoder as the image feature extractor, TEND learns to discriminate the feature embeddings of in-distribution data from those of their transformed counterparts, which serve as fake out-of-distribution inputs.
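A minimal sketch of this idea, assuming a simple stand-in encoder and a cheap corruption as the transformation: embeddings of in-distribution images are separated from embeddings of their transformed counterparts, which act as fake out-of-distribution samples. TEND's actual margin-aware objective is replaced by a plain binary head here to keep the example short.
```python
# Transformation-based novelty detection sketch (stand-in encoder, BCE head).
import torch
import torch.nn as nn

encoder = nn.Sequential(                  # stand-in for a pretrained AE encoder
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(64, 1)                   # in-distribution vs. "fake OOD"
optimizer = torch.optim.Adam(list(encoder.parameters()) +
                             list(head.parameters()), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def transform(x):
    """Cheap corruption used to manufacture fake out-of-distribution inputs."""
    idx = torch.randperm(x.size(-1))
    return x[..., idx]                    # shuffle image columns

def train_step(x):                        # x: (batch, 3, H, W) in-distribution
    z_in, z_out = encoder(x), encoder(transform(x))
    logits = head(torch.cat([z_in, z_out])).squeeze(-1)
    labels = torch.cat([torch.ones(len(x)), torch.zeros(len(x))])
    loss = bce(logits, labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

def novelty_score(x):
    """Lower in-distribution logit means more novel."""
    return -head(encoder(x)).squeeze(-1)
```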
arXiv Detail & Related papers (2021-07-31T00:10:26Z) - Multi-Modal Anomaly Detection for Unstructured and Uncertain
Environments [5.677685109155077]
Modern robots require the ability to detect and recover from anomalies and failures with minimal human supervision.
We propose a deep learning neural network: supervised variational autoencoder (SVAE), for failure identification in unstructured and uncertain environments.
Our experiments on real field robot data demonstrate superior failure identification performance compared to baseline methods, and show that our model learns interpretable representations.
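As a rough sketch of what a supervised variational autoencoder can look like, the code below combines a standard VAE objective (reconstruction plus KL) with a classifier on the latent code that predicts a failure label; layer sizes and loss weights are illustrative assumptions, not the paper's configuration.
```python
# Supervised VAE sketch: ELBO terms plus a supervised classification head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SVAE(nn.Module):
    def __init__(self, in_dim=128, latent_dim=16, n_classes=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, in_dim))
        self.cls = nn.Linear(latent_dim, n_classes)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.dec(z), self.cls(z), mu, logvar

def svae_loss(model, x, y, beta=1.0, gamma=1.0):
    recon, logits, mu, logvar = model(x)
    rec = F.mse_loss(recon, x)                                # reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    ce = F.cross_entropy(logits, y)                           # supervised head
    return rec + beta * kl + gamma * ce
```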
arXiv Detail & Related papers (2020-12-15T21:59:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.