Self-Supervised Moving Vehicle Detection from Audio-Visual Cues
- URL: http://arxiv.org/abs/2201.12771v1
- Date: Sun, 30 Jan 2022 09:52:14 GMT
- Title: Self-Supervised Moving Vehicle Detection from Audio-Visual Cues
- Authors: Jannik Zürn, Wolfram Burgard
- Abstract summary: We propose a self-supervised approach that leverages audio-visual cues to detect moving vehicles in videos.
Our approach employs contrastive learning for localizing vehicles in images from corresponding pairs of images and recorded audio.
We show that our model can be used as a teacher to supervise an audio-only detection model.
- Score: 29.06503735149157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robust detection of moving vehicles is a critical task for any autonomously
operating outdoor robot or self-driving vehicle. Most modern approaches for
solving this task rely on training image-based detectors using large-scale
vehicle detection datasets such as nuScenes or the Waymo Open Dataset.
Providing manual annotations is an expensive and laborious exercise that does
not scale well in practice. To tackle this problem, we propose a
self-supervised approach that leverages audio-visual cues to detect moving
vehicles in videos. Our approach employs contrastive learning for localizing
vehicles in images from corresponding pairs of images and recorded audio. In
extensive experiments carried out with a real-world dataset, we demonstrate
that our approach provides accurate detections of moving vehicles and does not
require manual annotations. We furthermore show that our model can be used as a
teacher to supervise an audio-only detection model. This student model is
invariant to illumination changes and thus effectively bridges the domain gap
inherent to models leveraging exclusively vision as the predominant modality.
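As a concrete illustration of the mechanism sketched in the abstract, the following is a minimal, hypothetical PyTorch sketch of contrastive audio-visual localization of this general kind: an image encoder yields a spatial feature map, an audio encoder yields a clip-level embedding, an InfoNCE-style loss aligns corresponding image/audio pairs, and the per-location similarity map serves as a coarse vehicle heatmap at inference. The encoders, the pooling choice, and all hyperparameters are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def audio_visual_contrastive_loss(img_feats, aud_embeds, temperature=0.07):
    """InfoNCE-style loss between image feature maps and audio embeddings.

    img_feats:  (B, C, H, W) spatial features from an image encoder.
    aud_embeds: (B, C) clip-level embeddings from an audio encoder.
    Corresponding pairs (same batch index) are positives; every other
    image/audio combination in the batch acts as a negative.
    """
    B, C, H, W = img_feats.shape
    img_feats = F.normalize(img_feats, dim=1)
    aud_embeds = F.normalize(aud_embeds, dim=1)

    # Similarity map between every image and every audio clip: (B_img, B_aud, H, W).
    sim_maps = torch.einsum('ichw,jc->ijhw', img_feats, aud_embeds)

    # Pool each map to a single image-audio agreement score (max over locations).
    logits = sim_maps.flatten(2).max(dim=2).values / temperature  # (B_img, B_aud)

    targets = torch.arange(B, device=logits.device)
    # Symmetric cross-entropy: match each image to its audio clip and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def localize(img_feats, aud_embeds, threshold=0.5):
    """At inference, the per-location similarity for a matching pair
    serves as a coarse heatmap of the sounding (moving) vehicle."""
    img_feats = F.normalize(img_feats, dim=1)
    aud_embeds = F.normalize(aud_embeds, dim=1)
    heatmap = torch.einsum('bchw,bc->bhw', img_feats, aud_embeds)
    return heatmap > threshold
```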
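The abstract further states that the audio-visual model can act as a teacher for an audio-only student. A hedged sketch of one such distillation step, assuming the teacher's detection heatmaps are used as soft pseudo-labels (the function names and the loss choice are illustrative only, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, images, audio, optimizer):
    """One pseudo-labeling step: the frozen audio-visual teacher produces
    detection heatmaps, and the audio-only student is trained to match them."""
    with torch.no_grad():
        pseudo_heatmaps = teacher(images, audio)   # (B, H, W), values in [0, 1]

    pred_logits = student(audio)                   # audio-only prediction, (B, H, W)

    # Pixel-wise binary cross-entropy against the teacher's soft targets.
    loss = F.binary_cross_entropy_with_logits(pred_logits, pseudo_heatmaps)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the student consumes only audio, its predictions are unaffected by illumination changes, which is the domain-gap argument made above.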
Related papers
- Guiding Attention in End-to-End Driving Models [49.762868784033785]
Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving.
We study how to guide the attention of these models to improve their driving quality by adding a loss term during training.
In contrast to previous work, our method does not require these salient semantic maps to be available during testing time.
arXiv Detail & Related papers (2024-04-30T23:18:51Z)
- Latent Object Characteristics Recognition with Visual to Haptic-Audio Cross-modal Transfer Learning [9.178588671620963]
This work aims to recognise latent, unobservable object characteristics.
Vision is commonly used for object recognition by robots, but it is ineffective for detecting hidden objects.
We propose a cross-modal transfer learning approach from vision to haptic-audio.
arXiv Detail & Related papers (2024-03-15T21:18:14Z)
- Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatial quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
- Linking vision and motion for self-supervised object-centric perception [16.821130222597155]
Object-centric representations enable autonomous driving algorithms to reason about interactions between many independent agents and scene features.
Traditionally these representations have been obtained via supervised learning, but this decouples perception from the downstream driving task and could harm generalization.
We adapt a self-supervised object-centric vision model to perform object decomposition using only RGB video and the pose of the vehicle as inputs.
arXiv Detail & Related papers (2023-07-14T04:21:05Z)
- Self-Supervised Pretraining on Satellite Imagery: a Case Study on Label-Efficient Vehicle Detection [0.0]
We study in-domain self-supervised representation learning for object detection on very high resolution optical satellite imagery.
We use the large land use classification dataset Functional Map of the World to pretrain representations with an extension of the Momentum Contrast framework.
We then investigate this model's transferability on a real-world task of fine-grained vehicle detection and classification on Preligens proprietary data.
arXiv Detail & Related papers (2022-10-21T08:41:22Z)
- Self-Supervised Steering Angle Prediction for Vehicle Control Using Visual Odometry [55.11913183006984]
We show how a model can be trained to control a vehicle's trajectory using camera poses estimated through visual odometry methods.
We propose a scalable framework that leverages trajectory information from several different runs using a camera setup placed at the front of a car.
arXiv Detail & Related papers (2021-03-20T16:29:01Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Self-Supervised Learning of Audio-Visual Objects from Video [108.77341357556668]
We introduce a model that uses attention to localize and group sound sources, and optical flow to aggregate information over time.
We demonstrate the effectiveness of the audio-visual object embeddings that our model learns by using them for four downstream speech-oriented tasks.
arXiv Detail & Related papers (2020-08-10T16:18:01Z)
- VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification [116.1587709521173]
We propose to build a large-scale vehicle dataset (called VehicleNet) by harnessing four public vehicle datasets.
We design a simple yet effective two-stage progressive approach to learning more robust visual representation from VehicleNet.
We achieve state-of-the-art accuracy of 86.07% mAP on the private test set of the AICity Challenge.
arXiv Detail & Related papers (2020-04-14T05:06:38Z)
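The VehicleNet entry above mentions a two-stage progressive approach without further detail. Below is a minimal, hypothetical PyTorch sketch of one plausible reading of such a schedule: stage one trains on the merged multi-source data, stage two swaps the classifier and fine-tunes on the target re-identification set alone. The loaders, learning rates, and the `model.fc` classifier attribute are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, optimizer, device="cuda"):
    """Generic supervised epoch: cross-entropy over vehicle identity labels."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        loss = criterion(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def two_stage_progressive(model, merged_loader, target_loader,
                          num_target_ids, device="cuda", epochs=(10, 10)):
    """Stage 1: learn a generic representation on the merged multi-source data.
    Stage 2: swap the classifier and fine-tune on the target dataset only.
    Assumes a torchvision-style backbone whose classifier head is `model.fc`."""
    model = model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    for _ in range(epochs[0]):
        train_epoch(model, merged_loader, opt, device)

    # Replace the identity classifier for the target label space and lower the LR.
    model.fc = nn.Linear(model.fc.in_features, num_target_ids).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    for _ in range(epochs[1]):
        train_epoch(model, target_loader, opt, device)
    return model
```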
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.