Self-supervised object detection from audio-visual correspondence
- URL: http://arxiv.org/abs/2104.06401v1
- Date: Tue, 13 Apr 2021 17:59:03 GMT
- Title: Self-supervised object detection from audio-visual correspondence
- Authors: Triantafyllos Afouras, Yuki M. Asano, Francois Fagan, Andrea Vedaldi,
Florian Metze
- Abstract summary: We tackle the problem of learning object detectors without supervision.
We do not assume image-level class labels, instead we extract a supervisory signal from audio-visual data.
We show that our method can learn to detect generic objects that go beyond instruments, such as airplanes and cats.
- Score: 101.46794879729453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tackle the problem of learning object detectors without supervision.
Differently from weakly-supervised object detection, we do not assume
image-level class labels. Instead, we extract a supervisory signal from
audio-visual data, using the audio component to "teach" the object detector.
While this problem is related to sound source localisation, it is considerably
harder because the detector must classify the objects by type, enumerate each
instance of the object, and do so even when the object is silent. We tackle
this problem by first designing a self-supervised framework with a contrastive
objective that jointly learns to classify and localise objects. Then, without
using any supervision, we simply use these self-supervised labels and boxes to
train an image-based object detector. With this, we outperform previous
unsupervised and weakly-supervised detectors for the task of object detection
and sound source localization. We also show that we can align this detector to
ground-truth classes with as little as one label per pseudo-class, and show how
our method can learn to detect generic objects that go beyond instruments, such
as airplanes and cats.
Related papers
- SeMoLi: What Moves Together Belongs Together [51.72754014130369]
We tackle semi-supervised object detection based on motion cues.
Recent results suggest that motion-based clustering methods can be used to pseudo-label instances of moving objects.
We re-think this approach and suggest that both, object detection, as well as motion-inspired pseudo-labeling, can be tackled in a data-driven manner.
arXiv Detail & Related papers (2024-02-29T18:54:53Z) - SalienDet: A Saliency-based Feature Enhancement Algorithm for Object
Detection for Autonomous Driving [160.57870373052577]
We propose a saliency-based OD algorithm (SalienDet) to detect unknown objects.
Our SalienDet utilizes a saliency-based algorithm to enhance image features for object proposal generation.
We design a dataset relabeling approach to differentiate the unknown objects from all objects in training sample set to achieve Open-World Detection.
arXiv Detail & Related papers (2023-05-11T16:19:44Z) - Unsupervised Object Localization: Observing the Background to Discover
Objects [4.870509580034194]
In this work, we take a different approach and propose to look for the background instead.
This way, the salient objects emerge as a by-product without any strong assumption on what an object should be.
We propose FOUND, a simple model made of a single $conv1times1$ with coarse background masks extracted from self-supervised patch-based representations.
arXiv Detail & Related papers (2022-12-15T13:43:11Z) - Detect Only What You Specify : Object Detection with Linguistic Target [0.0]
We propose Language-Targeted Detector (LTD) for the targeted detection based on a recently proposed Transformer-based detector.
LTD is a encoder-decoder architecture and our conditional decoder allows the model to reason about the encoded image with the textual input as the linguistic context.
arXiv Detail & Related papers (2022-11-18T07:28:47Z) - The Sound of Bounding-Boxes [12.019518891110007]
We propose a fully unsupervised method that learns to detect objects in an image and separate sound source simultaneously.
Although being fully unsupervised, our method performs comparably in separation accuracy.
arXiv Detail & Related papers (2022-03-30T01:58:52Z) - Robust Region Feature Synthesizer for Zero-Shot Object Detection [87.79902339984142]
We build a novel zero-shot object detection framework that contains an Intra-class Semantic Diverging component and an Inter-class Structure Preserving component.
It is the first study to carry out zero-shot object detection in remote sensing imagery.
arXiv Detail & Related papers (2022-01-01T03:09:15Z) - Context-Aware Transfer Attacks for Object Detection [51.65308857232767]
We present a new approach to generate context-aware attacks for object detectors.
We show that by using co-occurrence of objects and their relative locations and sizes as context information, we can successfully generate targeted mis-categorization attacks.
arXiv Detail & Related papers (2021-12-06T18:26:39Z) - Learning to Detect Instance-level Salient Objects Using Complementary
Image Labels [55.049347205603304]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2021-11-19T10:15:22Z) - Bridging the Gap Between Object Detection and User Intent via
Query-Modulation [33.967176965675264]
query-modulated detectors show superior performance at detecting objects for a given label of interest.
They can be simultaneously trained to solve for both query-modulated detection and standard object detection.
arXiv Detail & Related papers (2021-06-18T17:47:53Z) - Cross-Supervised Object Detection [42.783400918552765]
We show how to build better object detectors from weakly labeled images of new categories by leveraging knowledge learned from fully labeled base categories.
We propose a unified framework that combines a detection head trained from instance-level annotations and a recognition head learned from image-level annotations.
arXiv Detail & Related papers (2020-06-26T15:33:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.