Synthesizing the Unseen for Zero-shot Object Detection
- URL: http://arxiv.org/abs/2010.09425v1
- Date: Mon, 19 Oct 2020 12:36:11 GMT
- Title: Synthesizing the Unseen for Zero-shot Object Detection
- Authors: Nasir Hayat, Munawar Hayat, Shafin Rahman, Salman Khan, Syed Waqas
Zamir, Fahad Shahbaz Khan
- Abstract summary: We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We introduce a novel generative model that uses class semantics not only to generate the features but also to separate them discriminatively.
- Score: 72.38031440014463
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Existing zero-shot detection approaches project visual features to
the semantic domain for seen objects, hoping to map unseen objects to their
corresponding semantics during inference. However, since unseen objects are
never observed during training, the detection model is skewed towards seen
content and labels unseen objects as background or as a seen class. In this
work, we propose to synthesize visual features for unseen classes, so that the
model learns both seen and unseen objects in the visual domain. The major
challenge then becomes: how can unseen objects be accurately synthesized from
their class semantics alone? Towards this ambitious goal, we propose a novel
generative model that uses class semantics not only to generate the features
but also to separate them discriminatively. Further, using a unified model, we
ensure that the synthesized features are diverse enough to represent both
intra-class differences and the variable localization precision of detected
bounding boxes. We test our approach on three object detection benchmarks,
PASCAL VOC, MSCOCO, and ILSVRC detection, under both conventional and
generalized settings, showing impressive gains over state-of-the-art methods.
Our code is available at https://github.com/nasir6/zero_shot_detection.
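A minimal sketch of the central idea as described in the abstract: a generator conditioned on class semantics maps noise to visual features, and a classification loss over the synthesized features enforces discriminative separation. This assumes a PyTorch-style setup; all module names, sizes, and the toy training step are illustrative, not the authors' exact model.

```python
# Sketch: semantics-conditioned feature generator + discriminative separation.
# Dimensions and architecture are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureGenerator(nn.Module):
    """G(z, s) -> synthesized visual feature for the class with semantics s."""
    def __init__(self, sem_dim=300, noise_dim=128, feat_dim=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sem_dim + noise_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, feat_dim), nn.ReLU(),
        )

    def forward(self, semantics, noise):
        return self.net(torch.cat([semantics, noise], dim=1))

# Toy step: sample class semantics -> synthesize features -> cross-entropy
# so that generated features are separable by class (the discriminative part).
num_classes, sem_dim = 20, 300
class_semantics = torch.randn(num_classes, sem_dim)  # stand-in for word vectors
G = FeatureGenerator(sem_dim=sem_dim)
classifier = nn.Linear(1024, num_classes)
opt = torch.optim.Adam(list(G.parameters()) + list(classifier.parameters()), lr=1e-4)

labels = torch.randint(0, num_classes, (64,))
fake_feats = G(class_semantics[labels], torch.randn(64, 128))
loss = F.cross_entropy(classifier(fake_feats), labels)
opt.zero_grad(); loss.backward(); opt.step()
```

In the full method the generator would also be trained against real features of seen classes; once trained, features synthesized from unseen-class semantics can supervise the detector's classification head.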
Related papers
- Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models [4.157013247909771] (2023-11-17)
We propose to leverage the recent advancements in state-of-the-art models for bottom-up segmentation (SAM), object detection (Detic), and semantic segmentation (MaskFormer).
We aim to develop a cost-effective labeling approach to obtain pseudo-labels for semantic segmentation and object instance detection in indoor environments.
We demonstrate the effectiveness of the proposed approach on the Active Vision dataset and the ADE20K dataset.
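One concrete (hypothetical) reading of such a fusion: class-agnostic masks from a bottom-up segmenter inherit the label of the detector box they overlap most. The rule and threshold below are illustrative assumptions, not the paper's exact scheme.

```python
# Sketch: assign detector labels to class-agnostic masks by box overlap.
import numpy as np

def box_mask_overlap(box, mask):
    """Fraction of the mask's area that falls inside box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = map(int, box)
    return mask[y1:y2, x1:x2].sum() / max(mask.sum(), 1)

def label_masks(masks, boxes, box_labels, min_overlap=0.5):
    labeled = []
    for mask in masks:
        overlaps = [box_mask_overlap(b, mask) for b in boxes]
        best = int(np.argmax(overlaps))
        if overlaps[best] >= min_overlap:
            labeled.append((mask, box_labels[best]))
    return labeled

# Toy example: one 64x64 mask and one matching box labeled "chair".
mask = np.zeros((64, 64), bool); mask[10:30, 10:30] = True
pseudo = label_masks([mask], boxes=[(8, 8, 32, 32)], box_labels=["chair"])
print(pseudo[0][1])  # "chair"
```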
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516] (2023-07-07)
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
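A minimal sketch of PCA-based localization: project each spatial feature vector of a backbone feature map onto its first principal component and threshold the resulting heatmap. Shapes and the threshold are illustrative.

```python
# Sketch: first-principal-component heatmap over a (C, H, W) feature map.
import numpy as np

def pca_heatmap(feat_map):
    C, H, W = feat_map.shape
    X = feat_map.reshape(C, H * W).T           # (HW, C): one vector per location
    X = X - X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    heat = X @ Vt[0]                            # projection on 1st component
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    return heat.reshape(H, W)

feat = np.random.rand(256, 14, 14).astype(np.float32)  # stand-in feature map
mask = pca_heatmap(feat) > 0.5                          # coarse object region
print(mask.shape)  # (14, 14)
```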
- Resolving Semantic Confusions for Improved Zero-Shot Detection [6.72910827751713] (2022-12-12)
We propose a generative model incorporating a triplet loss that acknowledges the degree of dissimilarity between classes.
A cyclic-consistency loss is also enforced to ensure that generated visual samples of a class highly correspond to their own semantics.
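A hedged sketch of the two losses described above, assuming the triplet margin grows with semantic dissimilarity between classes and that a decoder maps generated features back to class semantics; all names and dimensions are illustrative.

```python
# Sketch: dissimilarity-aware triplet loss + cycle-consistency to semantics.
import torch
import torch.nn.functional as F

def semantic_margin_triplet(anchor, positive, negative, sem_a, sem_n, base=0.2):
    # Margin scales with how dissimilar the anchor and negative classes are.
    margin = base * (1.0 - F.cosine_similarity(sem_a, sem_n, dim=1))
    d_pos = (anchor - positive).pow(2).sum(1)
    d_neg = (anchor - negative).pow(2).sum(1)
    return F.relu(d_pos - d_neg + margin).mean()

def cycle_consistency(features, semantics, decoder):
    # decoder: feature -> reconstructed semantics (a hypothetical module).
    return F.mse_loss(decoder(features), semantics)

feat_dim, sem_dim = 1024, 300
decoder = torch.nn.Linear(feat_dim, sem_dim)
a, p, n = (torch.randn(8, feat_dim) for _ in range(3))
sem_a, sem_n = torch.randn(8, sem_dim), torch.randn(8, sem_dim)
loss = (semantic_margin_triplet(a, p, n, sem_a, sem_n)
        + cycle_consistency(a, sem_a, decoder))
print(loss.item())
```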
- Open World DETR: Transformer based Open World Object Detection [60.64535309016623] (2022-12-06)
We propose a two-stage training approach named Open World DETR for open world object detection based on Deformable DETR.
We fine-tune the class-specific components of the model with a multi-view self-labeling strategy and a consistency constraint.
Our proposed method outperforms other state-of-the-art open world object detection methods by a large margin.
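A generic sketch in the spirit of the self-labeling consistency constraint mentioned above (confident predictions on one augmented view supervise another); this is not Open World DETR's exact training recipe.

```python
# Sketch: cross-view pseudo-label consistency over per-query class logits.
import torch
import torch.nn.functional as F

def consistency_loss(logits_view1, logits_view2, conf_thresh=0.7):
    probs = logits_view1.softmax(dim=1).detach()   # "teacher" view
    conf, pseudo = probs.max(dim=1)
    keep = conf >= conf_thresh                      # only confident predictions
    if keep.sum() == 0:
        return logits_view2.sum() * 0.0             # differentiable zero
    return F.cross_entropy(logits_view2[keep], pseudo[keep])

# Toy usage: logits from two augmented views of the same image.
logits1, logits2 = torch.randn(100, 81), torch.randn(100, 81)
print(consistency_loss(logits1, logits2).item())
```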
- Exploiting Unlabeled Data with Vision and Language Models for Object Detection [64.94365501586118] (2022-07-18)
Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets.
We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images.
We demonstrate the value of the generated pseudo labels in two specific tasks, open-vocabulary detection and semi-supervised object detection.
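A minimal sketch of the pseudo-labeling step this implies: score class-agnostic region embeddings against text embeddings of class names and keep confident matches. The random tensors below stand in for the outputs of a real vision-language model such as CLIP.

```python
# Sketch: pseudo-label regions by cosine similarity to class-name embeddings.
import torch
import torch.nn.functional as F

def pseudo_label_regions(region_embs, text_embs, class_names, thresh=0.3):
    region_embs = F.normalize(region_embs, dim=1)
    text_embs = F.normalize(text_embs, dim=1)
    sims = region_embs @ text_embs.T                # cosine similarities
    scores, idx = sims.max(dim=1)
    return [(class_names[int(i)], float(s))
            for i, s, keep in zip(idx, scores, scores >= thresh) if keep]

names = ["person", "dog", "car"]
regions = torch.randn(5, 512)  # stand-in region embeddings
texts = torch.randn(3, 512)    # stand-in text embeddings for `names`
print(pseudo_label_regions(regions, texts, names))
```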
- Disentangling Visual Embeddings for Attributes and Objects [38.27308243429424] (2022-05-17)
We study the problem of compositional zero-shot learning for object-attribute recognition.
Prior works use visual features extracted with a backbone network, pre-trained for object classification.
We propose a novel architecture that can disentangle attribute and object features in the visual space.
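One plausible minimal form of such disentanglement: two projection heads split a shared backbone feature into attribute and object subspaces, each with its own classifier. Sizes and head names are illustrative assumptions.

```python
# Sketch: separate attribute/object heads over one backbone feature.
import torch
import torch.nn as nn

class Disentangler(nn.Module):
    def __init__(self, feat_dim=2048, emb_dim=256, n_attrs=115, n_objs=245):
        super().__init__()
        self.attr_head = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU())
        self.obj_head = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU())
        self.attr_cls = nn.Linear(emb_dim, n_attrs)
        self.obj_cls = nn.Linear(emb_dim, n_objs)

    def forward(self, feats):
        return self.attr_cls(self.attr_head(feats)), self.obj_cls(self.obj_head(feats))

model = Disentangler()
attr_logits, obj_logits = model(torch.randn(4, 2048))  # backbone features
print(attr_logits.shape, obj_logits.shape)
```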
- Robust Region Feature Synthesizer for Zero-Shot Object Detection [87.79902339984142] (2022-01-01)
We build a novel zero-shot object detection framework that contains an Intra-class Semantic Diverging component and an Inter-class Structure Preserving component.
It is the first study to carry out zero-shot object detection in remote sensing imagery.
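Interpreting the two component names, a diversity term could push same-class synthesized features apart while a structure term aligns inter-class similarities in visual and semantic space. This is an assumption-laden sketch of that reading, not the paper's code.

```python
# Sketch: intra-class diverging + inter-class structure-preserving terms.
import torch
import torch.nn.functional as F

def intra_class_diverging(f1, f2, z1, z2, eps=1e-6):
    # Mode-seeking style: features should differ as much as their noises do.
    ratio = (f1 - f2).abs().mean(1) / ((z1 - z2).abs().mean(1) + eps)
    return (1.0 / (ratio + eps)).mean()

def inter_class_structure(visual_protos, semantic_protos):
    v = F.normalize(visual_protos, dim=1)
    s = F.normalize(semantic_protos, dim=1)
    return F.mse_loss(v @ v.T, s @ s.T)  # match pairwise class similarities

f1, f2 = torch.randn(8, 1024), torch.randn(8, 1024)
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = (intra_class_diverging(f1, f2, z1, z2)
        + inter_class_structure(torch.randn(20, 1024), torch.randn(20, 300)))
print(loss.item())
```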
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915] (2021-12-21)
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
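A sketch of the embedding-based head such a comparison implies: fixed, semantically structured class embeddings replace a learned one-hot classifier, and logits come from cosine similarity. Illustrative only.

```python
# Sketch: classify features against fixed semantic class embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingHead(nn.Module):
    def __init__(self, feat_dim, class_embs, temperature=0.07):
        super().__init__()
        self.proj = nn.Linear(feat_dim, class_embs.shape[1])
        # Fixed class embeddings (e.g. word vectors or knowledge-graph nodes).
        self.register_buffer("class_embs", F.normalize(class_embs, dim=1))
        self.t = temperature

    def forward(self, feats):
        q = F.normalize(self.proj(feats), dim=1)
        return q @ self.class_embs.T / self.t   # cosine-similarity logits

head = EmbeddingHead(feat_dim=256, class_embs=torch.randn(80, 300))
print(head(torch.randn(4, 256)).shape)  # (4, 80)
```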
- A Deep Learning Approach to Object Affordance Segmentation [31.221897360610114] (2020-04-18)
We design an autoencoder that infers pixel-wise affordance labels in both videos and static images.
Our model surpasses the need for object labels and bounding boxes by using a soft-attention mechanism.
We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF.
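A minimal sketch of such a soft-attention autoencoder, assuming a sigmoid attention map gates encoder features before decoding to per-pixel affordance logits; the architecture and number of affordance classes below are illustrative.

```python
# Sketch: encoder -> soft spatial attention -> decoder to affordance logits.
import torch
import torch.nn as nn

class AffordanceAE(nn.Module):
    def __init__(self, n_affordances=9):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attention = nn.Sequential(nn.Conv2d(64, 1, 1), nn.Sigmoid())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, n_affordances, 4, stride=2, padding=1),
        )

    def forward(self, x):
        feats = self.encoder(x)
        attended = feats * self.attention(feats)  # soft spatial attention
        return self.decoder(attended)             # per-pixel affordance logits

model = AffordanceAE()
print(model(torch.randn(1, 3, 128, 128)).shape)  # (1, 9, 128, 128)
```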