Learning Dense Object Descriptors from Multiple Views for Low-shot
Category Generalization
- URL: http://arxiv.org/abs/2211.15059v1
- Date: Mon, 28 Nov 2022 04:31:53 GMT
- Title: Learning Dense Object Descriptors from Multiple Views for Low-shot
Category Generalization
- Authors: Stefan Stojanov, Anh Thai, Zixuan Huang, James M. Rehg
- Abstract summary: We propose Deep Object Patch Encodings (DOPE), which can be trained from multiple views of object instances without any category or semantic object part labels.
To train DOPE, we assume access to sparse depths, foreground masks, and known cameras to obtain pixel-level correspondences between views of an object.
We find that DOPE can be used directly for low-shot classification of novel categories using local-part matching, and is competitive with, and outperforms, supervised and self-supervised learning baselines.
- Score: 27.583517870047487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A hallmark of the deep learning era for computer vision is the successful use
of large-scale labeled datasets to train feature representations for tasks
ranging from object recognition and semantic segmentation to optical flow
estimation and novel view synthesis of 3D scenes. In this work, we aim to learn
dense discriminative object representations for low-shot category recognition
without requiring any category labels. To this end, we propose Deep Object
Patch Encodings (DOPE), which can be trained from multiple views of object
instances without any category or semantic object part labels. To train DOPE,
we assume access to sparse depths, foreground masks, and known cameras to
obtain pixel-level correspondences between views of an object, and use this to
formulate a self-supervised learning task to learn discriminative object
patches. We find that DOPE can directly be used for low-shot classification of
novel categories using local-part matching, and is competitive with, and
outperforms, supervised and self-supervised learning baselines. Code and data
available at https://github.com/rehg-lab/dope_selfsup.
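The two load-bearing ideas in the abstract, recovering cross-view pixel correspondences from depth and known cameras, and classifying novel categories by local-part matching, can be made concrete. Below is a minimal PyTorch sketch of the correspondence step plus a generic InfoNCE-style contrastive loss over matched descriptors; the function names, the shared intrinsics matrix K, the depth-consistency threshold, and the loss form are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def unproject(uv, depth, K, cam_to_world):
    """Lift pixels uv (N, 2) with depths (N,) into world coordinates (N, 3)."""
    ones = torch.ones(uv.shape[0], 1)
    pix = torch.cat([uv.float(), ones], dim=1)      # homogeneous pixel coords
    rays = pix @ torch.linalg.inv(K).T              # camera-space rays (K^-1 @ pix)
    pts_cam = rays * depth[:, None]                 # scale each ray by its depth
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t

def project(pts_world, K, cam_to_world):
    """Project world points (N, 3) into a view; returns pixels (N, 2), depths (N,)."""
    world_to_cam = torch.linalg.inv(cam_to_world)
    R, t = world_to_cam[:3, :3], world_to_cam[:3, 3]
    pts_cam = pts_world @ R.T + t
    pix = pts_cam @ K.T
    return pix[:, :2] / pix[:, 2:3], pts_cam[:, 2]

def cross_view_matches(uv_a, depth_a, depth_map_b, K, pose_a, pose_b, tol=0.01):
    """Map foreground pixels of view A into view B, keeping depth-consistent hits.
    `tol` is a hypothetical occlusion threshold in scene units."""
    uv_b, z_b = project(unproject(uv_a, depth_a, K, pose_a), K, pose_b)
    ij = uv_b.round().long()
    H, W = depth_map_b.shape
    inside = (ij[:, 0] >= 0) & (ij[:, 0] < W) & (ij[:, 1] >= 0) & (ij[:, 1] < H)
    ij[:, 0] = ij[:, 0].clamp(0, W - 1)             # safe indices for the lookup
    ij[:, 1] = ij[:, 1].clamp(0, H - 1)
    visible = (z_b - depth_map_b[ij[:, 1], ij[:, 0]]).abs() < tol
    keep = inside & visible
    return uv_a[keep], uv_b[keep]

def info_nce(desc_a, desc_b, temperature=0.07):
    """Matched rows of desc_a/desc_b (N, D) are positives; all other rows in
    the batch serve as negatives."""
    a, b = F.normalize(desc_a, dim=1), F.normalize(desc_b, dim=1)
    logits = a @ b.T / temperature
    return F.cross_entropy(logits, torch.arange(a.shape[0]))
```

A plausible reading of the low-shot classification step, scoring each support class by how well the query's local patches find counterparts among that class's descriptors, again a sketch rather than the paper's exact protocol:

```python
def classify_by_part_matching(query_desc, support_sets):
    """query_desc: (Nq, D) descriptors at the query image's foreground pixels.
    support_sets: dict mapping class name -> (Ns, D) descriptors pooled from
    that class's few labeled views. Hypothetical scoring rule: mean best-match
    cosine similarity over query patches."""
    q = F.normalize(query_desc, dim=1)
    scores = {
        cls: (q @ F.normalize(sup, dim=1).T).max(dim=1).values.mean().item()
        for cls, sup in support_sets.items()
    }
    return max(scores, key=scores.get)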
Related papers
- In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation [50.79940712523551]
We present lazy visual grounding, a two-stage approach of unsupervised object mask discovery followed by object grounding.
Our model requires no additional training yet shows strong performance on five public datasets.
arXiv Detail & Related papers (2024-08-09T09:28:35Z)
- CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge [18.57081150228812]
We propose a novel 6D pose framework that employs a pre-trained vision-language model to better learn object category information.
CLIPose achieves state-of-the-art performance on two mainstream benchmark datasets, REAL275 and CAMERA25, and runs in real time during inference (40 FPS).
arXiv Detail & Related papers (2024-02-24T05:31:53Z)
- Improving Deep Representation Learning via Auxiliary Learnable Target Coding [69.79343510578877]
This paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning.
Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations.
arXiv Detail & Related papers (2023-05-30T01:38:54Z)
- SupeRGB-D: Zero-shot Instance Segmentation in Cluttered Indoor Environments [67.34330257205525]
In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a semantic category-agnostic manner.
We present a method that uses annotated objects to learn the "objectness" of pixels and generalize to unseen object categories in cluttered indoor environments.
arXiv Detail & Related papers (2022-12-22T17:59:48Z)
- Exploiting Unlabeled Data with Vision and Language Models for Object Detection [64.94365501586118]
Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets.
We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images.
We demonstrate the value of the generated pseudo labels in two specific tasks, open-vocabulary detection and semi-supervised object detection.
arXiv Detail & Related papers (2022-07-18T21:47:15Z)
- Learning Open-World Object Proposals without Learning to Classify [110.30191531975804]
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlap with any ground-truth object.
This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization.
arXiv Detail & Related papers (2021-08-15T14:36:02Z)
- Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We use a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
- Cross-Supervised Object Detection [42.783400918552765]
We show how to build better object detectors from weakly labeled images of new categories by leveraging knowledge learned from fully labeled base categories.
We propose a unified framework that combines a detection head trained from instance-level annotations and a recognition head learned from image-level annotations.
arXiv Detail & Related papers (2020-06-26T15:33:48Z)
- A Deep Learning Approach to Object Affordance Segmentation [31.221897360610114]
We design an autoencoder that infers pixel-wise affordance labels in both videos and static images.
Our model removes the need for object labels and bounding boxes by using a soft-attention mechanism.
We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF.
arXiv Detail & Related papers (2020-04-18T15:34:41Z)