Zero-shot Learning with Deep Neural Networks for Object Recognition
- URL: http://arxiv.org/abs/2102.03137v1
- Date: Fri, 5 Feb 2021 12:27:42 GMT
- Title: Zero-shot Learning with Deep Neural Networks for Object Recognition
- Authors: Yannick Le Cacheux and Hervé Le Borgne and Michel Crucianu
- Abstract summary: Zero-shot learning deals with the ability to recognize objects without any visual training sample.
This chapter presents a review of the approaches based on deep neural networks to tackle the ZSL problem.
- Score: 8.572654816871873
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Zero-shot learning deals with the ability to recognize objects without any
visual training sample. To counterbalance this lack of visual data, each class
to recognize is associated with a semantic prototype that reflects the
essential features of the object. The general approach is to learn a mapping
from visual data to semantic prototypes, then use it at inference to classify
visual samples from the class prototypes only. Different settings of this
general configuration can be considered depending on the use case of interest,
in particular whether one only wants to classify objects that have not been
employed to learn the mapping or whether one can use unlabelled visual examples
to learn the mapping. This chapter presents a review of the approaches based on
deep neural networks to tackle the ZSL problem. We highlight findings that had
a large impact on the evolution of this domain and list its current challenges.
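To make the general configuration described above concrete, here is a minimal, self-contained sketch in PyTorch (synthetic data, illustrative dimensions, not the chapter's exact formulation): a mapping from visual features to the semantic-prototype space is trained on seen classes with a simple compatibility loss, and test samples are then classified by nearest unseen-class prototype.

```python
# Minimal zero-shot learning sketch (illustrative, not the chapter's method):
# learn a visual-to-semantic mapping on seen classes, then classify test
# samples by nearest unseen-class prototype.
import torch
import torch.nn as nn
import torch.nn.functional as F

visual_dim, semantic_dim = 2048, 85          # e.g. CNN features, attribute vector (illustrative sizes)
num_seen, num_unseen, n_train = 40, 10, 5000

# Synthetic stand-ins for real data: visual features, labels, class prototypes.
X_train = torch.randn(n_train, visual_dim)
y_train = torch.randint(0, num_seen, (n_train,))
seen_protos = F.normalize(torch.randn(num_seen, semantic_dim), dim=1)
unseen_protos = F.normalize(torch.randn(num_unseen, semantic_dim), dim=1)

mapping = nn.Linear(visual_dim, semantic_dim)          # visual -> semantic embedding
opt = torch.optim.Adam(mapping.parameters(), lr=1e-3)

for epoch in range(5):
    z = F.normalize(mapping(X_train), dim=1)           # projected visual samples
    logits = z @ seen_protos.t()                       # cosine similarity to seen prototypes
    loss = F.cross_entropy(logits / 0.1, y_train)      # temperature-scaled compatibility loss
    opt.zero_grad(); loss.backward(); opt.step()

# Inference on unseen classes: nearest prototype in the semantic space.
x_test = torch.randn(4, visual_dim)
z_test = F.normalize(mapping(x_test), dim=1)
pred_unseen_class = (z_test @ unseen_protos.t()).argmax(dim=1)
```

In the generalized (GZSL) setting, inference would instead score against the union of seen and unseen prototypes.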
Related papers
- Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization [40.5076868823241]
We introduce a new dataset of adjacent image triplets obtained from a viewpoint trajectory.
We benchmark both semantic classification and pose estimation accuracies on the same visual feature.
Our experiments demonstrate that this approach helps develop a visual representation that encodes object identity.
arXiv Detail & Related papers (2024-03-22T06:04:11Z) - Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization [27.583517870047487]
We propose Deep Object Patch Encodings (DOPE), which can be trained from multiple views of object instances without any category or semantic object part labels.
To train DOPE, we assume access to sparse depths, foreground masks and known cameras, to obtain pixel-level correspondences between views of an object.
We find that DOPE can directly be used for low-shot classification of novel categories using local-part matching, and is competitive with and outperforms supervised and self-supervised learning baselines.
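The correspondence step described above can be illustrated with standard multi-view geometry: a pixel with known depth is unprojected to 3D with the first camera and reprojected into the second. The sketch below assumes exactly that setup; the function name, tensor shapes, and toy inputs are illustrative and not taken from the paper.

```python
# Illustrative sketch of pixel correspondences from sparse depth and known
# cameras (standard unproject/reproject geometry; not the authors' code).
import torch

def correspondences(uv, depth, K1, K2, R_12, t_12):
    """Map pixels `uv` (N,2) with depths (N,) from view 1 into view 2.

    K1, K2: (3,3) intrinsics; R_12 (3,3), t_12 (3,) transform view-1 camera
    coordinates into view-2 camera coordinates.
    """
    ones = torch.ones(uv.shape[0], 1)
    pix_h = torch.cat([uv, ones], dim=1)              # homogeneous pixels (N,3)
    rays = (torch.linalg.inv(K1) @ pix_h.t()).t()     # back-projected rays
    pts_cam1 = rays * depth.unsqueeze(1)              # 3D points in view-1 camera frame
    pts_cam2 = (R_12 @ pts_cam1.t()).t() + t_12       # into view-2 camera frame
    proj = (K2 @ pts_cam2.t()).t()                    # project with view-2 intrinsics
    return proj[:, :2] / proj[:, 2:3]                 # pixel coordinates in view 2

# Toy usage with identity cameras: points map back onto themselves.
K = torch.eye(3)
uv = torch.tensor([[10.0, 20.0], [32.0, 8.0]])
depth = torch.tensor([2.0, 3.5])
print(correspondences(uv, depth, K, K, torch.eye(3), torch.zeros(3)))
```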
arXiv Detail & Related papers (2022-11-28T04:31:53Z) - Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z) - Cross-modal Representation Learning for Zero-shot Action Recognition [67.57406812235767]
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR).
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experimental results show our model considerably improves upon the state of the art in ZSAR, reaching encouraging top-1 accuracy on the UCF101, HMDB51, and ActivityNet benchmark datasets.
arXiv Detail & Related papers (2022-05-03T17:39:27Z) - Learning Semantic Ambiguities for Zero-Shot Learning [0.0]
We propose a regularization method that can be applied to any conditional generative-based ZSL method.
It learns to synthesize discriminative features for possible semantic descriptions that are not available at training time, that is, those of unseen classes.
The approach is evaluated for ZSL and GZSL on four datasets commonly used in the literature.
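A hedged sketch of the general idea follows (the module names, interpolation scheme, and loss weights are illustrative assumptions, not the paper's actual regularizer): a conditional generator is also fed "virtual" prototypes built by interpolating seen-class prototypes, standing in for semantic descriptions unavailable at training time, and an auxiliary term keeps their synthetic features distinguishable from seen-class features.

```python
# Hedged sketch of such a regularization term (illustrative assumptions only).
import torch
import torch.nn as nn
import torch.nn.functional as F

sem_dim, feat_dim, noise_dim, num_seen, batch = 85, 2048, 64, 40, 32
G = nn.Sequential(nn.Linear(sem_dim + noise_dim, 1024), nn.ReLU(), nn.Linear(1024, feat_dim))
clf = nn.Linear(feat_dim, num_seen + 1)          # extra logit = "virtual / unseen-like" bucket
seen_protos = F.normalize(torch.randn(num_seen, sem_dim), dim=1)

def synthesize(protos):
    noise = torch.randn(protos.size(0), noise_dim)
    return G(torch.cat([protos, noise], dim=1))

labels = torch.randint(0, num_seen, (batch,))
fake_seen = synthesize(seen_protos[labels])                      # features for seen descriptions

# Virtual prototypes: random interpolations of two seen prototypes.
i, j = torch.randint(0, num_seen, (batch,)), torch.randint(0, num_seen, (batch,))
lam = torch.rand(batch, 1)
virtual = F.normalize(lam * seen_protos[i] + (1 - lam) * seen_protos[j], dim=1)
fake_virtual = synthesize(virtual)

loss_cls = F.cross_entropy(clf(fake_seen), labels)               # seen features stay classifiable
loss_reg = F.cross_entropy(clf(fake_virtual),                    # virtual features go elsewhere
                           torch.full((batch,), num_seen))
loss = loss_cls + 0.1 * loss_reg   # added to the usual generative (GAN/VAE) objective, omitted here
```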
arXiv Detail & Related papers (2022-01-05T21:08:29Z) - Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
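A minimal sketch of the classification-head idea, assuming the class embeddings come from a language model or knowledge graph and are kept fixed (the dimensions and placeholder embeddings below are not from the paper): detector features are projected into the semantic space and scored by cosine similarity against the class embeddings instead of a learned one-hot classifier.

```python
# Minimal sketch: score object features against fixed semantic class
# embeddings rather than a learned one-hot classifier (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, embed_dim, feat_dim = 80, 300, 256
class_embeds = F.normalize(torch.randn(num_classes, embed_dim), dim=1)  # frozen semantic embeddings

proj = nn.Linear(feat_dim, embed_dim)       # maps detector features into the semantic space

def class_logits(obj_feats, temperature=0.07):
    q = F.normalize(proj(obj_feats), dim=1)
    return q @ class_embeds.t() / temperature   # cosine-similarity logits, usable with cross-entropy

logits = class_logits(torch.randn(5, feat_dim))   # (5, 80) scores for 5 detected objects
```

Because semantically related classes have nearby embeddings, misclassifications under such a head tend to fall within a semantic neighborhood, which is the kind of error statistics the paper compares against the one-hot baseline.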
arXiv Detail & Related papers (2021-12-21T17:10:21Z) - SEGA: Semantic Guided Attention on Visual Prototype for Few-Shot Learning [85.2093650907943]
We propose SEmantic Guided Attention (SEGA) to teach machines to recognize a new category.
SEGA uses semantic knowledge to guide the visual perception in a top-down manner about what visual features should be paid attention to.
We show that our semantic guided attention realizes the anticipated function and surpasses state-of-the-art results.
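A hedged sketch of what such top-down guidance could look like in code (the architecture below is an illustrative assumption, not the paper's exact design): the class's semantic embedding is mapped to channel-attention weights that reweight the few-shot visual prototype before nearest-prototype matching.

```python
# Illustrative semantic-guided channel attention on a few-shot prototype.
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, sem_dim = 640, 300
attn = nn.Sequential(nn.Linear(sem_dim, feat_dim), nn.Sigmoid())   # semantic -> channel weights

def guided_prototype(support_feats, class_semantic):
    proto = support_feats.mean(dim=0)             # few-shot visual prototype (feat_dim,)
    return proto * attn(class_semantic)           # emphasize semantically relevant channels

# Toy usage: a 5-shot support set and a word embedding for the class name.
support = torch.randn(5, feat_dim)
semantic = torch.randn(sem_dim)
query = F.normalize(torch.randn(feat_dim), dim=0)
score = F.cosine_similarity(query, guided_prototype(support, semantic), dim=0)
```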
arXiv Detail & Related papers (2021-11-08T08:03:44Z) - Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We use a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them.
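A minimal sketch of this feature-synthesis recipe (the generator below is untrained and purely illustrative; in practice it would be fit with a GAN or VAE objective on seen classes): class semantics condition a generator that produces features for unseen classes, and a single classifier is then trained on real seen features plus the synthetic unseen ones.

```python
# Illustrative feature synthesis for unseen classes from class semantics.
import torch
import torch.nn as nn
import torch.nn.functional as F

sem_dim, feat_dim, noise_dim = 85, 2048, 64
num_seen, num_unseen = 40, 10

G = nn.Sequential(nn.Linear(sem_dim + noise_dim, 1024), nn.ReLU(), nn.Linear(1024, feat_dim))
unseen_protos = torch.randn(num_unseen, sem_dim)

# Synthesize a pool of 50 features per unseen class.
labels = torch.arange(num_unseen).repeat_interleave(50) + num_seen   # unseen ids follow seen ids
protos = unseen_protos.repeat_interleave(50, dim=0)
with torch.no_grad():
    synth_feats = G(torch.cat([protos, torch.randn(protos.size(0), noise_dim)], dim=1))

# A classifier over seen + unseen classes can now be trained on
# (real seen features, seen labels) plus (synth_feats, labels).
clf = nn.Linear(feat_dim, num_seen + num_unseen)
loss = F.cross_entropy(clf(synth_feats), labels)     # unseen part of the training objective
```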
arXiv Detail & Related papers (2020-10-19T12:36:11Z) - Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.