Webly Supervised Semantic Embeddings for Large Scale Zero-Shot Learning
- URL: http://arxiv.org/abs/2008.02880v1
- Date: Thu, 6 Aug 2020 21:33:44 GMT
- Title: Webly Supervised Semantic Embeddings for Large Scale Zero-Shot Learning
- Authors: Yannick Le Cacheux, Adrian Popescu, Hervé Le Borgne
- Abstract summary: Zero-shot learning (ZSL) makes object recognition in images possible in the absence of visual training data for part of the classes in a dataset.
We focus on the problem of semantic class prototype design for large scale ZSL.
We investigate the use of noisy textual metadata associated with photos as text collections.
- Score: 8.472636806304273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot learning (ZSL) makes object recognition in images possible in
the absence of visual training data for part of the classes in a dataset. When
the number of classes is large, classes are usually represented by semantic
class prototypes learned automatically from unannotated text collections. This
typically leads to much lower performance than with manually designed semantic
prototypes such as attributes. While most ZSL works focus on the visual aspect
and reuse standard semantic prototypes learned from generic text collections,
we focus on the problem of semantic class prototype design for large-scale ZSL.
More specifically, we investigate the use of noisy textual metadata associated
with photos as text collections, as we hypothesize that they are likely to
provide more plausible semantic embeddings for visual classes if exploited
appropriately. We thus make use of a source-based voting strategy to improve
the robustness of semantic prototypes. Evaluation on the large-scale ImageNet
dataset shows a significant improvement in ZSL performance over two strong
baselines and over the semantic embeddings used in previous works. We show
that this improvement is obtained for several embedding methods, leading to
state-of-the-art results when automatically created visual and text features
are used.
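The abstract names a source-based voting strategy but does not detail it. Below is a minimal sketch of what such a consensus step could look like, assuming one noisy embedding per class and per metadata source (e.g., tags crawled from different photo-sharing sites); the agreement threshold and demo data are hypothetical, not taken from the paper.

```python
# A minimal sketch of a source-based voting strategy for semantic class
# prototypes, assuming one noisy embedding per metadata source. The exact
# scheme of the paper may differ; the agreement threshold, embedding
# dimension and demo data below are all illustrative assumptions.
import numpy as np

def normalize(v, eps=1e-12):
    """L2-normalize vectors along the last axis."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def vote_prototype(source_embeddings, min_agreement=0.3):
    """Aggregate per-source embeddings of one class into a prototype.

    Each source (e.g., tags crawled from one website) "votes" through its
    cosine similarity to the other sources; sources whose mean agreement
    falls below `min_agreement` are discarded before averaging.
    """
    e = normalize(np.asarray(source_embeddings))
    if len(e) == 1:
        return e[0]
    sim = e @ e.T                               # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)
    agreement = sim.sum(axis=1) / (len(e) - 1)  # mean agreement with peers
    kept = agreement >= min_agreement
    if not kept.any():                          # degenerate case: keep all
        kept[:] = True
    return normalize(e[kept].mean(axis=0))

# Demo: four sources roughly agree on a direction, one is pure noise.
rng = np.random.default_rng(0)
true_dir = normalize(rng.normal(size=300))
sources = np.vstack([true_dir + 0.1 * rng.normal(size=300) for _ in range(4)]
                    + [rng.normal(size=300)])   # outlier source
proto = vote_prototype(sources)
print("cosine(prototype, true direction):", float(proto @ true_dir))
```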
Related papers
- Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning [56.65891462413187]
We propose a progressive semantic-guided vision transformer for zero-shot learning (dubbed ZSLViT).
ZSLViT first introduces semantic-embedded token learning to improve visual-semantic correspondences via semantic enhancement.
Then, it fuses visual tokens with low semantic-visual correspondence to discard semantically unrelated visual information for visual enhancement.
arXiv Detail & Related papers (2024-04-11T12:59:38Z)
- CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes [93.71909293023663]
Cross-modality Aligned Prototypes (CAPro) is a unified contrastive learning framework to learn visual representations with correct semantics.
CAPro achieves new state-of-the-art performance and exhibits robustness to open-set recognition.
arXiv Detail & Related papers (2023-10-15T07:20:22Z)
- I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification [123.90912800376039]
Online textual documents, e.g., Wikipedia, contain rich visual descriptions of object classes.
We propose I2DFormer, a novel transformer-based ZSL framework that jointly learns to encode images and documents.
Our method leads to highly interpretable results where document words can be grounded in the image regions.
arXiv Detail & Related papers (2022-09-21T12:18:31Z)
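The I2DFormer entry above mentions grounding document words in image regions. The following is a generic, hedged sketch of that idea under assumed dimensions, not the authors' actual transformer: each image region attends over document word embeddings, and the resulting region-word agreement is pooled into a compatibility score that can be compared across class documents.

```python
# A generic sketch of image-to-document attention scoring, not the exact
# I2DFormer architecture: image region features attend over document word
# embeddings, and region-word agreement yields a class compatibility
# score. Dimensions, data and the scoring rule are assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def image_document_score(regions, words):
    """regions: (R, d) image region features; words: (W, d) word embeddings.

    Returns a scalar compatibility score and the (R, W) attention map,
    which indicates the document words each image region is grounded in.
    """
    attn = softmax(regions @ words.T / np.sqrt(regions.shape[1]), axis=1)
    attended = attn @ words                          # per-region word summary
    per_region = (regions * attended).sum(axis=1)    # region-word agreement
    return float(per_region.mean()), attn

# Demo: score one image (6 regions) against two class documents and
# predict the class whose document matches best.
rng = np.random.default_rng(1)
regions = rng.normal(size=(6, 64))
doc_a, doc_b = rng.normal(size=(40, 64)), rng.normal(size=(55, 64))
score_a, _ = image_document_score(regions, doc_a)
score_b, _ = image_document_score(regions, doc_b)
print("predicted class:", "A" if score_a > score_b else "B")
```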
- Learning Semantic Ambiguities for Zero-Shot Learning [0.0]
We propose a regularization method that can be applied to any conditional generative-based ZSL method.
It learns to synthesize discriminative features for semantic descriptions that are not available at training time, i.e., those of the unseen classes.
The approach is evaluated for ZSL and GZSL on four datasets commonly used in the literature.
arXiv Detail & Related papers (2022-01-05T21:08:29Z)
- Aligning Visual Prototypes with BERT Embeddings for Few-Shot Learning [48.583388368897126]
Few-shot learning is the task of learning to recognize previously unseen categories of images.
We propose a method that takes into account the names of the image classes.
arXiv Detail & Related papers (2021-05-21T08:08:28Z)
- Rich Semantics Improve Few-shot Learning [49.11659525563236]
We show that by using 'class-level' language descriptions, which can be acquired with minimal annotation cost, we can improve few-shot learning performance.
We develop a Transformer based forward and backward encoding mechanism to relate visual and semantic tokens.
arXiv Detail & Related papers (2021-04-26T16:48:27Z)
- Large-Scale Zero-Shot Image Classification from Rich and Diverse Textual Descriptions [5.3556221126231085]
We study the impact of using rich and diverse textual descriptions of classes for zero-shot learning (ZSL) on ImageNet.
We create a new dataset ImageNet-Wiki that matches each ImageNet class to its corresponding Wikipedia article.
We show that employing these Wikipedia articles as class descriptions yields much higher ZSL performance than prior works.
arXiv Detail & Related papers (2021-03-17T14:06:56Z)
- Zero-shot Learning with Deep Neural Networks for Object Recognition [8.572654816871873]
Zero-shot learning deals with the ability to recognize objects without any visual training sample.
This chapter presents a review of the approaches based on deep neural networks to tackle the ZSL problem.
arXiv Detail & Related papers (2021-02-05T12:27:42Z)
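Most approaches surveyed in the review above share one recipe: learn a mapping from visual features to the semantic space on seen classes, then label images of unseen classes by the nearest semantic prototype. The following is a minimal sketch of that standard setup with a closed-form ridge-regression mapping and synthetic data, both of which are illustrative assumptions:

```python
# Hedged sketch of the standard ZSL recipe: learn a visual -> semantic
# mapping on seen classes, then classify unseen-class images by nearest
# semantic prototype. The linear ridge mapping and all data are synthetic
# stand-ins, not any specific paper's pipeline.
import numpy as np

rng = np.random.default_rng(2)
d_vis, d_sem, n_seen, n_unseen = 128, 50, 10, 5

# Hypothetical semantic prototypes (e.g., attribute or text embeddings).
seen_protos = rng.normal(size=(n_seen, d_sem))
unseen_protos = rng.normal(size=(n_unseen, d_sem))

# Synthetic seen-class training data: visual features are a noisy linear
# image of the class prototype (a toy stand-in for CNN features).
W_true = rng.normal(size=(d_sem, d_vis))
y_train = rng.integers(0, n_seen, size=500)
X_train = seen_protos[y_train] @ W_true + 0.1 * rng.normal(size=(500, d_vis))

# Learn the visual -> semantic mapping by ridge regression (closed form).
lam = 1.0
A = X_train.T @ X_train + lam * np.eye(d_vis)
B = X_train.T @ seen_protos[y_train]
W = np.linalg.solve(A, B)                      # (d_vis, d_sem)

# Inference on an unseen-class image: project, then take the nearest
# unseen-class prototype.
test_class = 3
x_test = unseen_protos[test_class] @ W_true + 0.1 * rng.normal(size=d_vis)
s = x_test @ W                                 # projected semantic embedding
dists = np.linalg.norm(unseen_protos - s, axis=1)
print("predicted unseen class:", int(dists.argmin()), "(true:", test_class, ")")
```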
- Semantic Disentangling Generalized Zero-Shot Learning [50.259058462272435]
Generalized Zero-Shot Learning (GZSL) aims to recognize images from both seen and unseen categories.
In this paper, we propose a novel feature disentangling approach based on an encoder-decoder architecture.
The proposed model aims to distill high-quality, semantically consistent representations that capture the intrinsic features of seen images.
arXiv Detail & Related papers (2021-01-20T05:46:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.