CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning
- URL: http://arxiv.org/abs/2006.02174v1
- Date: Wed, 3 Jun 2020 11:21:42 GMT
- Title: CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning
- Authors: Alessandro Suglia, Ioannis Konstas, Andrea Vanzo, Emanuele Bastianelli, Desmond Elliott, Stella Frank and Oliver Lemon
- Abstract summary: We present GROLLA, an evaluation framework for Grounded Language Learning with Attributes.
We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations.
- Score: 78.3857991931479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Approaches to Grounded Language Learning typically focus on a single
task-based final performance measure that may not depend on desirable
properties of the learned hidden representations, such as their ability to
predict salient attributes or to generalise to unseen situations. To remedy
this, we present GROLLA, an evaluation framework for Grounded Language Learning
with Attributes, with three sub-tasks: 1) Goal-oriented evaluation; 2) Object
attribute prediction evaluation; and 3) Zero-shot evaluation. We also propose a
new dataset CompGuessWhat?! as an instance of this framework for evaluating the
quality of learned neural representations, in particular concerning attribute
grounding. To this end, we extend the original GuessWhat?! dataset by including
a semantic layer on top of the perceptual one. Specifically, we enrich the
VisualGenome scene graphs associated with the GuessWhat?! images with abstract
and situated attributes. By using diagnostic classifiers, we show that current
models learn representations that are not expressive enough to encode object
attributes (average F1 of 44.27). In addition, they do not learn strategies or
representations that are robust enough to perform well when novel scenes or
objects are involved in gameplay (zero-shot best accuracy 50.06%).
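
Below is a minimal sketch of the diagnostic-classifier (probing) setup behind the attribute prediction evaluation: frozen object representations are fed to a simple multi-label classifier that predicts attribute labels, and the probe is scored with macro F1. The random features, the dimensions, and the choice of a scikit-learn logistic-regression probe are illustrative assumptions, not the paper's actual data, models, or probe architecture.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Placeholder data: in the paper's setting, the features would be frozen hidden
# representations of objects produced by a GuessWhat?! model, and the labels would
# be the abstract/situated attributes added on top of the VisualGenome scene graphs.
rng = np.random.default_rng(0)
n_objects, hidden_dim, n_attributes = 2000, 512, 50
reprs = rng.normal(size=(n_objects, hidden_dim))                    # frozen representations (stand-in)
attrs = (rng.random((n_objects, n_attributes)) < 0.1).astype(int)   # multi-hot attribute labels (stand-in)

X_train, X_test, y_train, y_test = train_test_split(
    reprs, attrs, test_size=0.2, random_state=0
)

# One linear probe per attribute: if even a simple classifier can recover the
# attributes from the representation, they are encoded; a low F1 suggests they are not.
probe = OneVsRestClassifier(LogisticRegression(max_iter=1000))
probe.fit(X_train, y_train)
predictions = probe.predict(X_test)

print("average (macro) F1:", f1_score(y_test, predictions, average="macro", zero_division=0))

On this random stand-in data the probe scores close to zero by construction; the point in the paper is that probes trained on real GuessWhat?! representations also score low (average F1 of 44.27), which the authors read as evidence that object attributes are not well encoded.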
Related papers
- Disentangling Visual Embeddings for Attributes and Objects [38.27308243429424]
  We study the problem of compositional zero-shot learning for object-attribute recognition.
  Prior works use visual features extracted with a backbone network, pre-trained for object classification.
  We propose a novel architecture that can disentangle attribute and object features in the visual space.
  arXiv Detail & Related papers (2022-05-17T17:59:36Z)
- Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
  We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot, i.e. zero-shot and few-shot, image classification tasks.
  We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
  arXiv Detail & Related papers (2022-04-04T02:25:40Z)
- On Guiding Visual Attention with Language Specification [76.08326100891571]
  We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
  We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
  arXiv Detail & Related papers (2022-02-17T22:40:19Z)
- Towards Zero-shot Sign Language Recognition [11.952300437658703]
  This paper tackles the problem of zero-shot sign language recognition.
  The goal is to leverage models learned over the seen sign classes to recognize the instances of unseen sign classes.
  arXiv Detail & Related papers (2022-01-15T19:26:36Z)
- Unsupervised Part Discovery from Contrastive Reconstruction [90.88501867321573]
  The goal of self-supervised visual representation learning is to learn strong, transferable image representations.
  We propose an unsupervised approach to object part discovery and segmentation.
  Our method yields semantic parts consistent across fine-grained but visually distinct categories.
  arXiv Detail & Related papers (2021-11-11T17:59:42Z)
- Attribute Prototype Network for Zero-Shot Learning [113.50220968583353]
  We propose a novel zero-shot representation learning framework that jointly learns discriminative global and local features.
  Our model points to the visual evidence of the attributes in an image, confirming the improved attribute localization ability of our image representation.
  arXiv Detail & Related papers (2020-08-19T06:46:35Z)
- Meta-Learning with Context-Agnostic Initialisations [86.47040878540139]
  We introduce a context-adversarial component into the meta-learning process.
  This produces an initialisation for fine-tuning to target which is context-agnostic and task-generalised.
  We evaluate our approach on three commonly used meta-learning algorithms and two problems.
  arXiv Detail & Related papers (2020-07-29T08:08:38Z)