Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions
- URL: http://arxiv.org/abs/2303.04053v3
- Date: Thu, 26 Oct 2023 11:35:03 GMT
- Title: Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions
- Authors: Bill Noble, Nikolai Ilinykh
- Abstract summary: We introduce a framework for testing category-level perceptual grounding in multi-modal language models.
We train separate neural networks to generate and interpret descriptions of visual categories.
We show that communicative success exposes performance issues in the generation model.
- Score: 2.7195102129095003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human speakers can generate descriptions of perceptual concepts, abstracted
from the instance level. Moreover, such descriptions can be used by other
speakers to learn provisional representations of those concepts. Learning and
using abstract perceptual concepts is under-investigated in the
language-and-vision field. The problem is also highly relevant to the field of
representation learning in multi-modal NLP. In this paper, we introduce a
framework for testing category-level perceptual grounding in multi-modal
language models. In particular, we train separate neural networks to generate
and interpret descriptions of visual categories. We measure the communicative
success of the two models with the zero-shot classification performance of the
interpretation model, which we argue is an indicator of perceptual grounding.
Using this framework, we compare the performance of prototype- and
exemplar-based representations. Finally, we show that communicative success
exposes performance issues in the generation model, not captured by traditional
intrinsic NLG evaluation metrics, and argue that these issues stem from a
failure to properly ground language in vision at the category level.
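
The abstract's evaluation idea can be illustrated with a small sketch. This is not the authors' implementation: the function names `prototype` and `communicative_success`, the toy feature vectors, and the class labels are all hypothetical. It assumes a prototype is the mean of a category's exemplar feature vectors, and that communicative success is the interpretation model's zero-shot classification accuracy, as the abstract describes.

```python
from statistics import fmean


def prototype(exemplars):
    """Collapse a category's exemplar feature vectors into one mean vector.

    Assumption: a prototype is the per-dimension mean of the exemplars.
    An exemplar-based representation would keep `exemplars` as-is instead.
    """
    return [fmean(dim) for dim in zip(*exemplars)]


def communicative_success(predicted_labels, gold_labels):
    """Zero-shot accuracy: the share of items whose predicted class is correct.

    `predicted_labels` stands in for the interpretation model's outputs after
    reading generated descriptions of categories it was not trained on.
    """
    if len(predicted_labels) != len(gold_labels):
        raise ValueError("predicted and gold labels must have the same length")
    if not gold_labels:
        raise ValueError("at least one labelled item is required")
    correct = sum(p == g for p, g in zip(predicted_labels, gold_labels))
    return correct / len(gold_labels)


# Toy data (hypothetical): two 2-dimensional exemplars of one category.
exemplars = [[1.0, 0.0], [3.0, 2.0]]
proto = prototype(exemplars)

# Toy predictions (hypothetical): two of three items classified correctly.
score = communicative_success(
    ["auklet", "puffin", "auklet"],
    ["auklet", "puffin", "murre"],
)
```

A low score here, despite fluent generated descriptions, would be the kind of gap the abstract says intrinsic NLG metrics do not capture.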