Imagining Grounded Conceptual Representations from Perceptual
Information in Situated Guessing Games
- URL: http://arxiv.org/abs/2011.02917v1
- Date: Thu, 5 Nov 2020 15:42:29 GMT
- Title: Imagining Grounded Conceptual Representations from Perceptual
Information in Situated Guessing Games
- Authors: Alessandro Suglia, Antonio Vergari, Ioannis Konstas, Yonatan Bisk,
Emanuele Bastianelli, Andrea Vanzo, Oliver Lemon
- Abstract summary: In visual guessing games, a Guesser has to identify a target object in a scene by asking questions to an Oracle.
Existing models fail to learn truly multi-modal representations, relying instead on gold category labels for objects in the scene both at training and inference time.
We introduce a novel "imagination" module based on Regularized Auto-Encoders that learns context-aware and category-aware latent embeddings without relying on category labels at inference time.
- Score: 83.53942719040576
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In visual guessing games, a Guesser has to identify a target object in a
scene by asking questions to an Oracle. An effective strategy for the players
is to learn conceptual representations of objects that are both discriminative
and expressive enough to ask questions and guess correctly. However, as shown
by Suglia et al. (2020), existing models fail to learn truly multi-modal
representations, relying instead on gold category labels for objects in the
scene both at training and inference time. This provides an unnatural
performance advantage when categories at inference time match those at training
time, and it causes models to fail in more realistic "zero-shot" scenarios
where out-of-domain object categories are involved. To overcome this issue, we
introduce a novel "imagination" module based on Regularized Auto-Encoders that
learns context-aware and category-aware latent embeddings without relying on
category labels at inference time. Our imagination module outperforms
state-of-the-art competitors by 8.26% gameplay accuracy in the CompGuessWhat?!
zero-shot scenario (Suglia et al., 2020), and it improves the Oracle and
Guesser accuracy by 2.08% and 12.86% in the GuessWhat?! benchmark, when no gold
categories are available at inference time. The imagination module also boosts
reasoning about object properties and attributes.
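The abstract describes the imagination module only at a high level. As a rough illustration of the general idea, and not the authors' implementation, the following is a minimal numpy sketch of a regularized auto-encoder that maps perceptual features to a latent "imagined" embedding without consulting category labels. All names, dimensions, and the simple L2 latent penalty are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

class RegularizedAutoEncoder:
    """Toy regularized auto-encoder: encodes perceptual features into a
    latent embedding and decodes them back. The latent norm is penalized
    so that embeddings stay in a well-behaved region -- one common form
    of 'regularized auto-encoder' (illustrative only)."""

    def __init__(self, in_dim, latent_dim, reg_weight=1e-2):
        # Random untrained weights; a real model would be trained.
        self.W_enc = rng.normal(0.0, 0.1, (in_dim, latent_dim))
        self.W_dec = rng.normal(0.0, 0.1, (latent_dim, in_dim))
        self.reg_weight = reg_weight

    def encode(self, x):
        # "Imagine" a latent conceptual embedding from perceptual input;
        # no gold category label is required at this step.
        return np.tanh(x @ self.W_enc)

    def decode(self, z):
        # Reconstruct the perceptual features from the latent embedding.
        return z @ self.W_dec

    def loss(self, x):
        z = self.encode(x)
        x_hat = self.decode(z)
        recon = np.mean((x - x_hat) ** 2)        # reconstruction term
        reg = self.reg_weight * np.mean(z ** 2)  # latent regularizer
        return recon + reg

# Example: 4 objects with 512-d perceptual features -> 64-d embeddings.
features = rng.normal(size=(4, 512))
rae = RegularizedAutoEncoder(in_dim=512, latent_dim=64)
embeddings = rae.encode(features)
print(embeddings.shape)  # (4, 64)
```

In the guessing-game setting, embeddings of this kind would then be consumed by the Guesser and Oracle in place of gold category features, which is what allows inference on out-of-domain categories.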
Related papers
- Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models [64.24227572048075]
We propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models.
Our approach takes inspiration from human intelligence in which external knowledge is usually incorporated into recognizing novel categories of objects.
arXiv Detail & Related papers (2023-08-22T04:24:45Z)
- Helping Hands: An Object-Aware Ego-Centric Video Recognition Model [60.350851196619296]
We introduce an object-aware decoder for improving the performance of ego-centric representations on ego-centric videos.
We show that the model can act as a drop-in replacement for an ego-awareness video model to improve performance through visual-text grounding.
arXiv Detail & Related papers (2023-08-15T17:58:11Z)
- Beyond the Meta: Leveraging Game Design Parameters for Patch-Agnostic Esport Analytics [4.1692797498685685]
Esport games comprise a sizeable fraction of the global games market and are its fastest-growing segment.
Compared to traditional sports, esport titles change rapidly in both mechanics and rules.
This paper extracts information from game design (i.e. patch notes) and uses clustering techniques to propose a new form of character representation.
arXiv Detail & Related papers (2023-05-29T11:05:20Z)
- Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models [68.85478477006178]
We present a Promptable Game Model (PGM) for neural video game simulators.
It allows a user to play the game by prompting it with high- and low-level action sequences.
Most captivatingly, our PGM unlocks the director's mode, where the game is played by specifying goals for the agents in the form of a prompt.
Our method significantly outperforms existing neural video game simulators in terms of rendering quality and unlocks applications beyond the capabilities of the current state of the art.
arXiv Detail & Related papers (2023-03-23T17:43:17Z)
- A Categorical Framework of General Intelligence [12.134564449202708]
Since Alan Turing asked this question in 1950, nobody has been able to give a direct answer.
We introduce a categorical framework towards this goal, with two main results.
arXiv Detail & Related papers (2023-03-08T13:37:01Z)
- Exploiting Unlabeled Data with Vision and Language Models for Object Detection [64.94365501586118]
Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets.
We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images.
We demonstrate the value of the generated pseudo labels in two specific tasks, open-vocabulary detection and semi-supervised object detection.
arXiv Detail & Related papers (2022-07-18T21:47:15Z)
- CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning [78.3857991931479]
We present GROLLA, an evaluation framework for Grounded Language Learning with Attributes.
We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations.
arXiv Detail & Related papers (2020-06-03T11:21:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.