The emergence of visual semantics through communication games
- URL: http://arxiv.org/abs/2101.10253v1
- Date: Mon, 25 Jan 2021 17:43:37 GMT
- Title: The emergence of visual semantics through communication games
- Authors: Daniela Mihai and Jonathon Hare
- Abstract summary: Communication systems which capture visual semantics can be learned in a completely self-supervised manner by playing the right types of game.
Our work bridges a gap between emergent communication research and self-supervised feature learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emergence of communication systems between agents which learn to play
referential signalling games with realistic images has attracted a lot of
attention recently. The majority of work has focused on using fixed, pretrained
image feature extraction networks which potentially bias the information the
agents learn to communicate. In this work, we consider a signalling game
setting in which a `sender' agent must communicate information about an image
to a `receiver', who must select the correct image from among many distractors.
We investigate the effect of the feature extractor's weights and of the task
being solved on the visual semantics learned by the models. We first
demonstrate the extent to which pretrained feature extraction networks
inductively bias the visual semantics conveyed by the emergent communication
channel, and we quantify the visual semantics that are induced.
We then go on to explore ways in which inductive biases can be introduced to
encourage the emergence of semantically meaningful communication without the
need for any form of supervised pretraining of the visual feature extractor. We
apply various augmentations to the input images and add auxiliary tasks to the
game, with the aim of inducing visual representations that capture conceptual
properties of images. Through our experiments, we demonstrate that
communication systems which capture visual semantics can be learned in a
completely self-supervised manner by playing the right types of game. Our work
bridges a gap between emergent communication research and self-supervised
feature learning.
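As a rough illustration of the game described in the abstract, the sketch below implements a minimal, untrained referential signalling game in plain Python: a sender sees a target image (here just a small feature vector) and emits one discrete symbol, and a receiver must pick the target out of a set of distractors using only that symbol. The agent parameterisations, feature dimensionality, and vocabulary size are all invented for illustration and are not the paper's architecture; real implementations use deep feature extractors and learn the weights end-to-end.

```python
import random

VOCAB = 4   # number of discrete symbols the channel allows (illustrative)
DIM = 3     # dimensionality of the toy image features (illustrative)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class Sender:
    """Scores each symbol against the target image; speaks the argmax symbol."""
    def __init__(self, rng):
        self.w = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]

    def speak(self, image):
        scores = [dot(w, image) for w in self.w]
        return max(range(VOCAB), key=scores.__getitem__)

class Receiver:
    """Embeds the received symbol; picks the candidate most aligned with it."""
    def __init__(self, rng):
        self.emb = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]

    def listen(self, symbol, candidates):
        m = self.emb[symbol]
        return max(range(len(candidates)), key=lambda i: dot(m, candidates[i]))

def play_round(sender, receiver, rng, n_distractors=3):
    # One target plus n_distractors random candidate "images".
    candidates = [[rng.uniform(-1, 1) for _ in range(DIM)]
                  for _ in range(n_distractors + 1)]
    target = rng.randrange(len(candidates))
    guess = receiver.listen(sender.speak(candidates[target]), candidates)
    return guess == target

rng = random.Random(0)
sender, receiver = Sender(rng), Receiver(rng)
wins = sum(play_round(sender, receiver, rng) for _ in range(1000))
print(f"success rate over 1000 untrained rounds: {wins / 1000:.2f}")
```

Training would then reward the pair for correct selections, which is what drives a shared protocol (and, per the paper, visual semantics) to emerge.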
Related papers
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z)
- StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning [69.06749934902464]
We propose a style-guided high-order attention network for image emotion distribution learning termed StyleEDL.
StyleEDL interactively learns stylistic-aware representations of images by exploring the hierarchical stylistic information of visual contents.
In addition, we introduce a stylistic graph convolutional network to dynamically generate the content-dependent emotion representations.
arXiv Detail & Related papers (2023-08-06T03:22:46Z)
- Learning Multi-Object Positional Relationships via Emergent Communication [16.26264889682904]
We train agents in a referential game where observations contain two objects, and find that generalization is the major problem when the positional relationship is involved.
We find that the learned language can generalize well in a new multi-step MDP task where the positional relationship describes the goal, and performs better than raw-pixel images as well as pre-trained image features.
We also show that language transfer from the referential game performs better in the new task than learning language directly in this task, implying the potential benefits of pre-training in referential games.
arXiv Detail & Related papers (2023-02-16T04:44:53Z)
- Semantic-Aware Fine-Grained Correspondence [8.29030327276322]
We propose to learn semantic-aware fine-grained correspondence using image-level self-supervised methods.
We design a pixel-level self-supervised learning objective which specifically targets fine-grained correspondence.
Our method surpasses previous state-of-the-art self-supervised methods using convolutional networks on a variety of visual correspondence tasks.
arXiv Detail & Related papers (2022-07-21T12:51:41Z)
- Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-modal Knowledge Transfer [61.34424171458634]
We study whether integrating visual knowledge into a language model can fill the gap.
Our experiments show that visual knowledge transfer can improve performance in both low-resource and fully supervised settings.
arXiv Detail & Related papers (2022-03-14T22:02:40Z)
- Shared Visual Representations of Drawing for Communication: How do different biases affect human interpretability and intent? [0.0]
We show that a combination of powerful pretrained encoder networks, with appropriate inductive biases, can lead to agents that draw recognisable sketches.
We develop an approach to help automatically analyse the semantic content being conveyed by a sketch.
arXiv Detail & Related papers (2021-10-15T17:02:34Z)
- Interpretable agent communication from scratch (with a generic visual processor emerging on the side) [29.722833768572805]
We train two deep nets from scratch to perform realistic referent identification through unsupervised emergent communication.
We show that the largely interpretable emergent protocol allows the nets to successfully communicate even about object types they did not see at training time.
Our results provide concrete evidence of the viability of (interpretable) emergent deep net communication in a more realistic scenario than previously considered.
arXiv Detail & Related papers (2021-06-08T11:32:11Z)
- Learning to Draw: Emergent Communication through Sketching [0.0]
We show how agents can learn to communicate in order to collaboratively solve tasks.
Existing research has focused on language, with a learned communication channel transmitting sequences of discrete tokens between the agents.
Our agents are parameterised by deep neural networks, and the drawing procedure is differentiable, allowing for end-to-end training.
In the framework of a referential communication game, we demonstrate that agents can not only successfully learn to communicate by drawing, but with appropriate inductive biases, can do so in a fashion that humans can interpret.
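A standard trick in the emergent-communication literature for making a discrete-token channel like the one mentioned above trainable end-to-end (not necessarily the exact mechanism used in this particular paper, whose channel is a differentiable drawing procedure) is the Gumbel-Softmax relaxation: add Gumbel noise to the token logits, then take a temperature-controlled softmax, which approaches a one-hot sample as the temperature goes to zero. A minimal plain-Python sketch:

```python
import math
import random

def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed (soft) one-hot sample over the token vocabulary."""
    # Gumbel noise: g = -log(-log(u)), u ~ Uniform(0, 1); added to each logit.
    noisy = [l - math.log(-math.log(rng.random())) for l in logits]
    scaled = [n / temperature for n in noisy]
    m = max(scaled)                             # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

rng = random.Random(0)
logits = [2.0, 0.5, -1.0]                       # illustrative token scores
soft = gumbel_softmax(logits, temperature=1.0, rng=rng)
hard = gumbel_softmax(logits, temperature=0.05, rng=rng)
print(soft)   # smooth distribution over the 3 tokens
print(hard)   # nearly one-hot at low temperature
```

Because every step is a differentiable function of the logits, gradients from the receiver's loss can flow back through the "sampled" token to the sender, which is what enables joint end-to-end training of both agents.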
arXiv Detail & Related papers (2021-06-03T18:17:55Z)
- Exploring Visual Engagement Signals for Representation Learning [56.962033268934015]
We present VisE, a weakly supervised learning approach, which maps social images to pseudo labels derived by clustered engagement signals.
We then study how models trained in this way benefit subjective downstream computer vision tasks such as emotion recognition or political bias detection.
arXiv Detail & Related papers (2021-04-15T20:50:40Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)
- Visually Guided Self Supervised Learning of Speech Representations [62.23736312957182]
We propose a framework for learning audio representations guided by the visual modality in the context of audiovisual speech.
We employ a generative audio-to-video training scheme in which we animate a still image corresponding to a given audio clip and optimize the generated video to be as close as possible to the real video of the speech segment.
We achieve state-of-the-art results for emotion recognition and competitive results for speech recognition.
arXiv Detail & Related papers (2020-01-13T14:53:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.