Interpretable agent communication from scratch (with a generic visual processor emerging on the side)
- URL: http://arxiv.org/abs/2106.04258v1
- Date: Tue, 8 Jun 2021 11:32:11 GMT
- Title: Interpretable agent communication from scratch (with a generic visual processor emerging on the side)
- Authors: Roberto Dessì, Eugene Kharitonov, Marco Baroni
- Abstract summary: We train two deep nets from scratch to perform realistic referent identification through unsupervised emergent communication.
We show that the largely interpretable emergent protocol allows the nets to successfully communicate even about object types they did not see at training time.
Our results provide concrete evidence of the viability of (interpretable) emergent deep net communication in a more realistic scenario than previously considered.
- Score: 29.722833768572805
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: As deep networks begin to be deployed as autonomous agents, the issue of how
they can communicate with each other becomes important. Here, we train two deep
nets from scratch to perform realistic referent identification through
unsupervised emergent communication. We show that the largely interpretable
emergent protocol allows the nets to successfully communicate even about object
types they did not see at training time. The visual representations induced as
a by-product of our training regime, moreover, show comparable quality, when
re-used as generic visual features, to a recent self-supervised learning model.
Our results provide concrete evidence of the viability of (interpretable)
emergent deep net communication in a more realistic scenario than previously
considered, as well as establishing an intriguing link between this field and
self-supervised visual learning.
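To make the setup concrete, the sketch below illustrates the kind of discrete referential game the abstract describes: a sender network encodes the target image and emits a discrete symbol, and a receiver network must identify the target among a set of candidate images. The module names, sizes, single-symbol channel, shared encoder, and Gumbel-Softmax relaxation are illustrative assumptions for this sketch, not the authors' exact implementation.

```python
# Minimal sketch of a discrete referential game (illustrative assumptions,
# not the paper's exact architecture). A sender encodes the target image and
# emits one discrete symbol; a receiver scores candidates against the symbol.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HID = 32, 128   # assumed vocabulary and feature sizes
IMG = 3 * 32 * 32      # assumed flattened image size for the sketch

def make_vision(hid=HID):
    # Stand-in visual encoder; the paper trains a real conv net from scratch.
    return nn.Sequential(nn.Flatten(), nn.Linear(IMG, hid), nn.ReLU())

class Sender(nn.Module):
    def __init__(self, vision):
        super().__init__()
        self.vision = vision
        self.to_logits = nn.Linear(HID, VOCAB)

    def forward(self, target, tau=1.0):
        logits = self.to_logits(self.vision(target))
        # Gumbel-Softmax keeps the discrete channel differentiable end-to-end
        return F.gumbel_softmax(logits, tau=tau, hard=True)

class Receiver(nn.Module):
    def __init__(self, vision):
        super().__init__()
        self.vision = vision
        self.embed = nn.Linear(VOCAB, HID)

    def forward(self, message, candidates):
        # candidates: (batch, n_candidates, channels, height, width)
        b, n = candidates.shape[:2]
        feats = self.vision(candidates.flatten(0, 1)).view(b, n, HID)
        msg = self.embed(message).unsqueeze(2)   # (b, HID, 1)
        return torch.bmm(feats, msg).squeeze(2)  # scores over candidates: (b, n)

def game_step(sender, receiver, target, candidates, target_idx, opt):
    scores = receiver(sender(target), candidates)
    loss = F.cross_entropy(scores, target_idx)  # reward identifying the target
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Example wiring; a shared encoder is an assumption, separate ones also work.
vision = make_vision()
agents = nn.ModuleDict({"sender": Sender(vision), "receiver": Receiver(vision)})
opt = torch.optim.Adam(agents.parameters(), lr=1e-3)
```

A visual encoder trained this way can afterwards be frozen and reused as a generic feature extractor (e.g., under a linear probe on a downstream task), which is the sense in which the abstract compares its quality to a self-supervised model; the paper describes the exact evaluation.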
Related papers
- Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control [73.6361029556484]
Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs.
We consider pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts.
We show that Stable Control Representations enable learning policies that exhibit state-of-the-art performance on OVMM, a difficult open-vocabulary navigation benchmark.
arXiv Detail & Related papers (2024-05-09T15:39:54Z)
- Seeing in Words: Learning to Classify through Language Bottlenecks [59.97827889540685]
Humans can explain their predictions using succinct and intuitive descriptions.
We show that a vision model whose feature representations are text can effectively classify ImageNet images.
arXiv Detail & Related papers (2023-06-29T00:24:42Z)
- Referential communication in heterogeneous communities of pre-trained visual deep networks [11.807640148536077]
Large pre-trained image-processing neural networks are being embedded in autonomous agents such as self-driving cars or robots.
We show that they can develop, in a self-supervised way, a shared protocol to refer to a target object among a set of candidates.
We also study, both qualitatively and quantitatively, the properties of the emergent protocol, providing some evidence that it is capturing high-level semantic features of objects.
arXiv Detail & Related papers (2023-02-04T15:55:23Z)
- Less Data, More Knowledge: Building Next Generation Semantic Communication Networks [180.82142885410238]
We present the first rigorous vision of a scalable end-to-end semantic communication network.
We first discuss how the design of semantic communication networks requires a move from data-driven networks towards knowledge-driven ones.
By using semantic representations and languages, we show that the traditional transmitter and receiver become a teacher and an apprentice.
arXiv Detail & Related papers (2022-11-25T19:03:25Z)
- A Simple Long-Tailed Recognition Baseline via Vision-Language Model [92.2866546058082]
The visual world naturally exhibits a long-tailed distribution of open classes, which poses great challenges to modern visual systems.
Recent advances in contrastive visual-language pretraining shed light on a new pathway for visual recognition.
We propose BALLAD to leverage contrastive vision-language models for long-tailed recognition.
arXiv Detail & Related papers (2021-11-29T17:49:24Z)
- Shared Visual Representations of Drawing for Communication: How do different biases affect human interpretability and intent? [0.0]
We show that a combination of powerful pretrained encoder networks, with appropriate inductive biases, can lead to agents that draw recognisable sketches.
We develop an approach to help automatically analyse the semantic content being conveyed by a sketch.
arXiv Detail & Related papers (2021-10-15T17:02:34Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end (a simplified routing sketch appears after this list).
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents [83.52684405389445]
We introduce the collaborative multi-object navigation task CoMON.
In this task, an oracle agent has detailed environment information in the form of a map.
It communicates with a navigator agent that perceives the environment visually and is tasked to find a sequence of goals.
We show that the emergent communication can be grounded to the agent observations and the spatial structure of the 3D environment.
arXiv Detail & Related papers (2021-10-12T06:56:11Z)
- Learning to Draw: Emergent Communication through Sketching [0.0]
We show how agents can learn to communicate in order to collaboratively solve tasks.
Existing research has focused on language, with a learned communication channel transmitting sequences of discrete tokens between the agents.
Our agents are parameterised by deep neural networks, and the drawing procedure is differentiable, allowing for end-to-end training.
In the framework of a referential communication game, we demonstrate that agents can not only successfully learn to communicate by drawing, but with appropriate inductive biases, can do so in a fashion that humans can interpret.
arXiv Detail & Related papers (2021-06-03T18:17:55Z)
- The emergence of visual semantics through communication games [0.0]
Communication systems which capture visual semantics can be learned in a completely self-supervised manner by playing the right types of game.
Our work bridges a gap between emergent communication research and self-supervised feature learning.
arXiv Detail & Related papers (2021-01-25T17:43:37Z)
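As noted in the Neural Interpreters entry above, here is a highly simplified sketch of the routing idea: a set of small learned "function" modules, with each token softly routed to functions by similarity to a learned signature vector, and layers stacked so that inputs pass through a sequence of functions trained end-to-end. This is a loose, mixture-of-experts-style illustration under assumed dimensions, not the actual Neural Interpreters architecture.

```python
# Highly simplified sketch of routing tokens through learned "functions";
# a loose illustration, NOT the actual Neural Interpreters architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FunctionModule(nn.Module):
    """One learnable 'function' applied to token representations."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.code = nn.Parameter(torch.randn(dim))  # signature used for routing

    def forward(self, x):
        return self.mlp(x)

class RoutedLayer(nn.Module):
    """Softly routes each token to functions by signature similarity."""
    def __init__(self, dim, n_functions=4):
        super().__init__()
        self.fns = nn.ModuleList(FunctionModule(dim) for _ in range(n_functions))

    def forward(self, x):  # x: (batch, tokens, dim)
        codes = torch.stack([f.code for f in self.fns])       # (K, dim)
        weights = F.softmax(x @ codes.t(), dim=-1)            # (batch, tokens, K)
        outs = torch.stack([f(x) for f in self.fns], dim=-1)  # (batch, tokens, dim, K)
        return x + (outs * weights.unsqueeze(2)).sum(-1)      # residual update

# A "sequence of functions": stacked routed layers, trained end-to-end.
model = nn.Sequential(RoutedLayer(64), RoutedLayer(64))
tokens = torch.randn(2, 16, 64)
print(model(tokens).shape)  # torch.Size([2, 16, 64])
```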