Learning Multi-Object Positional Relationships via Emergent
Communication
- URL: http://arxiv.org/abs/2302.08084v1
- Date: Thu, 16 Feb 2023 04:44:53 GMT
- Authors: Yicheng Feng, Boshi An, and Zongqing Lu
- Abstract summary: We train agents in a referential game where observations contain two objects, and find that generalization is the major problem when the positional relationship is involved.
We find that the learned language can generalize well in a new multi-step MDP task where the positional relationship describes the goal, and performs better than raw-pixel images as well as pre-trained image features.
We also show that language transfer from the referential game performs better in the new task than learning language directly in this task, implying the potential benefits of pre-training in referential games.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The study of emergent communication has been dedicated to interactive
artificial intelligence. While existing work focuses on communication about
single objects or complex image scenes, we argue that communicating
relationships between multiple objects is important in more realistic tasks,
but understudied. In this paper, we try to fill this gap and focus on emergent
communication about positional relationships between two objects. We train
agents in the referential game where observations contain two objects, and find
that generalization is the major problem when the positional relationship is
involved. The key factor affecting the generalization ability of the emergent
language is the input variation between Speaker and Listener, which is realized
by a random image generator in our work. Further, we find that the learned
language can generalize well in a new multi-step MDP task where the positional
relationship describes the goal, and performs better than raw-pixel images as
well as pre-trained image features, verifying the strong generalization ability
of discrete sequences. We also show that language transfer from the referential
game performs better in the new task than learning language directly in this
task, implying the potential benefits of pre-training in referential games. All
in all, our experiments demonstrate the viability and merit of having agents
learn to communicate positional relationships between multiple objects through
emergent communication.
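The referential-game setup the abstract describes can be illustrated with a minimal sketch: a Speaker encodes the positional relationship between two objects into a discrete message, and a Listener must pick the matching observation among distractors. The relation vocabulary, the fixed lexicon, and the sampling scheme below are illustrative assumptions, not the paper's implementation, which trains both agents and varies their inputs with a random image generator.

```python
import random

# Hypothetical relation vocabulary; the paper's actual relations and
# message space may differ.
RELATIONS = ["left_of", "right_of", "above", "below"]

def speaker(obs):
    """Map an observed relation to a discrete message (token id).
    A trained speaker would emit a learned token sequence; here the
    lexicon is fixed for illustration."""
    return RELATIONS.index(obs)

def listener(message, candidates):
    """Pick the index of the candidate matching the received message."""
    target = RELATIONS[message]
    return candidates.index(target)

def play_round(rng):
    """One round: Speaker describes a target, Listener picks it out."""
    target = rng.choice(RELATIONS)
    # Input variation: the Listener's candidates are sampled independently
    # of the Speaker's view, standing in for the random image generator.
    distractors = rng.sample([r for r in RELATIONS if r != target], 2)
    candidates = distractors + [target]
    rng.shuffle(candidates)
    msg = speaker(target)
    guess = listener(msg, candidates)
    return candidates[guess] == target

rng = random.Random(0)
accuracy = sum(play_round(rng) for _ in range(100)) / 100
print(accuracy)  # → 1.0 with this fixed, unambiguous lexicon
```

With a fixed one-to-one lexicon the game is solved perfectly; the interesting question in the paper is whether a *learned* lexicon generalizes to relationships over unseen object pairs.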
Related papers
- Learning Multi-Agent Communication with Contrastive Learning
We introduce an alternative perspective where communicative messages are considered as different incomplete views of the environment state.
By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning.
In communication-essential environments, our method outperforms previous work in both performance and learning speed.
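The contrastive view above treats messages about the same state as positive pairs and messages about different states as negatives, which can be scored with an InfoNCE-style objective. The sketch below is a generic illustration of that objective; the shapes and function names are assumptions, not the paper's architecture.

```python
import numpy as np

def info_nce(sent, received, temperature=0.1):
    """InfoNCE loss where sent[i] and received[i] encode the same state."""
    sent = sent / np.linalg.norm(sent, axis=1, keepdims=True)
    received = received / np.linalg.norm(received, axis=1, keepdims=True)
    logits = sent @ received.T / temperature            # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                 # matched pairs on the diagonal

rng = np.random.default_rng(0)
states = rng.normal(size=(8, 16))
matched = info_nce(states, states)            # messages aligned with states
mismatched = info_nce(states[::-1], states)   # messages paired with wrong states
print(matched < mismatched)  # True: alignment lowers the contrastive loss
```

Minimizing this loss pushes a sent message toward the representation of the state it was received about, which is the sense in which messages become "views" of the environment state.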
arXiv Detail & Related papers (2023-07-03T23:51:05Z) - Universal Multimodal Representation for Language Understanding
This work presents new methods to employ visual information as assistant signals to general NLP tasks.
For each sentence, we first retrieve a flexible number of images from a light topic-image lookup table extracted over existing sentence-image pairs.
Then, the text and images are encoded by a Transformer encoder and convolutional neural network, respectively.
arXiv Detail & Related papers (2023-01-09T13:54:11Z) - Position-Aware Contrastive Alignment for Referring Image Segmentation
We present a position-aware contrastive alignment network (PCAN) to enhance the alignment of multi-modal features.
Our PCAN consists of two modules: 1) Position Aware Module (PAM), which provides position information of all objects related to natural language descriptions, and 2) Contrastive Language Understanding Module (CLUM), which enhances multi-modal alignment.
arXiv Detail & Related papers (2022-12-27T09:13:19Z) - Leveraging Visual Knowledge in Language Tasks: An Empirical Study on
Intermediate Pre-training for Cross-modal Knowledge Transfer
We study whether integrating visual knowledge into a language model can fill the gap.
Our experiments show that visual knowledge transfer can improve performance in both low-resource and fully supervised settings.
arXiv Detail & Related papers (2022-03-14T22:02:40Z) - Interpretation of Emergent Communication in Heterogeneous Collaborative
Embodied Agents
We introduce the collaborative multi-object navigation task CoMON.
In this task, an oracle agent has detailed environment information in the form of a map.
It communicates with a navigator agent that perceives the environment visually and is tasked to find a sequence of goals.
We show that the emergent communication can be grounded to the agent observations and the spatial structure of the 3D environment.
arXiv Detail & Related papers (2021-10-12T06:56:11Z) - Few-shot Language Coordination by Modeling Theory of Mind
We study the task of few-shot language coordination.
We require the lead agent to coordinate with a population of agents with different linguistic abilities.
This requires the ability to model the partner's beliefs, a vital component of human communication.
arXiv Detail & Related papers (2021-07-12T19:26:11Z) - The emergence of visual semantics through communication games
Communication systems which capture visual semantics can be learned in a completely self-supervised manner by playing the right types of game.
Our work bridges a gap between emergent communication research and self-supervised feature learning.
arXiv Detail & Related papers (2021-01-25T17:43:37Z) - Understanding Spatial Relations through Multiple Modalities
Spatial relations between objects can be either explicit, expressed as spatial prepositions, or implicit, expressed by spatial verbs such as moving, walking, and shifting.
We introduce the task of inferring implicit and explicit spatial relations between two entities in an image.
We design a model that uses both textual and visual information to predict the spatial relations, making use of both positional and size information of objects and image embeddings.
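The positional and size cues such a model consumes can be illustrated with a toy rule-based predictor that reads an explicit relation off two bounding boxes. The (x, y, w, h) box format and the relation names below are assumptions for illustration, not the paper's learned model.

```python
def spatial_relation(box_a, box_b):
    """Return a coarse explicit relation of box_a relative to box_b.
    Boxes are (x, y, w, h) in image coordinates, y growing downward."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Compare box centers, so object size contributes to the decision.
    ca_x, ca_y = ax + aw / 2, ay + ah / 2
    cb_x, cb_y = bx + bw / 2, by + bh / 2
    dx, dy = ca_x - cb_x, ca_y - cb_y
    if abs(dx) >= abs(dy):
        return "right_of" if dx > 0 else "left_of"
    return "below" if dy > 0 else "above"

print(spatial_relation((10, 0, 4, 4), (0, 0, 4, 4)))  # → right_of
```

A learned model replaces these hard-coded thresholds with features from box geometry and image embeddings, which is what lets it also recover implicit relations that no rule captures.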
arXiv Detail & Related papers (2020-07-19T01:35:08Z) - Internal and external pressures on language emergence: least effort,
object constancy and frequency
In previous work, artificial agents were shown to achieve almost perfect accuracy in referential games where they have to communicate to identify images.
We propose some realistic sources of pressure on communication that avert this outcome.
Our findings reveal that the proposed sources of pressure result in emerging languages with less redundancy, more focus on high-level conceptual information, and better abilities of generalisation.
arXiv Detail & Related papers (2020-04-08T08:12:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.