Emergent Communication with World Models
- URL: http://arxiv.org/abs/2002.09604v1
- Date: Sat, 22 Feb 2020 02:34:51 GMT
- Title: Emergent Communication with World Models
- Authors: Alexander I. Cowen-Rivers, Jason Naradowsky
- Abstract summary: We introduce Language World Models, a class of language-conditional generative models that interpret natural language messages.
We incorporate this "observation" into a persistent memory state, and allow the listening agent's policy to condition on it.
We show this improves effective communication and task success in 2D gridworld speaker-listener navigation tasks.
- Score: 80.55287578801008
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Language World Models, a class of language-conditional
generative models that interpret natural language messages by predicting latent
codes of future observations. This provides a visual grounding of the message,
similar to an enhanced observation of the world, which may include objects
outside of the listening agent's field-of-view. We incorporate this
"observation" into a persistent memory state, and allow the listening agent's
policy to condition on it, akin to the relationship between memory and
controller in a World Model. We show this improves effective communication and
task success in 2D gridworld speaker-listener navigation tasks. In addition, we
develop two losses framed specifically for our model-based formulation to
promote positive signalling and positive listening. Finally, because messages
are interpreted in a generative model, we can visualize the model beliefs to
gain insight into how the communication channel is utilized.
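The pipeline the abstract describes — a message interpreted as a predicted latent observation, folded into a persistent memory that the policy conditions on — can be sketched minimally as below. This is an illustrative NumPy mock-up, not the paper's implementation: all dimensions, weight matrices, and the bag-of-words message encoder are hypothetical stand-ins for learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for a minimal sketch.
VOCAB, MSG_LEN, HID, LATENT, N_ACTIONS = 16, 4, 32, 8, 5

# Random projections stand in for learned parameters.
W_embed = rng.normal(0, 0.1, (VOCAB, HID))       # token embeddings
W_latent = rng.normal(0, 0.1, (HID, LATENT))     # language world model head
W_mem = rng.normal(0, 0.1, (HID + LATENT, HID))  # memory update
W_pi = rng.normal(0, 0.1, (HID, N_ACTIONS))      # controller / policy

def interpret_message(tokens):
    """Language world model: map a message to a predicted latent code
    of a future observation (the message's 'visual grounding')."""
    h = W_embed[tokens].mean(axis=0)   # crude bag-of-words message encoding
    return np.tanh(h @ W_latent)       # predicted observation latent

def update_memory(memory, obs_latent):
    """Fold the predicted 'observation' into the persistent memory state."""
    return np.tanh(np.concatenate([memory, obs_latent]) @ W_mem)

def policy(memory):
    """The controller conditions on memory, as in a World Model."""
    logits = memory @ W_pi
    e = np.exp(logits - logits.max())
    return e / e.sum()                 # softmax over actions

# One listener step: hear a message, ground it, remember it, act.
message = rng.integers(0, VOCAB, MSG_LEN)
memory = np.zeros(HID)
obs_latent = interpret_message(message)
memory = update_memory(memory, obs_latent)
action_probs = policy(memory)
```

In a trained system the random matrices would be replaced by learned networks, and the predicted latent would be supervised against the latent code of an actual future observation.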
Related papers
- EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions [152.41217651729738]
GPT-4o is an omni-modal model that enables vocal conversations with diverse emotions and tones.
We propose EMOVA to equip Large Language Models with end-to-end speech capabilities.
For the first time, EMOVA achieves state-of-the-art performance on both the vision-language and speech benchmarks.
arXiv Detail & Related papers (2024-09-26T16:44:02Z)
- Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling [39.80957479349776]
We investigate the prosody modeling capabilities of the discrete space of an RVQ-VAE model, modifying it to operate at the phoneme level.
We show that the phoneme-level discrete latent representations achieve a high degree of disentanglement, capturing fine-grained prosodic information that is robust and transferable.
arXiv Detail & Related papers (2024-09-13T09:27:05Z)
- Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model [57.78191634042409]
We propose Pseudo-Word HuBERT (PW-HuBERT), a framework that integrates pseudo word-level targets into the training process.
Our experimental results on four spoken language understanding (SLU) benchmarks suggest the superiority of our model in capturing semantic information.
arXiv Detail & Related papers (2024-02-08T16:55:21Z)
- LanGWM: Language Grounded World Model [24.86620763902546]
We focus on learning language-grounded visual features to enhance the world model learning.
Our proposed technique of explicit language-grounded visual representation learning has the potential to improve models for human-robot interaction.
arXiv Detail & Related papers (2023-11-29T12:41:55Z)
- Learning to Model the World with Language [100.76069091703505]
To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world.
Our key idea is that agents should interpret such diverse language as a signal that helps them predict the future.
We instantiate this in Dynalang, an agent that learns a multimodal world model to predict future text and image representations.
arXiv Detail & Related papers (2023-07-31T17:57:49Z)
- MindDial: Belief Dynamics Tracking with Theory-of-Mind Modeling for Situated Neural Dialogue Generation [62.44907105496227]
MindDial is a novel conversational framework that can generate situated free-form responses with theory-of-mind modeling.
We introduce an explicit mind module that can track the speaker's belief and the speaker's prediction of the listener's belief.
Our framework is applied to both prompting and fine-tuning-based models, and is evaluated across scenarios involving both common ground alignment and negotiation.
arXiv Detail & Related papers (2023-06-27T07:24:32Z)
- PaLM-E: An Embodied Multimodal Language Model [101.29116156731762]
We propose embodied language models to incorporate real-world continuous sensor modalities into language models.
We train these encodings end-to-end, in conjunction with a pre-trained large language model, for multiple embodied tasks.
Our largest model, PaLM-E-562B with 562B parameters, is a visual-language generalist with state-of-the-art performance on OK-VQA.
arXiv Detail & Related papers (2023-03-06T18:58:06Z)
- Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken Conversations [22.894541507068933]
This paper presents our approach to build generalized models for the Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations Challenge of DSTC-10.
We employ extensive data augmentation strategies on written data, including artificial error injection and round-trip text-speech transformation.
Our approach ranks third on the objective evaluation and second on the final official human evaluation.
arXiv Detail & Related papers (2022-03-08T12:26:57Z)
- Multi-agent Communication meets Natural Language: Synergies between Functional and Structural Language Learning [16.776753238108036]
We present a method for combining multi-agent communication and traditional data-driven approaches to natural language learning.
Our starting point is a language model that has been trained on generic, not task-specific language data.
We then place this model in a multi-agent self-play environment that generates task-specific rewards used to adapt or modulate the model.
arXiv Detail & Related papers (2020-05-14T15:32:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.