The Curious Case of Representational Alignment: Unravelling Visio-Linguistic Tasks in Emergent Communication
- URL: http://arxiv.org/abs/2407.17960v1
- Date: Thu, 25 Jul 2024 11:29:27 GMT
- Title: The Curious Case of Representational Alignment: Unravelling Visio-Linguistic Tasks in Emergent Communication
- Authors: Tom Kouwenhoven, Max Peeperkorn, Bram van Dijk, Tessa Verhoef
- Abstract summary: We assess the representational alignment between agent image representations and between agent representations and input images.
We identify a strong relationship between inter-agent alignment and topographic similarity, a common metric for compositionality.
Our findings emphasise the key role representational alignment plays in simulations of language emergence.
- Score: 1.3499500088995464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language has the universal properties of being compositional and grounded in reality. The emergence of linguistic properties is often investigated through simulations of emergent communication in referential games. However, these experiments have yielded mixed results compared to similar experiments addressing linguistic properties of human language. Here we address representational alignment as a potential contributing factor to these results. Specifically, we assess the representational alignment between agent image representations and between agent representations and input images. Doing so, we confirm that the emergent language does not appear to encode human-like conceptual visual features, since agent image representations drift away from inputs whilst inter-agent alignment increases. We moreover identify a strong relationship between inter-agent alignment and topographic similarity, a common metric for compositionality, and address its consequences. To address these issues, we introduce an alignment penalty that prevents representational drift but interestingly does not improve performance on a compositional discrimination task. Together, our findings emphasise the key role representational alignment plays in simulations of language emergence.
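The abstract refers to two quantities without spelling them out: topographic similarity (the Spearman correlation between pairwise meaning distances and pairwise message distances) and inter-agent representational alignment. A minimal sketch of both follows, together with one plausible form of the alignment penalty; the array names (`meanings`, `messages`, `reprs_a`) and the penalty's exact form are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the quantities central to the
# abstract, assuming NumPy/SciPy and illustrative array names.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def topographic_similarity(meanings: np.ndarray, messages: np.ndarray) -> float:
    """Standard topographic similarity: Spearman correlation between
    pairwise meaning distances and pairwise message distances. Higher
    values mean similar meanings map to similar messages, a common
    proxy for compositionality."""
    rho, _ = spearmanr(pdist(meanings, metric="euclidean"),
                       pdist(messages, metric="hamming"))  # discrete symbols
    return rho


def inter_agent_alignment(reprs_a: np.ndarray, reprs_b: np.ndarray) -> float:
    """Alignment between two agents' representations of the same inputs,
    measured here as the correlation of their pairwise-distance
    structures (representational similarity analysis)."""
    rho, _ = spearmanr(pdist(reprs_a), pdist(reprs_b))
    return rho


def alignment_penalty(agent_reprs: np.ndarray, input_feats: np.ndarray) -> float:
    """One plausible form of the alignment penalty mentioned in the
    abstract: a mean-squared-error term keeping agent representations
    close to the input image features to prevent representational
    drift. The paper may define the penalty differently."""
    return float(np.mean((agent_reprs - input_feats) ** 2))
```

Alignment between agent representations and the input images can be measured with the same RSA score, substituting flattened input features for one agent's representations.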
Related papers
- Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity [64.18762301574954]
Previous work has shown that the representations output by contextual language models are more anisotropic than static type embeddings.
This seems to hold for both monolingual and multilingual models, although much less work has been done in the multilingual setting.
We investigate outlier dimensions and their relationship to anisotropy in multiple pre-trained multilingual language models (a sketch of both measurements appears after this list).
arXiv Detail & Related papers (2023-06-01T09:01:48Z) - SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations [51.08119762844217]
SenteCon is a method for introducing human interpretability in deep language representations.
We show that SenteCon provides high-level interpretability at little to no cost to predictive performance on downstream tasks.
arXiv Detail & Related papers (2023-05-24T05:06:28Z) - Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z) - Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations (the underlying contrastive objective is sketched after this list).
arXiv Detail & Related papers (2023-05-23T08:28:38Z) - Learning Multi-Object Positional Relationships via Emergent Communication [16.26264889682904]
We train agents in a referential game where observations contain two objects, and find that generalization is the major problem when the positional relationship is involved.
We find that the learned language generalizes well to a new multi-step MDP task in which the positional relationship describes the goal, and that it performs better as a goal description than raw-pixel images or pre-trained image features.
We also show that language transfer from the referential game performs better in the new task than learning language directly in this task, implying the potential benefits of pre-training in referential games.
arXiv Detail & Related papers (2023-02-16T04:44:53Z) - Learning to Improve Representations by Communicating About Perspectives [0.0]
We present a minimal architecture comprising a population of autoencoders.
We show that our proposed architecture allows the emergence of aligned representations.
Results demonstrate how communication from subjective perspectives can lead to the acquisition of more abstract representations in multi-agent systems.
arXiv Detail & Related papers (2021-09-20T09:30:13Z) - On (Emergent) Systematic Generalisation and Compositionality in Visual Referential Games with Straight-Through Gumbel-Softmax Estimator [0.30458514384586394]
Compositionality has been shown to emerge when two (or more) agents play non-visual referential games.
This paper investigates to what extent the drivers of compositionality identified so far in the field apply in the ST-GS context.
Using the ST-GS approach with small batch sizes and an overcomplete communication channel improves compositionality in the emerging languages (a minimal ST-GS sampling sketch appears after this list).
arXiv Detail & Related papers (2020-12-19T20:40:09Z) - The Geometry of Distributed Representations for Better Alignment, Attenuated Bias, and Improved Interpretability [9.215513608145994]
High-dimensional representations for words, text, images, knowledge graphs and other structured data are commonly used in machine learning and data mining.
These representations vary in their degree of interpretability, with efficient distributed representations coming at the cost of losing the feature-to-dimension mapping.
The effects of this loss appear across many representations and tasks; a particularly problematic case is language representations, where societal biases learned from the underlying data are captured and occluded in unknown dimensions and subspaces.
This work addresses some of these problems pertaining to the transparency and interpretability of such representations.
arXiv Detail & Related papers (2020-11-25T01:04:11Z) - Consensus-Aware Visual-Semantic Embedding for Image-Text Matching [69.34076386926984]
Image-text matching plays a central role in bridging vision and language.
Most existing approaches only rely on the image-text instance pair to learn their representations.
We propose a Consensus-aware Visual-Semantic Embedding model to incorporate the consensus information.
arXiv Detail & Related papers (2020-07-17T10:22:57Z) - Probing Contextual Language Models for Common Ground with Visual Representations [76.05769268286038]
We design a probing model that evaluates how effective text-only representations are at distinguishing between matching and non-matching visual representations.
Our findings show that language representations alone provide a strong signal for retrieving image patches from the correct object categories.
Visually grounded language models slightly outperform text-only language models in instance retrieval, but greatly underperform humans.
arXiv Detail & Related papers (2020-05-01T21:28:28Z)
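For the anisotropy and outlier-dimension entry above, a common operationalisation (assumed here; the paper may use a different procedure) estimates anisotropy as the average cosine similarity between randomly sampled embedding pairs, and flags dimensions whose mean absolute activation is several standard deviations above the per-dimension average:

```python
# Illustrative sketch of anisotropy and outlier-dimension measurement,
# not necessarily the listed paper's exact procedure.
import numpy as np


def anisotropy(embeddings: np.ndarray, n_pairs: int = 10_000, seed: int = 0) -> float:
    """Estimate anisotropy as the mean cosine similarity between randomly
    sampled pairs of embeddings; an isotropic space scores near zero."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(embeddings), n_pairs)
    j = rng.integers(0, len(embeddings), n_pairs)
    a, b = embeddings[i], embeddings[j]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return float(cos.mean())


def outlier_dimensions(embeddings: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Flag dimensions whose mean absolute activation exceeds k standard
    deviations above the across-dimension mean (one common heuristic)."""
    per_dim = np.abs(embeddings).mean(axis=0)
    return np.where(per_dim > per_dim.mean() + k * per_dim.std())[0]
```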
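The coarse-to-fine compositionality entry and the consensus-aware embedding entry both build on contrastively trained image-text models. The sketch below shows the standard symmetric InfoNCE objective such models typically optimise; encoders are omitted and all tensor names are illustrative:

```python
# Minimal sketch of a symmetric image-text contrastive (InfoNCE) loss.
import torch
import torch.nn.functional as F


def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Matched image-text pairs sit on the diagonal of the similarity
    matrix; the loss pulls them together and pushes mismatches apart."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```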
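For the ST-GS referential-game entry, the sketch below shows how a speaker agent can emit discrete one-hot message symbols via PyTorch's `gumbel_softmax` while remaining end-to-end differentiable; the speaker architecture and sizes are illustrative assumptions:

```python
# Minimal sketch of message sampling with the Straight-Through
# Gumbel-Softmax (ST-GS) estimator.
import torch
import torch.nn.functional as F


class Speaker(torch.nn.Module):
    def __init__(self, feat_dim: int, vocab_size: int, msg_len: int):
        super().__init__()
        self.msg_len, self.vocab_size = msg_len, vocab_size
        self.to_logits = torch.nn.Linear(feat_dim, msg_len * vocab_size)

    def forward(self, image_feats: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        logits = self.to_logits(image_feats).view(-1, self.msg_len, self.vocab_size)
        # hard=True yields one-hot symbols in the forward pass while the
        # backward pass uses the soft Gumbel-Softmax gradients
        # (the straight-through trick).
        return F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)


speaker = Speaker(feat_dim=512, vocab_size=10, msg_len=5)
message = speaker(torch.randn(32, 512))  # (32, 5, 10) one-hot symbols
```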