On (Emergent) Systematic Generalisation and Compositionality in Visual
Referential Games with Straight-Through Gumbel-Softmax Estimator
- URL: http://arxiv.org/abs/2012.10776v1
- Date: Sat, 19 Dec 2020 20:40:09 GMT
- Title: On (Emergent) Systematic Generalisation and Compositionality in Visual
Referential Games with Straight-Through Gumbel-Softmax Estimator
- Authors: Kevin Denamganaï and James Alfred Walker
- Abstract summary: The drivers of compositionality emerge when two (or more) agents play a non-visual referential game.
This paper investigates to what extent the drivers of compositionality identified so far in the field apply in the ST-GS context.
Using the ST-GS approach with small batch sizes and an overcomplete communication channel improves compositionality in the emerging languages.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The drivers of compositionality in artificial languages that emerge when two
(or more) agents play a non-visual referential game have been previously
investigated using approaches based on the REINFORCE algorithm and the (Neural)
Iterated Learning Model. Following the more recent introduction of the
Straight-Through Gumbel-Softmax (ST-GS) approach, this paper
investigates to what extent the drivers of compositionality identified so far
in the field apply in the ST-GS context, and to what extent they translate
into (emergent) systematic generalisation abilities, when playing a visual
referential game. Compositionality and the generalisation abilities of the
emergent languages are assessed using topographic similarity and zero-shot
compositional tests. Firstly, we provide evidence that the test-train split
strategy significantly impacts the zero-shot compositional tests when dealing
with visual stimuli, whilst it does not when dealing with symbolic ones.
Secondly, empirical evidence shows that using the ST-GS approach with small
batch sizes and an overcomplete communication channel improves compositionality
in the emerging languages. Nevertheless, while shown robust with symbolic
stimuli, the effect of the batch size is not so clear-cut when dealing with
visual stimuli. Our results also show that not all overcomplete communication
channels are created equal. Indeed, while increasing the maximum sentence
length is found to be beneficial to further both compositionality and
generalisation abilities, increasing the vocabulary size is found detrimental.
Finally, a lack of correlation between the language compositionality at
training-time and the agents' generalisation abilities is observed in the
context of discriminative referential games with visual stimuli. This is
similar to previous observations in the field using the generative variant with
symbolic stimuli.
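The ST-GS estimator at the centre of the paper can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it samples Gumbel(0, 1) noise, forms the temperature-`tau` softmax relaxation of a categorical symbol choice, and discretises to a one-hot symbol for the forward pass. In an autodiff framework (e.g. PyTorch) the straight-through trick would route gradients through the soft sample via `y = y_hard - y_soft.detach() + y_soft`; here we simply return both pieces.

```python
import numpy as np

def st_gumbel_softmax(logits, tau=1.0, rng=None):
    """Straight-Through Gumbel-Softmax sketch: a relaxed categorical
    sample that is discretised (one-hot) in the forward pass.
    `tau` is the relaxation temperature."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # Gumbel(0, 1) noise via the inverse-CDF trick
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    # Soft (differentiable) sample: softmax of perturbed logits
    z = (logits + g) / tau
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    y_soft = np.exp(z)
    y_soft /= y_soft.sum(axis=-1, keepdims=True)
    # Hard (discrete) sample used in the forward pass
    y_hard = np.zeros_like(y_soft)
    idx = np.argmax(y_soft, axis=-1)
    np.put_along_axis(y_hard, idx[..., None], 1.0, axis=-1)
    return y_hard, y_soft
```

A lower `tau` makes the soft sample closer to one-hot (lower-bias, higher-variance gradients); a higher `tau` smooths it out.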
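Topographic similarity, the compositionality metric used in the abstract, is the Spearman correlation between pairwise distances in meaning space and the corresponding pairwise distances in message space. A minimal sketch, assuming user-supplied distance functions (e.g. Hamming distance on attribute vectors, edit distance on messages); the rank transform here does not average tied ranks, unlike a full Spearman implementation:

```python
import numpy as np

def topographic_similarity(meanings, messages, meaning_dist, message_dist):
    """Spearman correlation between pairwise meaning distances and
    pairwise message distances (higher = more compositional)."""
    n = len(meanings)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    d_mean = np.array([meaning_dist(meanings[i], meanings[j]) for i, j in pairs])
    d_msg = np.array([message_dist(messages[i], messages[j]) for i, j in pairs])
    # Spearman rho = Pearson correlation of rank-transformed distances
    # (simplified: ties are broken by order, not averaged)
    r_mean = d_mean.argsort().argsort().astype(float)
    r_msg = d_msg.argsort().argsort().astype(float)
    r_mean -= r_mean.mean()
    r_msg -= r_msg.mean()
    return float((r_mean @ r_msg) / np.sqrt((r_mean @ r_mean) * (r_msg @ r_msg)))
```

For a perfectly compositional toy language, where each attribute of the meaning maps to one symbol of the message, the metric reaches 1.0.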
Related papers
- Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models [58.952782707682815]
COFT is a novel method to focus on different-level key texts, thereby avoiding getting lost in lengthy contexts.
Experiments on the knowledge hallucination benchmark demonstrate the effectiveness of COFT, yielding a superior performance of over 30% in the F1 score metric.
arXiv Detail & Related papers (2024-10-19T13:59:48Z) - The Curious Case of Representational Alignment: Unravelling Visio-Linguistic Tasks in Emergent Communication [1.3499500088995464]
We assess the representational alignment between the agents' image representations, and between agent representations and the input images.
We identify a strong relationship between inter-agent alignment and topographic similarity, a common metric for compositionality.
Our findings emphasise the key role representational alignment plays in simulations of language emergence.
arXiv Detail & Related papers (2024-07-25T11:29:27Z) - Im-Promptu: In-Context Composition from Image Prompts [10.079743487034762]
We investigate whether analogical reasoning can enable in-context composition over composable elements of visual stimuli.
We use Im-Promptu to train agents with different levels of compositionality, including vector representations, patch representations, and object slots.
Our experiments reveal tradeoffs between extrapolation abilities and the degree of compositionality, with non-compositional representations extending learned composition rules to unseen domains but performing poorly on tasks.
arXiv Detail & Related papers (2023-05-26T21:10:11Z) - Visual Referential Games Further the Emergence of Disentangled
Representations [0.12891210250935145]
This paper investigates how compositionality at the level of emerging languages, disentanglement at the level of the learned representations, and systematicity relate to each other in the context of visual referential games.
arXiv Detail & Related papers (2023-04-27T20:00:51Z) - Anticipating the Unseen Discrepancy for Vision and Language Navigation [63.399180481818405]
Vision-Language Navigation requires the agent to follow natural language instructions to reach a specific target.
The large discrepancy between seen and unseen environments makes it challenging for the agent to generalize well.
We propose Unseen Discrepancy Anticipating Vision and Language Navigation (DAVIS) that learns to generalize to unseen environments via encouraging test-time visual consistency.
arXiv Detail & Related papers (2022-09-10T19:04:40Z) - Transition-based Abstract Meaning Representation Parsing with Contextual
Embeddings [0.0]
We study a way of combining two of the most successful routes to the meaning of language--statistical language models and symbolic semantics formalisms--in the task of semantic parsing.
We explore the utility of incorporating pretrained context-aware word embeddings--such as BERT and RoBERTa--in the problem of parsing.
arXiv Detail & Related papers (2022-06-13T15:05:24Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z) - GINet: Graph Interaction Network for Scene Parsing [58.394591509215005]
We propose a Graph Interaction unit (GI unit) and a Semantic Context Loss (SC-loss) to promote context reasoning over image regions.
The proposed GINet outperforms the state-of-the-art approaches on the popular benchmarks, including Pascal-Context and COCO Stuff.
arXiv Detail & Related papers (2020-09-14T02:52:45Z) - Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation.
Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning.
During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z) - Probing Linguistic Features of Sentence-Level Representations in Neural
Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.