Syntax-Guided Transformers: Elevating Compositional Generalization and
Grounding in Multimodal Environments
- URL: http://arxiv.org/abs/2311.04364v1
- Date: Tue, 7 Nov 2023 21:59:16 GMT
- Title: Syntax-Guided Transformers: Elevating Compositional Generalization and
Grounding in Multimodal Environments
- Authors: Danial Kamali and Parisa Kordjamshidi
- Abstract summary: We exploit the syntactic structure of language to boost compositional generalization.
We introduce and evaluate the merits of using syntactic information in the multimodal grounding problem.
The results push the state-of-the-art in multimodal grounding and parameter-efficient modeling.
- Score: 20.70294450587676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compositional generalization, the ability of intelligent models to
extrapolate understanding of components to novel compositions, is a fundamental
yet challenging facet in AI research, especially within multimodal
environments. In this work, we address this challenge by exploiting the
syntactic structure of language to boost compositional generalization. This
paper elevates the importance of syntactic grounding, particularly through
attention masking techniques derived from text input parsing. We introduce and
evaluate the merits of using syntactic information in the multimodal grounding
problem. Our results on grounded compositional generalization underscore the
positive impact of dependency parsing across diverse tasks when utilized with
Weight Sharing across the Transformer encoder. The results push the
state-of-the-art in multimodal grounding and parameter-efficient modeling and
provide insights for future research.
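The abstract mentions attention masks derived from parsing the text input. A minimal sketch of that general idea follows. It is not the authors' implementation. The example command, its hand-written dependency heads, and the rule "a token may attend to itself, its head, and its direct dependents" are all assumptions made for illustration, and the function names `dependency_mask` and `masked_softmax` are hypothetical.

```python
import math


def dependency_mask(heads):
    """Build a boolean attention mask from dependency heads.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    Assumed rule: token i may attend to j if i == j, or one is the head
    of the other.
    """
    n = len(heads)
    mask = [[i == j for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            mask[child][head] = True
            mask[head][child] = True
    return mask


def masked_softmax(scores, allowed):
    """Softmax over one row of attention scores, zeroing disallowed positions."""
    peak = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical command with a hand-written parse (a real parser would supply this).
tokens = ["walk", "to", "the", "red", "circle"]
heads = [-1, 4, 4, 4, 0]  # "walk" is the root; "to", "the", "red" depend on "circle"

mask = dependency_mask(heads)
# Attention weights for "red" under uniform raw scores.
weights = masked_softmax([0.0] * len(tokens), mask[3])
```

With this mask, "red" can only distribute attention between itself and "circle", while "circle" still sees every token because it is the head of three of them and a dependent of "walk". In a Transformer encoder, a mask like this would typically be applied to the attention scores before the softmax in each layer.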
Related papers
- Analysis of the Evolution of Advanced Transformer-Based Language Models:
Experiments on Opinion Mining [0.5735035463793008]
This paper studies the behaviour of the cutting-edge Transformer-based language models on opinion mining.
Our comparative study identifies the leading models and guides production engineers on which approach to focus on.
arXiv Detail & Related papers (2023-08-07T01:10:50Z)
- On Evaluating Multilingual Compositional Generalization with Translated
Datasets [34.51457321680049]
We show that compositional generalization abilities differ across languages.
We craft a faithful rule-based translation of the MCWQ dataset from English to Chinese and Japanese.
Even with the resulting robust benchmark, which we call MCWQ-R, we show that the distribution of compositions still suffers due to linguistic divergences.
arXiv Detail & Related papers (2023-06-20T10:03:57Z)
- DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning [89.92601337474954]
Pragmatic reasoning plays a pivotal role in deciphering implicit meanings that frequently arise in real-life conversations.
We introduce a novel challenge, DiPlomat, aiming at benchmarking machines' capabilities on pragmatic reasoning and situated conversational understanding.
arXiv Detail & Related papers (2023-06-15T10:41:23Z)
- Variational Cross-Graph Reasoning and Adaptive Structured Semantics
Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z)
- Compositional Generalization in Grounded Language Learning via Induced
Model Sparsity [81.38804205212425]
We consider simple language-conditioned navigation problems in a grid world environment with disentangled observations.
We design an agent that encourages sparse correlations between words in the instruction and attributes of objects, composing them together to find the goal.
Our agent maintains a high level of performance on goals containing novel combinations of properties even when learning from a handful of demonstrations.
arXiv Detail & Related papers (2022-07-06T08:46:27Z)
- Transition-based Abstract Meaning Representation Parsing with Contextual
Embeddings [0.0]
We study a way of combining two of the most successful routes to the meaning of language--statistical language models and symbolic semantics formalisms--in the task of semantic parsing.
We explore the utility of incorporating pretrained context-aware word embeddings--such as BERT and RoBERTa--in the problem of parsing.
arXiv Detail & Related papers (2022-06-13T15:05:24Z)
- Disentangled Sequence to Sequence Learning for Compositional
Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
- Improving Compositional Generalization in Semantic Parsing [54.4720965813889]
Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently.
We investigate compositional generalization in semantic parsing, a natural test-bed for compositional generalization.
arXiv Detail & Related papers (2020-10-12T12:34:58Z)
- MGD-GAN: Text-to-Pedestrian generation through Multi-Grained
Discrimination [96.91091607251526]
We propose the Multi-Grained Discrimination enhanced Generative Adversarial Network (MGD-GAN), which capitalizes on a human-part-based Discriminator (HPD) and a self-cross-attended Discriminator.
A fine-grained word-level attention mechanism is employed in the HPD module to enforce diversified appearance and vivid details.
The substantial improvement over the various metrics demonstrates the efficacy of MGD-GAN on the text-to-pedestrian synthesis scenario.
arXiv Detail & Related papers (2020-10-02T12:24:48Z)