Learning to Compose Representations of Different Encoder Layers towards
Improving Compositional Generalization
- URL: http://arxiv.org/abs/2305.12169v2
- Date: Wed, 18 Oct 2023 14:19:40 GMT
- Title: Learning to Compose Representations of Different Encoder Layers towards
Improving Compositional Generalization
- Authors: Lei Lin, Shuangtao Li, Yafang Zheng, Biao Fu, Shan Liu, Yidong Chen,
Xiaodong Shi
- Abstract summary: We propose CompoSition (Compose Syntactic and Semantic Representations).
CompoSition achieves competitive results on two comprehensive and realistic benchmarks.
- Score: 29.32436551704417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have shown that sequence-to-sequence (seq2seq) models struggle
with compositional generalization (CG), i.e., the ability to systematically
generalize to unseen compositions of seen components. There is mounting
evidence that one of the reasons hindering CG is that the representation of the
encoder's uppermost layer is entangled, i.e., the syntactic and semantic
representations of sequences are entangled. However, we consider that the
previously identified representation entanglement problem is not comprehensive
enough. Additionally, we hypothesize that the representations of the source keys
and values passed into different decoder layers are also entangled.
Starting from this intuition, we propose \textsc{CompoSition} (\textbf{Compo}se
\textbf{S}yntactic and Semant\textbf{i}c Representa\textbf{tion}s), an
extension to seq2seq models which learns to compose representations of
different encoder layers dynamically for different tasks, since recent studies
reveal that the bottom layers of the Transformer encoder contain more syntactic
information and the top ones contain more semantic information. Specifically,
we introduce a \textit{composed layer} between the encoder and decoder to
compose different encoder layers' representations to generate specific keys and
values passing into different decoder layers. \textsc{CompoSition} achieves
competitive results on two comprehensive and realistic benchmarks, which
empirically demonstrates the effectiveness of our proposal. Code is available
at~\url{https://github.com/thinkaboutzero/COMPOSITION}.
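A minimal sketch of the composed layer described in the abstract, assuming a learned softmax-weighted mixture over all encoder layers that produces a separate source representation (for keys and values) per decoder layer. Module and variable names are illustrative, not taken from the official COMPOSITION repository.

```python
# Sketch of a "composed layer": each decoder layer gets its own source representation,
# built as a learned convex combination of all encoder layers' outputs.
import torch
import torch.nn as nn


class ComposedLayer(nn.Module):
    def __init__(self, num_encoder_layers: int, num_decoder_layers: int):
        super().__init__()
        # One weight vector over encoder layers per decoder layer (uniform at init).
        self.mix_logits = nn.Parameter(torch.zeros(num_decoder_layers, num_encoder_layers))

    def forward(self, encoder_states: list[torch.Tensor]) -> list[torch.Tensor]:
        # encoder_states: one tensor of shape [batch, src_len, d_model] per encoder layer.
        stacked = torch.stack(encoder_states, dim=0)      # [L_enc, B, S, H]
        weights = self.mix_logits.softmax(dim=-1)         # [L_dec, L_enc]
        mixed = torch.einsum("le,ebsh->lbsh", weights, stacked)
        # mixed[i] would serve as the key/value source for decoder layer i.
        return [mixed[i] for i in range(mixed.size(0))]
```

In use, the i-th returned tensor would feed the cross-attention keys and values of the i-th decoder layer, so each layer can weight syntactic (lower) and semantic (upper) encoder information differently.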
Related papers
- Layer-wise Representation Fusion for Compositional Generalization [26.771056871444692]
A key reason for failure on compositional generalization is that the syntactic and semantic representations of sequences in the uppermost layers of both the encoder and the decoder are entangled.
We explain why it exists by analyzing the representation evolving mechanism from the bottom to the top of the Transformer layers.
Inspired by this, we propose LRF, a novel Layer-wise Representation Fusion framework for CG, which learns to fuse previous layers' information back into the encoding and decoding process.
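A rough illustration of layer-wise fusion as summarized above; the sigmoid gate and the simple averaging of earlier layers are assumptions for the sketch, not LRF's exact design.

```python
# After each Transformer layer, mix the history of earlier layers back in via a learned gate.
import torch
import torch.nn as nn


class FusionGate(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, current: torch.Tensor, history: list[torch.Tensor]) -> torch.Tensor:
        # current: [batch, seq, d_model]; history: outputs of all previous layers, same shape.
        prev = torch.stack(history, dim=0).mean(dim=0)    # crude summary of earlier layers
        g = torch.sigmoid(self.gate(torch.cat([current, prev], dim=-1)))
        return g * current + (1.0 - g) * prev             # fused state passed to the next layer
```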
arXiv Detail & Related papers (2023-07-20T12:01:40Z)
- Transforming Visual Scene Graphs to Image Captions [69.13204024990672]
We propose to transform Scene Graphs (TSG) into more descriptive captions.
In TSG, we apply multi-head attention (MHA) to design the Graph Neural Network (GNN) for embedding scene graphs.
In TSG, each expert is built on MHA and discriminates among the graph embeddings to generate different kinds of words.
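A hedged sketch of using multi-head attention as a graph layer over a scene graph: each node attends only to its neighbours through an attention mask built from the adjacency matrix. This is our reading of the summary, not the TSG code.

```python
import torch
import torch.nn as nn


class MHAGraphLayer(nn.Module):
    def __init__(self, d_model: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, node_feats: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # node_feats: [batch, num_nodes, d_model]; adjacency: [batch, num_nodes, num_nodes], 1 = edge.
        eye = torch.eye(adjacency.size(-1), dtype=torch.bool, device=adjacency.device)
        allowed = adjacency.bool() | eye                  # keep self-loops so every row attends somewhere
        attn_mask = ~allowed                              # True = blocked position
        # nn.MultiheadAttention expects a 3-D mask of shape [batch * num_heads, L, L].
        attn_mask = attn_mask.repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(node_feats, node_feats, node_feats, attn_mask=attn_mask)
        return out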
arXiv Detail & Related papers (2023-05-03T15:18:37Z)
- GypSum: Learning Hybrid Representations for Code Summarization [21.701127410434914]
GypSum is a new deep learning model that learns hybrid representations using graph attention neural networks and a pre-trained programming and natural language model.
We modify the encoder-decoder sublayer in the Transformer's decoder to fuse the representations and propose a dual-copy mechanism to facilitate summary generation.
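A hedged sketch of a modified encoder-decoder sublayer that attends over two source memories (graph-based and token-based) and fuses the resulting contexts. The concatenate-and-project fusion is an assumption for illustration, not GypSum's exact design, and the dual-copy mechanism is omitted.

```python
import torch
import torch.nn as nn


class DualSourceCrossAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int = 8):
        super().__init__()
        self.attn_graph = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.attn_token = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, tgt, graph_memory, token_memory):
        # tgt: [batch, tgt_len, d_model]; the two memories are the hybrid source representations.
        ctx_g, _ = self.attn_graph(tgt, graph_memory, graph_memory)
        ctx_t, _ = self.attn_token(tgt, token_memory, token_memory)
        return self.fuse(torch.cat([ctx_g, ctx_t], dim=-1))  # fused context for the decoder layer
```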
arXiv Detail & Related papers (2022-04-26T07:44:49Z)
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming language.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
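A small illustration of flattening an AST into an invertible token sequence by wrapping each non-terminal in boundary tokens. The bracketing scheme here is our own simple choice; UniXcoder's exact mapping may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def flatten(node: Node) -> list[str]:
    if not node.children:
        return [node.label]                      # leaves are emitted as-is
    tokens = [f"<{node.label}>"]                 # opening boundary keeps the tree shape
    for child in node.children:
        tokens.extend(flatten(child))
    tokens.append(f"</{node.label}>")            # closing boundary makes the mapping invertible
    return tokens


# Example tree for "x = 1 + 2":
tree = Node("Assign", [Node("x"), Node("BinOp", [Node("1"), Node("+"), Node("2")])])
print(flatten(tree))
# ['<Assign>', 'x', '<BinOp>', '1', '+', '2', '</BinOp>', '</Assign>']
```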
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
- Hierarchical Sketch Induction for Paraphrase Generation [79.87892048285819]
We introduce Hierarchical Refinement Quantized Variational Autoencoders (HRQ-VAE), a method for learning decompositions of dense encodings.
We use HRQ-VAE to encode the syntactic form of an input sentence as a path through the hierarchy, allowing us to more easily predict syntactic sketches at test time.
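A bare-bones sketch of hierarchical residual quantization in the spirit described above: each level picks the nearest codebook entry for the current residual, so a dense encoding becomes a path of code indices through the hierarchy. Training losses (e.g., commitment terms) and the straight-through estimator are omitted, and the details are assumptions rather than HRQ-VAE's exact formulation.

```python
import torch
import torch.nn as nn


class ResidualQuantizer(nn.Module):
    def __init__(self, num_levels: int, codes_per_level: int, dim: int):
        super().__init__()
        self.codebooks = nn.ParameterList(
            [nn.Parameter(torch.randn(codes_per_level, dim)) for _ in range(num_levels)]
        )

    def forward(self, z: torch.Tensor):
        # z: [batch, dim] dense encoding; returns the quantized vector and the index path.
        residual, quantized, path = z, torch.zeros_like(z), []
        for codebook in self.codebooks:
            dists = torch.cdist(residual, codebook)   # [batch, codes_per_level]
            idx = dists.argmin(dim=-1)                # nearest code at this level
            code = codebook[idx]                      # [batch, dim]
            quantized = quantized + code
            residual = residual - code
            path.append(idx)
        return quantized, torch.stack(path, dim=-1)   # path: [batch, num_levels]
```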
arXiv Detail & Related papers (2022-03-07T15:28:36Z)
- Disentangled Sequence to Sequence Learning for Compositional Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
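A very rough sketch of "adaptive re-encoding": the source is re-encoded conditioned on the current decoder state before cross-attention, giving a step-specific source representation. The conditioning scheme below is an assumption based on the one-line summary above, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class AdaptiveReEncoder(nn.Module):
    def __init__(self, d_model: int, num_heads: int = 8):
        super().__init__()
        self.condition = nn.Linear(2 * d_model, d_model)
        self.re_encoder = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)

    def forward(self, src_embed: torch.Tensor, dec_state: torch.Tensor) -> torch.Tensor:
        # src_embed: [batch, src_len, d_model]; dec_state: [batch, d_model] for the current step.
        step = dec_state.unsqueeze(1).expand_as(src_embed)   # broadcast decoder state over source positions
        conditioned = self.condition(torch.cat([src_embed, step], dim=-1))
        return self.re_encoder(conditioned)                  # step-specific source representation
```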
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
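A compact sketch of the idea: treat an image as a sequence of patch embeddings, encode it with a plain Transformer, then reshape and upsample for per-pixel class logits. The sizes and the naive bilinear-upsampling decoder are illustrative choices, not SETR's reported configuration.

```python
import torch
import torch.nn as nn


class TinySETR(nn.Module):
    def __init__(self, img_size=224, patch=16, d_model=256, num_classes=19, depth=4):
        super().__init__()
        self.grid = img_size // patch                            # patches per side
        self.to_patches = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.grid * self.grid, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.classify = nn.Conv2d(d_model, num_classes, kernel_size=1)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: [batch, 3, H, W] -> patch sequence [batch, grid*grid, d_model]
        x = self.to_patches(images).flatten(2).transpose(1, 2) + self.pos
        x = self.encoder(x)                                      # global context at every layer
        x = x.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        logits = self.classify(x)                                # [batch, classes, grid, grid]
        return nn.functional.interpolate(logits, scale_factor=16, mode="bilinear", align_corners=False)
```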
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
- Layer-Wise Multi-View Learning for Neural Machine Translation [45.679212203943194]
Traditional neural machine translation is limited to the topmost encoder layer's context representation.
We propose layer-wise multi-view learning to solve this problem.
Our approach yields stable improvements over multiple strong baselines.
arXiv Detail & Related papers (2020-11-03T05:06:37Z)
- Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding: for each decoder layer, the representations from the last encoder layer serve as a global view, and those from other encoder layers are supplemented to give a stereoscopic view of the source sequences.
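A sketch of the multi-view idea as summarized: each decoder layer cross-attends to the last encoder layer (global view) and to one other encoder layer (auxiliary view), and the two contexts are combined. Summing the contexts is simply the easiest choice for illustration; the cited papers combine the views in their own ways.

```python
import torch
import torch.nn as nn


class MultiViewCrossAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int = 8):
        super().__init__()
        self.global_view = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.aux_view = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, tgt, last_enc_layer, aux_enc_layer):
        # tgt: [batch, tgt_len, d_model]; the two memories are outputs of different encoder layers.
        ctx_global, _ = self.global_view(tgt, last_enc_layer, last_enc_layer)
        ctx_aux, _ = self.aux_view(tgt, aux_enc_layer, aux_enc_layer)
        return ctx_global + ctx_aux                      # stereoscopic source context
```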
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
- Consistent Multiple Sequence Decoding [36.46573114422263]
We introduce a consistent multiple sequence decoding architecture.
This architecture allows for consistent and simultaneous decoding of an arbitrary number of sequences.
We show the efficacy of our consistent multiple sequence decoder on the task of dense relational image captioning.
arXiv Detail & Related papers (2020-04-02T00:43:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.