Learning to Compose Representations of Different Encoder Layers towards
Improving Compositional Generalization
- URL: http://arxiv.org/abs/2305.12169v2
- Date: Wed, 18 Oct 2023 14:19:40 GMT
- Title: Learning to Compose Representations of Different Encoder Layers towards
Improving Compositional Generalization
- Authors: Lei Lin, Shuangtao Li, Yafang Zheng, Biao Fu, Shan Liu, Yidong Chen,
Xiaodong Shi
- Abstract summary: We propose CompoSition (Compose Syntactic and Semantic Representations).
CompoSition achieves competitive results on two comprehensive and realistic benchmarks.
- Score: 29.32436551704417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have shown that sequence-to-sequence (seq2seq) models struggle
with compositional generalization (CG), i.e., the ability to systematically
generalize to unseen compositions of seen components. There is mounting
evidence that one of the reasons hindering CG is that the representation of the
encoder's uppermost layer is entangled, i.e., the syntactic and semantic
representations of sequences are entangled. However, we consider that the
previously identified representation entanglement problem is not comprehensive
enough. Additionally, we hypothesize that the representations of the source keys
and values passed into different decoder layers are also entangled.
Starting from this intuition, we propose \textsc{CompoSition} (\textbf{Compo}se
\textbf{S}yntactic and Semant\textbf{i}c Representa\textbf{tion}s), an
extension to seq2seq models which learns to compose representations of
different encoder layers dynamically for different tasks, since recent studies
reveal that the bottom layers of the Transformer encoder contain more syntactic
information and the top ones contain more semantic information. Specifically,
we introduce a \textit{composed layer} between the encoder and decoder to
compose different encoder layers' representations to generate specific keys and
values passing into different decoder layers. \textsc{CompoSition} achieves
competitive results on two comprehensive and realistic benchmarks, which
empirically demonstrates the effectiveness of our proposal. Code is available
at~\url{https://github.com/thinkaboutzero/COMPOSITION}.
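A minimal sketch of the composed layer described in the abstract, assuming a learned softmax-weighted mixture over all encoder layers that produces a separate source representation (for keys and values) per decoder layer. Module and variable names are illustrative, not taken from the official COMPOSITION repository.

```python
# Sketch of a "composed layer": each decoder layer gets its own source representation,
# built as a learned convex combination of all encoder layers' outputs.
import torch
import torch.nn as nn


class ComposedLayer(nn.Module):
    def __init__(self, num_encoder_layers: int, num_decoder_layers: int):
        super().__init__()
        # One weight vector over encoder layers per decoder layer (uniform at init).
        self.mix_logits = nn.Parameter(torch.zeros(num_decoder_layers, num_encoder_layers))

    def forward(self, encoder_states: list[torch.Tensor]) -> list[torch.Tensor]:
        # encoder_states: one tensor of shape [batch, src_len, d_model] per encoder layer.
        stacked = torch.stack(encoder_states, dim=0)      # [L_enc, B, S, H]
        weights = self.mix_logits.softmax(dim=-1)         # [L_dec, L_enc]
        mixed = torch.einsum("le,ebsh->lbsh", weights, stacked)
        # mixed[i] would serve as the key/value source for decoder layer i.
        return [mixed[i] for i in range(mixed.size(0))]
```

In use, the i-th returned tensor would feed the cross-attention keys and values of the i-th decoder layer, so each layer can weight syntactic (lower) and semantic (upper) encoder information differently.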
Related papers
- Layer-wise Representation Fusion for Compositional Generalization [26.771056871444692]
A key reason for failure on compositional generalization is that the syntactic and semantic representations of sequences in the uppermost layers of both the encoder and the decoder are entangled.
We explain why it exists by analyzing the representation evolving mechanism from the bottom to the top of the Transformer layers.
Inspired by this, we propose LRF, a novel Layer-wise Representation Fusion framework for CG, which learns to fuse previous layers' information back into the encoding and decoding process.
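A rough illustration of layer-wise fusion as summarized above; the sigmoid gate and the simple averaging of earlier layers are assumptions for the sketch, not LRF's exact design.

```python
# After each Transformer layer, mix the history of earlier layers back in via a learned gate.
import torch
import torch.nn as nn


class FusionGate(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, current: torch.Tensor, history: list[torch.Tensor]) -> torch.Tensor:
        # current: [batch, seq, d_model]; history: outputs of all previous layers, same shape.
        prev = torch.stack(history, dim=0).mean(dim=0)    # crude summary of earlier layers
        g = torch.sigmoid(self.gate(torch.cat([current, prev], dim=-1)))
        return g * current + (1.0 - g) * prev             # fused state passed to the next layer
```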
arXiv Detail & Related papers (2023-07-20T12:01:40Z)
- Transforming Visual Scene Graphs to Image Captions [69.13204024990672]
We propose to transform Scene Graphs (TSG) into more descriptive captions.
In TSG, we apply multi-head attention (MHA) to design the Graph Neural Network (GNN) for embedding scene graphs.
In TSG, each expert is built on MHA and discriminates among the graph embeddings to generate different kinds of words.
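A hedged sketch of using multi-head attention as a graph layer over a scene graph: each node attends only to its neighbours through an attention mask built from the adjacency matrix. This is our reading of the summary, not the TSG code.

```python
import torch
import torch.nn as nn


class MHAGraphLayer(nn.Module):
    def __init__(self, d_model: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, node_feats: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # node_feats: [batch, num_nodes, d_model]; adjacency: [batch, num_nodes, num_nodes], 1 = edge.
        eye = torch.eye(adjacency.size(-1), dtype=torch.bool, device=adjacency.device)
        allowed = adjacency.bool() | eye                  # keep self-loops so every row attends somewhere
        attn_mask = ~allowed                              # True = blocked position
        # nn.MultiheadAttention expects a 3-D mask of shape [batch * num_heads, L, L].
        attn_mask = attn_mask.repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(node_feats, node_feats, node_feats, attn_mask=attn_mask)
        return out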
arXiv Detail & Related papers (2023-05-03T15:18:37Z)
- GypSum: Learning Hybrid Representations for Code Summarization [21.701127410434914]
GypSum is a new deep learning model that learns hybrid representations using graph attention neural networks and a pre-trained programming and natural language model.
We modify the encoder-decoder sublayer in the Transformer's decoder to fuse the representations and propose a dual-copy mechanism to facilitate summary generation.
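A hedged sketch of a modified encoder-decoder sublayer that attends over two source memories (graph-based and token-based) and fuses the resulting contexts. The concatenate-and-project fusion is an assumption for illustration, not GypSum's exact design, and the dual-copy mechanism is omitted.

```python
import torch
import torch.nn as nn


class DualSourceCrossAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int = 8):
        super().__init__()
        self.attn_graph = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.attn_token = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, tgt, graph_memory, token_memory):
        # tgt: [batch, tgt_len, d_model]; the two memories are the hybrid source representations.
        ctx_g, _ = self.attn_graph(tgt, graph_memory, graph_memory)
        ctx_t, _ = self.attn_token(tgt, token_memory, token_memory)
        return self.fuse(torch.cat([ctx_g, ctx_t], dim=-1))  # fused context for the decoder layer
```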
arXiv Detail & Related papers (2022-04-26T07:44:49Z)
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming language.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
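A small illustration of flattening an AST into an invertible token sequence by wrapping each non-terminal in boundary tokens. The bracketing scheme here is our own simple choice; UniXcoder's exact mapping may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def flatten(node: Node) -> list[str]:
    if not node.children:
        return [node.label]                      # leaves are emitted as-is
    tokens = [f"<{node.label}>"]                 # opening boundary keeps the tree shape
    for child in node.children:
        tokens.extend(flatten(child))
    tokens.append(f"</{node.label}>")            # closing boundary makes the mapping invertible
    return tokens


# Example tree for "x = 1 + 2":
tree = Node("Assign", [Node("x"), Node("BinOp", [Node("1"), Node("+"), Node("2")])])
print(flatten(tree))
# ['<Assign>', 'x', '<BinOp>', '1', '+', '2', '</BinOp>', '</Assign>']
```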
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
- Hierarchical Sketch Induction for Paraphrase Generation [79.87892048285819]
We introduce Hierarchical Refinement Quantized Variational Autoencoders (HRQ-VAE), a method for learning decompositions of dense encodings.
We use HRQ-VAE to encode the syntactic form of an input sentence as a path through the hierarchy, allowing us to more easily predict syntactic sketches at test time.
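A bare-bones sketch of hierarchical residual quantization in the spirit described above: each level picks the nearest codebook entry for the current residual, so a dense encoding becomes a path of code indices through the hierarchy. Training losses (e.g., commitment terms) and the straight-through estimator are omitted, and the details are assumptions rather than HRQ-VAE's exact formulation.

```python
import torch
import torch.nn as nn


class ResidualQuantizer(nn.Module):
    def __init__(self, num_levels: int, codes_per_level: int, dim: int):
        super().__init__()
        self.codebooks = nn.ParameterList(
            [nn.Parameter(torch.randn(codes_per_level, dim)) for _ in range(num_levels)]
        )

    def forward(self, z: torch.Tensor):
        # z: [batch, dim] dense encoding; returns the quantized vector and the index path.
        residual, quantized, path = z, torch.zeros_like(z), []
        for codebook in self.codebooks:
            dists = torch.cdist(residual, codebook)   # [batch, codes_per_level]
            idx = dists.argmin(dim=-1)                # nearest code at this level
            code = codebook[idx]                      # [batch, dim]
            quantized = quantized + code
            residual = residual - code
            path.append(idx)
        return quantized, torch.stack(path, dim=-1)   # path: [batch, num_levels]
```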
arXiv Detail & Related papers (2022-03-07T15:28:36Z)
- Disentangled Sequence to Sequence Learning for Compositional Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
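A very rough sketch of "adaptive re-encoding": the source is re-encoded conditioned on the current decoder state before cross-attention, giving a step-specific source representation. The conditioning scheme below is an assumption based on the one-line summary above, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class AdaptiveReEncoder(nn.Module):
    def __init__(self, d_model: int, num_heads: int = 8):
        super().__init__()
        self.condition = nn.Linear(2 * d_model, d_model)
        self.re_encoder = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)

    def forward(self, src_embed: torch.Tensor, dec_state: torch.Tensor) -> torch.Tensor:
        # src_embed: [batch, src_len, d_model]; dec_state: [batch, d_model] for the current step.
        step = dec_state.unsqueeze(1).expand_as(src_embed)   # broadcast decoder state over source positions
        conditioned = self.condition(torch.cat([src_embed, step], dim=-1))
        return self.re_encoder(conditioned)                  # step-specific source representation
```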
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
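A compact sketch of the idea: treat an image as a sequence of patch embeddings, encode it with a plain Transformer, then reshape and upsample for per-pixel class logits. The sizes and the naive bilinear-upsampling decoder are illustrative choices, not SETR's reported configuration.

```python
import torch
import torch.nn as nn


class TinySETR(nn.Module):
    def __init__(self, img_size=224, patch=16, d_model=256, num_classes=19, depth=4):
        super().__init__()
        self.grid = img_size // patch                            # patches per side
        self.to_patches = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.grid * self.grid, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.classify = nn.Conv2d(d_model, num_classes, kernel_size=1)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: [batch, 3, H, W] -> patch sequence [batch, grid*grid, d_model]
        x = self.to_patches(images).flatten(2).transpose(1, 2) + self.pos
        x = self.encoder(x)                                      # global context at every layer
        x = x.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        logits = self.classify(x)                                # [batch, classes, grid, grid]
        return nn.functional.interpolate(logits, scale_factor=16, mode="bilinear", align_corners=False)
```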
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
- Layer-Wise Multi-View Learning for Neural Machine Translation [45.679212203943194]
Traditional neural machine translation is limited to the topmost encoder layer's context representation.
We propose layer-wise multi-view learning to solve this problem.
Our approach yields stable improvements over multiple strong baselines.
arXiv Detail & Related papers (2020-11-03T05:06:37Z)
- Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding: for each decoder layer, the representations from the last encoder layer serve as a global view, and those from other encoder layers are supplemented to give a stereoscopic view of the source sequences.
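A sketch of the multi-view idea as summarized: each decoder layer cross-attends to the last encoder layer (global view) and to one other encoder layer (auxiliary view), and the two contexts are combined. Summing the contexts is simply the easiest choice for illustration; the cited papers combine the views in their own ways.

```python
import torch
import torch.nn as nn


class MultiViewCrossAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int = 8):
        super().__init__()
        self.global_view = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.aux_view = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, tgt, last_enc_layer, aux_enc_layer):
        # tgt: [batch, tgt_len, d_model]; the two memories are outputs of different encoder layers.
        ctx_global, _ = self.global_view(tgt, last_enc_layer, last_enc_layer)
        ctx_aux, _ = self.aux_view(tgt, aux_enc_layer, aux_enc_layer)
        return ctx_global + ctx_aux                      # stereoscopic source context
```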
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
- Consistent Multiple Sequence Decoding [36.46573114422263]
We introduce a consistent multiple sequence decoding architecture.
This architecture allows for consistent and simultaneous decoding of an arbitrary number of sequences.
We show the efficacy of our consistent multiple sequence decoder on the task of dense relational image captioning.
arXiv Detail & Related papers (2020-04-02T00:43:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.