Layer-wise Representation Fusion for Compositional Generalization
- URL: http://arxiv.org/abs/2307.10799v2
- Date: Thu, 21 Dec 2023 07:38:56 GMT
- Title: Layer-wise Representation Fusion for Compositional Generalization
- Authors: Yafang Zheng, Lei Lin, Shuangtao Li, Yuxuan Yuan, Zhaohong Lai, Shan
Liu, Biao Fu, Yidong Chen, Xiaodong Shi
- Abstract summary: A key reason for failure on compositional generalization is that the syntactic and semantic representations of sequences in both the uppermost layer of the encoder and decoder are entangled.
We explain why it exists by analyzing the representation evolving mechanism from the bottom to the top of the Transformer layers.
Inspired by this, we propose LRF, a novel Layer-wise Representation Fusion framework for CG, which learns to fuse previous layers' information back into the encoding and decoding process.
- Score: 26.771056871444692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing neural models are demonstrated to struggle with compositional
generalization (CG), i.e., the ability to systematically generalize to unseen
compositions of seen components. A key reason for failure on CG is that the
syntactic and semantic representations of sequences in both the uppermost layer
of the encoder and decoder are entangled. However, previous work concentrates
on separating the learning of syntax and semantics instead of exploring the
reasons behind the representation entanglement (RE) problem to solve it. We
explain why it exists by analyzing the representation evolving mechanism from
the bottom to the top of the Transformer layers. We find that the "shallow"
residual connections within each layer fail to fuse previous layers'
information effectively, leading to information forgetting between layers and,
in turn, to the RE problem. Inspired by this, we propose LRF, a novel
Layer-wise Representation Fusion framework for CG, which learns to fuse
previous layers' information back into the encoding and decoding process
effectively by introducing a fuse-attention module at each encoder and decoder
layer. LRF achieves promising results on two
realistic benchmarks, empirically demonstrating the effectiveness of our
proposal.
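
As an illustration of the fuse-attention idea described in the abstract, below is a minimal sketch, not the authors' released code: it assumes that each token's current representation attends over that token's representations from all previous layers, so earlier-layer information is fused back into the encoding/decoding process. The module and parameter names (FuseAttention, d_model, n_heads) are hypothetical.

```python
import torch
import torch.nn as nn


class FuseAttention(nn.Module):
    """Illustrative fuse-attention sub-layer (a sketch, not the paper's code)."""

    def __init__(self, d_model, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, prev_layers):
        # x:           (batch, seq_len, d_model) current layer's output
        # prev_layers: list of (batch, seq_len, d_model), outputs of earlier layers
        b, t, d = x.shape
        # Treat the layer axis as the "sequence" attended over, and each
        # (batch, position) pair as an independent attention instance.
        q = x.reshape(1, b * t, d)                                  # (1, B*T, D)
        kv = torch.stack(prev_layers, dim=0).reshape(-1, b * t, d)  # (L, B*T, D)
        fused, _ = self.attn(q, kv, kv)                             # (1, B*T, D)
        # Residual connection plus layer norm over the fused representation.
        return self.norm(x + fused.reshape(b, t, d))


# Usage sketch: after computing layer l's output h_l, fuse it with the
# outputs of the preceding layers before passing it on.
# fuse = FuseAttention(d_model=512)
# h_l = fuse(h_l, prev_outputs)  # prev_outputs = [h_0, h_1, ...]
```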
Related papers
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z) - In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and domain-regularized optimization to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z) - Single Image Reflection Separation via Component Synergy [14.57590565534889]
The reflection superposition phenomenon is complex and widely distributed in the real world.
We propose a more general form of the superposition model by introducing a learnable residue term.
In order to fully capitalize on its advantages, we further design the network structure elaborately.
arXiv Detail & Related papers (2023-08-19T14:25:27Z) - GIFD: A Generative Gradient Inversion Method with Feature Domain
Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data against an FL system by leveraging pre-trained generative adversarial networks (GAN) as prior knowledge.
We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z) - Learning to Compose Representations of Different Encoder Layers towards
Improving Compositional Generalization [29.32436551704417]
We propose CompoSition (Compose Syntactic and Semantic Representations).
CompoSition achieves competitive results on two comprehensive and realistic benchmarks.
arXiv Detail & Related papers (2023-05-20T11:16:59Z) - Over-and-Under Complete Convolutional RNN for MRI Reconstruction [57.95363471940937]
Recent deep learning-based methods for MR image reconstruction usually leverage a generic auto-encoder architecture.
We propose an Over-and-Under Complete Convolutional Recurrent Neural Network (OUCR), which consists of an overcomplete and an undercomplete Convolutional Recurrent Neural Network (CRNN).
The proposed method achieves significant improvements over compressed sensing and popular deep learning-based methods with fewer trainable parameters.
arXiv Detail & Related papers (2021-06-16T15:56:34Z) - A Hierarchical Coding Scheme for Glasses-free 3D Displays Based on
Scalable Hybrid Layered Representation of Real-World Light Fields [0.6091702876917279]
The scheme learns stacked multiplicative layers from subsets of light field views determined by different scanning orders.
The spatial correlation in layer patterns is exploited with varying low ranks in a factorization derived from singular value decomposition on a Krylov subspace.
Encoding with HEVC efficiently removes intra-view and inter-view correlation in the low-rank approximated layers.
arXiv Detail & Related papers (2021-04-19T15:09:21Z) - Layer-Wise Multi-View Learning for Neural Machine Translation [45.679212203943194]
Traditional neural machine translation is limited to the topmost encoder layer's context representation.
We propose layer-wise multi-view learning to solve this problem.
Our approach yields stable improvements over multiple strong baselines.
arXiv Detail & Related papers (2020-11-03T05:06:37Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - Rethinking and Improving Natural Language Generation with Layer-Wise
Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences.
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
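
The layer-wise multi-view decoding entry above can likewise be sketched. The snippet below assumes each decoder layer cross-attends both to the top encoder layer (the global view) and to one lower encoder layer (an auxiliary view), with the two results combined through a residual connection; the module name MultiViewCrossAttention and the combination rule are assumptions, not the paper's implementation.

```python
import torch.nn as nn


class MultiViewCrossAttention(nn.Module):
    """Illustrative two-view cross-attention for one decoder layer (a sketch)."""

    def __init__(self, d_model, n_heads=8):
        super().__init__()
        self.global_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.aux_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, dec_h, enc_top, enc_aux):
        # dec_h:   (batch, tgt_len, d_model) decoder hidden states
        # enc_top: (batch, src_len, d_model) last encoder layer (global view)
        # enc_aux: (batch, src_len, d_model) a lower encoder layer (auxiliary view)
        g, _ = self.global_attn(dec_h, enc_top, enc_top)
        a, _ = self.aux_attn(dec_h, enc_aux, enc_aux)
        # Combine both views with the decoder state via a residual sum and norm.
        return self.norm(dec_h + g + a)
```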