Rethinking and Improving Natural Language Generation with Layer-Wise
Multi-View Decoding
- URL: http://arxiv.org/abs/2005.08081v7
- Date: Mon, 29 Aug 2022 06:40:23 GMT
- Title: Rethinking and Improving Natural Language Generation with Layer-Wise
Multi-View Decoding
- Authors: Fenglin Liu, Xuancheng Ren, Guangxiang Zhao, Chenyu You, Xuewei Ma,
Xian Wu, Xu Sun
- Abstract summary: In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences.
- Score: 59.48857453699463
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In sequence-to-sequence learning, e.g., natural language generation, the
decoder relies on the attention mechanism to efficiently extract information
from the encoder. While it is common practice to draw information from only the
last encoder layer, recent work has proposed to use representations from
different encoder layers for diversified levels of information. Nonetheless,
the decoder still obtains only a single view of the source sequences, which
might lead to insufficient training of the encoder layer stack due to the
hierarchy bypassing problem. In this work, we propose layer-wise multi-view
decoding, where for each decoder layer, together with the representations from
the last encoder layer, which serve as a global view, those from other encoder
layers are supplemented for a stereoscopic view of the source sequences.
Systematic experiments and analyses show that we successfully address the
hierarchy bypassing problem, require almost negligible parameter increase, and
substantially improve the performance of sequence-to-sequence learning with
deep representations on six diverse tasks, i.e., machine translation,
abstractive summarization, image captioning, video captioning, medical report
generation, and paraphrase generation. In particular, our approach achieves new
state-of-the-art results on ten benchmark datasets, including a low-resource
machine translation dataset and two low-resource medical report generation
datasets.
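
To make the mechanism concrete, below is a minimal PyTorch sketch (not the authors' released code) of layer-wise multi-view decoding: each decoder layer cross-attends to the last encoder layer as the global view and additionally to one lower encoder layer as the layer-specific view. The pairing of decoder layer i with encoder layer i, the class names, and the sequential residual fusion of the two views are illustrative assumptions, not the exact design from the paper.

```python
import torch
import torch.nn as nn


class MultiViewDecoderLayer(nn.Module):
    """Decoder layer that attends to two encoder views: the last encoder
    layer (global view) and one other encoder layer (layer-specific view)."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        # Separate cross-attention modules for the two views.
        self.global_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(4))
        self.dropout = nn.Dropout(dropout)

    def forward(self, tgt, global_view, local_view, tgt_mask=None):
        # Masked self-attention over the target sequence.
        a, _ = self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)
        x = self.norms[0](tgt + self.dropout(a))
        # Cross-attention to the last encoder layer (global view).
        g, _ = self.global_attn(x, global_view, global_view)
        x = self.norms[1](x + self.dropout(g))
        # Cross-attention to a lower encoder layer (layer-specific view);
        # attending to each view in turn with residuals is an illustrative fusion choice.
        l, _ = self.local_attn(x, local_view, local_view)
        x = self.norms[2](x + self.dropout(l))
        # Position-wise feed-forward block.
        x = self.norms[3](x + self.dropout(self.ffn(x)))
        return x


class MultiViewDecoder(nn.Module):
    """Stacks decoder layers; decoder layer i additionally sees encoder layer i."""

    def __init__(self, num_layers=6, **kwargs):
        super().__init__()
        self.layers = nn.ModuleList(MultiViewDecoderLayer(**kwargs) for _ in range(num_layers))

    def forward(self, tgt, encoder_states, tgt_mask=None):
        # encoder_states: list of per-layer encoder outputs, length == num_layers.
        global_view = encoder_states[-1]
        x = tgt
        for i, layer in enumerate(self.layers):
            x = layer(x, global_view, encoder_states[i], tgt_mask=tgt_mask)
        return x
```

In use, `encoder_states` would be the list of hidden states from every encoder layer (for example, what common Transformer implementations expose when all hidden states are returned), so the only extra cost over a standard decoder is one additional cross-attention per layer.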
Related papers
- $ε$-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space.
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
We evaluate our approach by assessing both reconstruction (rFID) and generation quality.
arXiv Detail & Related papers (2024-10-05T08:27:53Z) - Investigating Pre-trained Audio Encoders in the Low-Resource Condition [66.92823764664206]
We conduct a comprehensive set of experiments using a representative set of 3 state-of-the-art encoders (Wav2vec2, WavLM, Whisper) in the low-resource setting.
We provide various quantitative and qualitative analyses on task performance, convergence speed, and representational properties of the encoders.
arXiv Detail & Related papers (2023-05-28T14:15:19Z) - Learning to Compose Representations of Different Encoder Layers towards
Improving Compositional Generalization [29.32436551704417]
We propose CompoSition (Compose Syntactic and Semantic Representations).
CompoSition achieves competitive results on two comprehensive and realistic benchmarks.
arXiv Detail & Related papers (2023-05-20T11:16:59Z) - Decoder-Only or Encoder-Decoder? Interpreting Language Model as a
Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, most seq2seq tasks are solved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z) - Exploring and Exploiting Multi-Granularity Representations for Machine
Reading Comprehension [13.191437539419681]
We propose a novel approach called Adaptive Bidirectional Attention-Capsule Network (ABA-Net)
ABA-Net adaptively exploits source representations at different levels of granularity for the predictor.
We set new state-of-the-art performance on the SQuAD 1.0 dataset.
arXiv Detail & Related papers (2022-08-18T10:14:32Z) - Layer-Wise Multi-View Learning for Neural Machine Translation [45.679212203943194]
Traditional neural machine translation is limited to the topmost encoder layer's context representation.
We propose layer-wise multi-view learning to solve this problem.
Our approach yields stable improvements over multiple strong baselines.
arXiv Detail & Related papers (2020-11-03T05:06:37Z) - On the Sub-Layer Functionalities of Transformer Decoder [74.83087937309266]
We study how Transformer-based decoders leverage information from the source and target languages.
Based on these insights, we demonstrate that the residual feed-forward module in each Transformer decoder layer can be dropped with minimal loss of performance.
arXiv Detail & Related papers (2020-10-06T11:50:54Z) - Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic
Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per layer.
We propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content.
In order to further improve the architecture we introduce a weight function which aims to re-balance classes to increase the attention of the networks to under-represented objects.
arXiv Detail & Related papers (2020-07-19T18:44:34Z)