On the impressive performance of randomly weighted encoders in
summarization tasks
- URL: http://arxiv.org/abs/2002.09084v1
- Date: Fri, 21 Feb 2020 01:47:09 GMT
- Title: On the impressive performance of randomly weighted encoders in
summarization tasks
- Authors: Jonathan Pilault, Jaehong Park, Christopher Pal
- Abstract summary: We investigate the performance of untrained, randomly initialized encoders in a general class of sequence-to-sequence models.
We compare their performance with that of fully-trained encoders on the task of abstractive summarization.
- Score: 3.5407857489235206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we investigate the performance of untrained randomly
initialized encoders in a general class of sequence to sequence models and
compare their performance with that of fully-trained encoders on the task of
abstractive summarization. We hypothesize that random projections of an input
text have enough representational power to encode the hierarchical structure of
sentences and semantics of documents. Using a trained decoder to produce
abstractive text summaries, we empirically demonstrate that architectures with
untrained randomly initialized encoders perform competitively with respect to
the equivalent architectures with fully-trained encoders. We further find that
the capacity of the encoder not only improves overall model generalization but
also closes the performance gap between untrained randomly initialized and
fully-trained encoders. To our knowledge, this is the first time that general
sequence-to-sequence models with attention have been assessed with both trained
and randomly projected representations on abstractive summarization.
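The core hypothesis is that a random projection of the input already preserves enough structure for a trained decoder to exploit. A minimal numpy sketch of that intuition follows; the dimensions, names, and the reduction of an "untrained encoder" to a single fixed random projection are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_enc = 300, 512

# Frozen random weights: sampled once at initialization, never updated by training.
W = rng.normal(loc=0.0, scale=1.0 / np.sqrt(d_enc), size=(d_model, d_enc))

def random_encode(x):
    """Encode by projecting through the fixed random matrix (no learned parameters)."""
    return x @ W

# Random projections approximately preserve pairwise distances, so the
# "encoding" still carries geometry that a trained decoder can attend over.
x, y = rng.normal(size=(2, d_model))
ratio = np.linalg.norm(random_encode(x) - random_encode(y)) / np.linalg.norm(x - y)
print(f"pairwise distance preserved to within {abs(ratio - 1):.1%}")
```

This distance-preservation property (in the spirit of the Johnson-Lindenstrauss lemma) is one way to see why a frozen random encoder can remain competitive when only the decoder is trained.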
Related papers
- Extracting Text Representations for Terms and Phrases in Technical
Domains [9.27244202193623]
We propose a fully unsupervised approach to text encoding that consists of training small character-based models with the objective of reconstructing large pre-trained embedding matrices.
Models trained with this approach can not only match the quality of sentence encoders in technical domains, but are 5 times smaller and up to 10 times faster.
arXiv Detail & Related papers (2023-05-25T08:59:36Z)
- Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization [76.57699934689468]
We propose a fine-grained Token-level retrieval-augmented mechanism (Tram) on the decoder side to enhance the performance of neural models.
To overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens.
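The blurb does not spell out the mechanism, but token-level retrieval augmentation is commonly realized as a nearest-neighbor lookup over stored (hidden state, token) pairs whose results re-weight the next-token distribution. The sketch below is a hedged illustration of that general pattern, not Tram's actual implementation; all names and sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_hidden, store_size = 6, 16, 50

# Datastore of past decoder hidden states and the tokens they emitted.
datastore_keys = rng.normal(size=(store_size, d_hidden))
datastore_tokens = rng.integers(0, vocab_size, size=store_size)

def retrieval_scores(query, k=4):
    """Score vocabulary tokens by similarity to the top-k retrieved states."""
    sims = datastore_keys @ query
    topk = np.argsort(sims)[-k:]              # indices of the k most similar states
    scores = np.zeros(vocab_size)
    for i in topk:
        scores[datastore_tokens[i]] += np.exp(sims[i])  # softmax-style weighting
    return scores / scores.sum()              # normalized retrieval distribution

# Blend the retrieval distribution with the model's own (here: uniform) one.
query = rng.normal(size=d_hidden)
blended = 0.5 * retrieval_scores(query) + 0.5 * np.full(vocab_size, 1 / vocab_size)
```

The interpolation weight (0.5 here) would in practice be tuned or learned; Tram's contribution per the abstract is doing this at the token level while injecting code semantics into the summary tokens.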
arXiv Detail & Related papers (2023-05-18T16:02:04Z)
- Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding [3.185382039518151]
GenDoc is a sequence-to-sequence document understanding model pre-trained with unified masking across three modalities.
The proposed model utilizes an encoder-decoder architecture, which allows for increased adaptability to a wide range of downstream tasks.
arXiv Detail & Related papers (2023-05-16T15:25:19Z)
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, the seq2seq task is resolved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
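The scheduling idea in that last sentence can be sketched in a few lines of Python. This is a hypothetical illustration of the control flow only (not the Dangle model): values are encoded once, while keys are refreshed only every `period` decoding steps instead of at every step; all function names are placeholders:

```python
def decode_with_periodic_keys(source, n_steps, period,
                              encode_keys, encode_values, step_fn):
    """Decode n_steps tokens, re-encoding keys only every `period` steps."""
    values = encode_values(source)      # encoded once, reused throughout decoding
    keys = encode_keys(source, 0)       # initial key encoding
    encodings = 1                       # count key encodings for comparison
    outputs = []
    for t in range(1, n_steps + 1):
        if t % period == 0:
            keys = encode_keys(source, t)   # periodic refresh instead of per-step
            encodings += 1
        outputs.append(step_fn(keys, values, t))
    return outputs, encodings

# With 12 steps and period 4, keys are encoded 4 times (initial + 3 refreshes)
# instead of 12 times under per-step adaptive re-encoding.
outputs, n_enc = decode_with_periodic_keys(
    "src", 12, 4,
    encode_keys=lambda s, t: (s, t),
    encode_values=lambda s: s,
    step_fn=lambda k, v, t: (k, v, t),
)
```

This is where the claimed compute and memory savings would come from: the expensive re-encoding cost is amortized over `period` decoding steps.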
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Disentangled Sequence to Sequence Learning for Compositional Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
- Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization [15.367455931848252]
We present a sequence-to-sequence (seq2seq) autoencoder via contrastive learning for abstractive text summarization.
Our model adopts a standard Transformer-based architecture with a multi-layer bi-directional encoder and an auto-regressive decoder.
We conduct experiments on two datasets and demonstrate that our model outperforms many existing benchmarks.
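The blurb does not state which contrastive objective is used; a common choice in such setups is an InfoNCE-style loss, sketched below as an assumption rather than the paper's exact formulation. It pulls an anchor toward a positive (e.g. a paraphrase or augmentation of the same document) and pushes it away from negatives:

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: small when the anchor is most similar
    to the positive, large when a negative is more similar."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits = logits / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # positive sits at index 0

anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])                 # close to the anchor
negatives = [np.array([-1.0, 0.0]), np.array([0.0, 1.0])]
low = info_nce(anchor, positive, negatives)     # aligned pair: small loss
high = info_nce(anchor, negatives[0], [positive, negatives[1]])  # mismatched: large loss
```

In a seq2seq autoencoder this loss would typically be computed on pooled encoder representations and added to the reconstruction objective.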
arXiv Detail & Related papers (2021-08-26T18:45:13Z)
- Less is More: Pre-training a Strong Siamese Encoder Using a Weak Decoder [75.84152924972462]
Many real-world applications use Siamese networks to efficiently match text sequences at scale.
This paper pre-trains language models dedicated to sequence matching in Siamese architectures.
arXiv Detail & Related papers (2021-02-18T08:08:17Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training sequence encoders.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.