On the impressive performance of randomly weighted encoders in
summarization tasks
- URL: http://arxiv.org/abs/2002.09084v1
- Date: Fri, 21 Feb 2020 01:47:09 GMT
- Title: On the impressive performance of randomly weighted encoders in
summarization tasks
- Authors: Jonathan Pilault, Jaehong Park, Christopher Pal
- Abstract summary: We investigate the performance of untrained, randomly initialized encoders in a general class of sequence-to-sequence models.
We compare their performance with that of fully-trained encoders on the task of abstractive summarization.
- Score: 3.5407857489235206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we investigate the performance of untrained randomly
initialized encoders in a general class of sequence to sequence models and
compare their performance with that of fully-trained encoders on the task of
abstractive summarization. We hypothesize that random projections of an input
text have enough representational power to encode the hierarchical structure of
sentences and semantics of documents. Using a trained decoder to produce
abstractive text summaries, we empirically demonstrate that architectures with
untrained randomly initialized encoders perform competitively with respect to
the equivalent architectures with fully-trained encoders. We further find that
the capacity of the encoder not only improves overall model generalization but
also closes the performance gap between untrained randomly initialized and
fully-trained encoders. To our knowledge, this is the first time that general
sequence-to-sequence models with attention have been assessed with both trained
and randomly projected representations on abstractive summarization.
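The core hypothesis is that a random projection of the input already preserves enough structure for a trained decoder to exploit. A minimal numpy sketch of that intuition follows; the dimensions, names, and the reduction of an "untrained encoder" to a single fixed random projection are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_enc = 300, 512

# Frozen random weights: sampled once at initialization, never updated by training.
W = rng.normal(loc=0.0, scale=1.0 / np.sqrt(d_enc), size=(d_model, d_enc))

def random_encode(x):
    """Encode by projecting through the fixed random matrix (no learned parameters)."""
    return x @ W

# Random projections approximately preserve pairwise distances, so the
# "encoding" still carries geometry that a trained decoder can attend over.
x, y = rng.normal(size=(2, d_model))
ratio = np.linalg.norm(random_encode(x) - random_encode(y)) / np.linalg.norm(x - y)
print(f"pairwise distance preserved to within {abs(ratio - 1):.1%}")
```

This distance-preservation property (in the spirit of the Johnson-Lindenstrauss lemma) is one way to see why a frozen random encoder can remain competitive when only the decoder is trained.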
Related papers
- Extracting Text Representations for Terms and Phrases in Technical
Domains [9.27244202193623]
We propose a fully unsupervised approach to text encoding that consists of training small character-based models with the objective of reconstructing large pre-trained embedding matrices.
Models trained with this approach can not only match the quality of sentence encoders in technical domains, but are 5 times smaller and up to 10 times faster.
arXiv Detail & Related papers (2023-05-25T08:59:36Z)
- Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization [76.57699934689468]
We propose a fine-grained Token-level retrieval-augmented mechanism (Tram) on the decoder side to enhance the performance of neural models.
To overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens.
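The blurb does not spell out the mechanism, but token-level retrieval augmentation is commonly realized as a nearest-neighbor lookup over stored (hidden state, token) pairs whose results re-weight the next-token distribution. The sketch below is a hedged illustration of that general pattern, not Tram's actual implementation; all names and sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_hidden, store_size = 6, 16, 50

# Datastore of past decoder hidden states and the tokens they emitted.
datastore_keys = rng.normal(size=(store_size, d_hidden))
datastore_tokens = rng.integers(0, vocab_size, size=store_size)

def retrieval_scores(query, k=4):
    """Score vocabulary tokens by similarity to the top-k retrieved states."""
    sims = datastore_keys @ query
    topk = np.argsort(sims)[-k:]              # indices of the k most similar states
    scores = np.zeros(vocab_size)
    for i in topk:
        scores[datastore_tokens[i]] += np.exp(sims[i])  # softmax-style weighting
    return scores / scores.sum()              # normalized retrieval distribution

# Blend the retrieval distribution with the model's own (here: uniform) one.
query = rng.normal(size=d_hidden)
blended = 0.5 * retrieval_scores(query) + 0.5 * np.full(vocab_size, 1 / vocab_size)
```

The interpolation weight (0.5 here) would in practice be tuned or learned; Tram's contribution per the abstract is doing this at the token level while injecting code semantics into the summary tokens.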
arXiv Detail & Related papers (2023-05-18T16:02:04Z)
- Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding [3.185382039518151]
GenDoc is a sequence-to-sequence document understanding model pre-trained with unified masking across three modalities.
The proposed model utilizes an encoder-decoder architecture, which allows for increased adaptability to a wide range of downstream tasks.
arXiv Detail & Related papers (2023-05-16T15:25:19Z)
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, the seq2seq task is resolved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
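The scheduling idea in that last sentence can be sketched in a few lines of Python. This is a hypothetical illustration of the control flow only (not the Dangle model): values are encoded once, while keys are refreshed only every `period` decoding steps instead of at every step; all function names are placeholders:

```python
def decode_with_periodic_keys(source, n_steps, period,
                              encode_keys, encode_values, step_fn):
    """Decode n_steps tokens, re-encoding keys only every `period` steps."""
    values = encode_values(source)      # encoded once, reused throughout decoding
    keys = encode_keys(source, 0)       # initial key encoding
    encodings = 1                       # count key encodings for comparison
    outputs = []
    for t in range(1, n_steps + 1):
        if t % period == 0:
            keys = encode_keys(source, t)   # periodic refresh instead of per-step
            encodings += 1
        outputs.append(step_fn(keys, values, t))
    return outputs, encodings

# With 12 steps and period 4, keys are encoded 4 times (initial + 3 refreshes)
# instead of 12 times under per-step adaptive re-encoding.
outputs, n_enc = decode_with_periodic_keys(
    "src", 12, 4,
    encode_keys=lambda s, t: (s, t),
    encode_values=lambda s: s,
    step_fn=lambda k, v, t: (k, v, t),
)
```

This is where the claimed compute and memory savings would come from: the expensive re-encoding cost is amortized over `period` decoding steps.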
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Disentangled Sequence to Sequence Learning for Compositional Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
- Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization [15.367455931848252]
We present a sequence-to-sequence (seq2seq) autoencoder via contrastive learning for abstractive text summarization.
Our model adopts a standard Transformer-based architecture with a multi-layer bi-directional encoder and an auto-regressive decoder.
We conduct experiments on two datasets and demonstrate that our model outperforms many existing benchmarks.
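The blurb does not state which contrastive objective is used; a common choice in such setups is an InfoNCE-style loss, sketched below as an assumption rather than the paper's exact formulation. It pulls an anchor toward a positive (e.g. a paraphrase or augmentation of the same document) and pushes it away from negatives:

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: small when the anchor is most similar
    to the positive, large when a negative is more similar."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits = logits / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # positive sits at index 0

anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])                 # close to the anchor
negatives = [np.array([-1.0, 0.0]), np.array([0.0, 1.0])]
low = info_nce(anchor, positive, negatives)     # aligned pair: small loss
high = info_nce(anchor, negatives[0], [positive, negatives[1]])  # mismatched: large loss
```

In a seq2seq autoencoder this loss would typically be computed on pooled encoder representations and added to the reconstruction objective.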
arXiv Detail & Related papers (2021-08-26T18:45:13Z)
- Less is More: Pre-training a Strong Siamese Encoder Using a Weak Decoder [75.84152924972462]
Many real-world applications use Siamese networks to efficiently match text sequences at scale.
This paper pre-trains language models dedicated to sequence matching in Siamese architectures.
arXiv Detail & Related papers (2021-02-18T08:08:17Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training sequence encoders.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.