Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive
Text Summarization
- URL: http://arxiv.org/abs/2108.11992v1
- Date: Thu, 26 Aug 2021 18:45:13 GMT
- Title: Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive
Text Summarization
- Authors: Chujie Zheng, Kunpeng Zhang, Harry Jiannan Wang, Ling Fan, Zhe Wang
- Abstract summary: We present a sequence-to-sequence (seq2seq) autoencoder via contrastive learning for abstractive text summarization.
Our model adopts a standard Transformer-based architecture with a multi-layer bi-directional encoder and an auto-regressive decoder.
We conduct experiments on two datasets and demonstrate that our model outperforms many existing benchmarks.
- Score: 15.367455931848252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a denoising sequence-to-sequence (seq2seq)
autoencoder via contrastive learning for abstractive text summarization. Our
model adopts a standard Transformer-based architecture with a multi-layer
bi-directional encoder and an auto-regressive decoder. To enhance its denoising
ability, we incorporate self-supervised contrastive learning along with various
sentence-level document augmentation. These two components, seq2seq autoencoder
and contrastive learning, are jointly trained through fine-tuning, which
improves the performance of text summarization with regard to ROUGE scores and
human evaluation. We conduct experiments on two datasets and demonstrate that
our model outperforms many existing benchmarks and even achieves comparable
performance to the state-of-the-art abstractive systems trained with more
complex architecture and extensive computation resources.
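The abstract describes the two training signals (denoising reconstruction and self-supervised contrastive learning over sentence-level document augmentations) but not their implementation. Below is a minimal PyTorch sketch of how such a joint fine-tuning objective could look, assuming a BART-style encoder-decoder from Hugging Face Transformers; the augmentation routine, mean-pooling, InfoNCE temperature `tau`, and weight `lam` are illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch (not the authors' released code) of the joint objective:
# a BART-style denoising seq2seq autoencoder fine-tuned together with a
# sentence-level contrastive (InfoNCE) loss over augmented document views.
# The augmentation, pooling, temperature tau, and weight lam are assumptions.
import random
import torch
import torch.nn.functional as F
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")


def augment_document(doc: str) -> str:
    """Naive sentence-level augmentation: shuffle sentences and drop ~20%."""
    sents = [s for s in doc.split(". ") if s]
    random.shuffle(sents)
    return ". ".join(sents[: max(1, int(0.8 * len(sents)))])


def encode_mean(input_ids, attention_mask):
    """Mean-pooled encoder states used as a document embedding."""
    enc = model.get_encoder()(input_ids=input_ids, attention_mask=attention_mask)
    mask = attention_mask.unsqueeze(-1).float()
    return (enc.last_hidden_state * mask).sum(1) / mask.sum(1)


def joint_loss(docs, summaries, tau=0.1, lam=0.5):
    """Summarization cross-entropy plus InfoNCE over two augmented views."""
    views = [augment_document(d) for d in docs] + [augment_document(d) for d in docs]
    batch = tokenizer(views, return_tensors="pt", padding=True, truncation=True)
    z = F.normalize(encode_mean(batch.input_ids, batch.attention_mask), dim=-1)

    n = len(docs)
    sim = (z @ z.T) / tau
    sim.fill_diagonal_(float("-inf"))          # a view is never its own positive
    targets = torch.arange(2 * n).roll(n)      # view i pairs with view i + n
    nce = F.cross_entropy(sim, targets)

    src = tokenizer(docs, return_tensors="pt", padding=True, truncation=True)
    tgt = tokenizer(summaries, return_tensors="pt", padding=True, truncation=True)
    labels = tgt.input_ids.masked_fill(tgt.input_ids == tokenizer.pad_token_id, -100)
    ce = model(input_ids=src.input_ids, attention_mask=src.attention_mask,
               labels=labels).loss
    return ce + lam * nce
```

A training loop would back-propagate `joint_loss` on each minibatch, which is one concrete reading of "jointly trained through fine-tuning"; the paper's actual pooling, augmentation choices, and loss weighting may differ.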
Related papers
- Sequence Shortening for Context-Aware Machine Translation [5.803309695504831]
We show that a special case of the multi-encoder architecture achieves higher accuracy on contrastive datasets.
We introduce two novel methods, Latent Grouping and Latent Selecting, where the network learns either to group tokens or to select the tokens to be cached as context.
arXiv Detail & Related papers (2024-02-02T13:55:37Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units [64.61596752343837]
We present a novel two-pass direct S2ST architecture, UnitY, which first generates textual representations and then predicts discrete acoustic units.
We enhance the model performance by subword prediction in the first-pass decoder.
We show that the proposed methods boost the performance even when predicting a spectrogram in the second pass.
arXiv Detail & Related papers (2022-12-15T18:58:28Z)
- E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation [95.49128988683191]
Sequence-to-sequence (seq2seq) learning is a popular paradigm for large-scale pretraining of language models.
We propose an encoding-enhanced seq2seq pretraining strategy, namely E2S2.
E2S2 improves seq2seq models by integrating more efficient self-supervised information into the encoders.
arXiv Detail & Related papers (2022-05-30T08:25:36Z)
- Hierarchical Sketch Induction for Paraphrase Generation [79.87892048285819]
We introduce Hierarchical Refinement Quantized Variational Autoencoders (HRQ-VAE), a method for learning decompositions of dense encodings.
We use HRQ-VAE to encode the syntactic form of an input sentence as a path through the hierarchy, allowing us to more easily predict syntactic sketches at test time.
arXiv Detail & Related papers (2022-03-07T15:28:36Z)
- On Adversarial Robustness of Synthetic Code Generation [1.2559148369195197]
This paper showcases the existence of significant dataset bias through different classes of adversarial examples.
We propose several dataset augmentation techniques to reduce bias and showcase their efficacy.
arXiv Detail & Related papers (2021-06-22T09:37:48Z)
- Sequence-to-Sequence Contrastive Learning for Text Recognition [29.576864819760498]
We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations.
Experiments on handwritten text and on scene text show that when a text decoder is trained on the learned representations, our method outperforms non-sequential contrastive methods.
arXiv Detail & Related papers (2020-12-20T09:07:41Z)
- Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document.
The recent Sequence-to-Sequence (Seq2Seq)-based generative framework is widely used for the KE task and has obtained competitive performance on various benchmarks.
In this paper, we propose to adopt Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
arXiv Detail & Related papers (2020-10-24T08:11:23Z)
- EASTER: Efficient and Scalable Text Recognizer [0.0]
We present an Efficient And Scalable TExt Recognizer (EASTER) to perform optical character recognition on both machine printed and handwritten text.
Our model uses 1-D convolutional layers without any recurrence, which enables parallel training with a considerably smaller volume of data.
We also showcase improvements over the current best results on offline handwritten text recognition task.
arXiv Detail & Related papers (2020-08-18T10:26:03Z)
- Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding: for each decoder layer, the representations from the last encoder layer, which serve as a global view, are supplemented with those from other encoder layers to form a stereoscopic view of the source sequences (a minimal sketch follows after this list).
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
- Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward [42.925345819778656]
We present ASGARD, a novel framework for Abstractive Summarization with Graph-Augmentation and semantic-driven RewarD.
We propose the use of dual encoders, a sequential document encoder and a graph-structured encoder, to maintain the global context and local characteristics of entities.
Results show that our models produce significantly higher ROUGE scores than a variant without the knowledge graph as input on both the New York Times and CNN/Daily Mail datasets.
arXiv Detail & Related papers (2020-05-03T18:23:06Z)
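As referenced from the layer-wise multi-view decoding entry above, the following is a minimal sketch, under stated assumptions, of what "supplementing the last encoder layer with other encoder layers" could look like: decoder layer j cross-attends to a memory formed by concatenating the last encoder layer (global view) with encoder layer j (supplementary view). The fusion by concatenation and the pairing of decoder layer j with encoder layer j are assumptions for illustration, not the paper's exact mechanism.

```python
# Minimal sketch (an illustrative assumption, not the paper's exact mechanism)
# of layer-wise multi-view decoding: decoder layer j cross-attends to the last
# encoder layer (global view) concatenated with encoder layer j (supplementary
# view). Causal masking is omitted for brevity.
import torch
import torch.nn as nn


class MultiViewDecoderLayer(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt, last_enc, aux_enc):
        # Fuse the global and supplementary views along the source length axis.
        memory = torch.cat([last_enc, aux_enc], dim=1)
        x = self.norm1(tgt + self.self_attn(tgt, tgt, tgt)[0])
        x = self.norm2(x + self.cross_attn(x, memory, memory)[0])
        return self.norm3(x + self.ffn(x))


# Toy usage: decoder layer j pairs the last encoder layer with encoder layer j.
enc_layers = [torch.randn(2, 20, 512) for _ in range(6)]  # per-layer encoder outputs
states = torch.randn(2, 15, 512)                          # decoder hidden states
decoder = nn.ModuleList([MultiViewDecoderLayer() for _ in range(6)])
for j, layer in enumerate(decoder):
    states = layer(states, last_enc=enc_layers[-1], aux_enc=enc_layers[j])
```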
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.