Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning
- URL: http://arxiv.org/abs/2205.08221v1
- Date: Tue, 17 May 2022 10:34:28 GMT
- Title: Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning
- Authors: Demian Gholipour Ghalandari, Chris Hokamp, Georgiana Ifrim
- Abstract summary: Sentence compression reduces the length of text by removing non-essential content.
Unsupervised objective-driven methods for sentence compression can be used to create customized models.
- Score: 10.380414189465347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentence compression reduces the length of text by removing non-essential
content while preserving important facts and grammaticality. Unsupervised
objective-driven methods for sentence compression can be used to create
customized models without the need for ground-truth training data, while
allowing flexibility in the objective function(s) that are used for learning
and inference. Recent unsupervised sentence compression approaches use custom
objectives to guide discrete search; however, guided search is expensive at
inference time. In this work, we explore the use of reinforcement learning to
train effective sentence compression models that are also fast when generating
predictions. In particular, we cast the task as binary sequence labelling and
fine-tune a pre-trained transformer using a simple policy gradient approach.
Our approach outperforms other unsupervised models while also being more
efficient at inference time.
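A minimal sketch of the recipe described above: compression cast as token-level keep/drop labelling, fine-tuned with a simple policy-gradient (REINFORCE) update. The backbone checkpoint, the toy reward function, and the hyperparameters below are illustrative assumptions rather than the authors' configuration; the paper's actual unsupervised objective would also score properties such as fluency and faithfulness of the compression.

```python
# Sketch only: policy-gradient fine-tuning of a token-classification model for
# sentence compression cast as binary sequence labelling (keep = 1, drop = 0).
# Backbone, reward, and hyperparameters are assumptions for illustration.
import torch
from torch.distributions import Bernoulli
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL = "distilbert-base-uncased"  # assumed backbone, not the paper's choice
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)


def reward_fn(kept_mask: torch.Tensor) -> float:
    """Toy stand-in for the unsupervised objective.

    Rewards compressions that keep roughly 60% of the tokens; a real objective
    would also score fluency/faithfulness (e.g., with a language model).
    """
    ratio = kept_mask.float().mean().item()
    return 1.0 - abs(ratio - 0.6)


def train_step(sentence: str):
    enc = tokenizer(sentence, return_tensors="pt")
    logits = model(**enc).logits                       # (1, seq_len, 2)
    keep_prob = torch.softmax(logits, dim=-1)[0, :, 1]

    # Sample a keep/drop action per token and record its log-probability.
    dist = Bernoulli(probs=keep_prob)
    actions = dist.sample()
    log_prob = dist.log_prob(actions).sum()

    # REINFORCE: push up the log-probability of compressions with high reward.
    reward = reward_fn(actions)
    loss = -reward * log_prob

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), reward

# At inference time no search is needed: a single forward pass plus a 0.5
# threshold on keep_prob yields the compression, which is where the speed
# advantage over search-based methods comes from.
```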
Related papers
- Once-for-All Sequence Compression for Self-Supervised Speech Models [62.60723685118747]
We introduce a once-for-all sequence compression framework for self-supervised speech models.
The framework is evaluated on various tasks, showing marginal degradation compared to the fixed compressing rate variants.
We also explore adaptive compressing rate learning, demonstrating the ability to select task-specific preferred frame periods without needing a grid search.
arXiv Detail & Related papers (2022-11-04T09:19:13Z) - Sentence Representation Learning with Generative Objective rather than
Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative learning approach achieves strong performance improvements and outperforms current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z) - Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs [16.968490007064872]
Applying variational autoencoders (VAEs) to sequential data offers a method for controlled sequence generation, manipulation, and structured representation learning.
We show theoretically that this targeted dropout of decoder inputs removes the pointwise mutual information they provide, which the model compensates for by making greater use of the latent space.
Compared to uniform dropout on standard text benchmark datasets, our targeted approach increases both sequence performance and the information captured in the latent space.
arXiv Detail & Related papers (2022-09-26T11:21:19Z) - Learning Non-Autoregressive Models from Search for Unsupervised Sentence
Summarization [20.87460375478907]
Text summarization aims to generate a short summary for an input text.
In this work, we propose a Non-Autoregressive Unsupervised Summarization (NAUS) approach.
Experiments show that NAUS achieves state-of-the-art performance for unsupervised summarization.
arXiv Detail & Related papers (2022-05-28T21:09:23Z) - Accordion: Adaptive Gradient Communication via Critical Learning Regime
Identification [12.517161466778655]
Distributed model training suffers from communication bottlenecks due to frequent model updates transmitted across compute nodes.
To alleviate these bottlenecks, practitioners use gradient compression techniques like sparsification, quantization, or low-rank updates.
Aggressive compression, however, can degrade final model accuracy; in this work, we show that such performance degradation from choosing a high compression ratio is not fundamental.
An adaptive compression strategy can reduce communication while maintaining final test accuracy.
arXiv Detail & Related papers (2020-10-29T16:41:44Z) - Unsupervised Extractive Summarization by Pre-training Hierarchical
Transformers [107.12125265675483]
Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training.
Existing methods are mostly graph-based with sentences as nodes and edge weights measured by sentence similarities.
We find that transformer attentions can be used to rank sentences for unsupervised extractive summarization.
arXiv Detail & Related papers (2020-10-16T08:44:09Z) - Self-training Improves Pre-training for Natural Language Understanding [63.78927366363178]
We study self-training as another way to leverage unlabeled data through semi-supervised learning.
We introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data.
Our approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks.
arXiv Detail & Related papers (2020-10-05T17:52:25Z) - A Simple but Tough-to-Beat Data Augmentation Approach for Natural
Language Understanding and Generation [53.8171136907856]
We introduce a set of simple yet effective data augmentation strategies dubbed cutoff.
cutoff relies on sampling consistency and thus adds little computational overhead.
cutoff consistently outperforms adversarial training and achieves state-of-the-art results on the IWSLT2014 German-English dataset.
arXiv Detail & Related papers (2020-09-29T07:08:35Z) - Open-set Short Utterance Forensic Speaker Verification using
Teacher-Student Network with Explicit Inductive Bias [59.788358876316295]
We propose a pipeline solution to improve speaker verification on a small actual forensic field dataset.
By leveraging large-scale out-of-domain datasets, a knowledge distillation based objective function is proposed for teacher-student learning.
We show that the proposed objective function can efficiently improve the performance of teacher-student learning on short utterances.
arXiv Detail & Related papers (2020-09-21T00:58:40Z) - End-to-end Learning of Compressible Features [35.40108701875527]
Pre-trained convolutional neural networks (CNNs) are powerful off-the-shelf feature generators and have been shown to perform very well on a variety of tasks.
Unfortunately, the generated features are high dimensional and expensive to store.
We propose a learned method that jointly optimizes for compressibility along with the task objective.
arXiv Detail & Related papers (2020-07-23T05:17:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.