Don't Take It Literally: An Edit-Invariant Sequence Loss for Text
Generation
- URL: http://arxiv.org/abs/2106.15078v1
- Date: Tue, 29 Jun 2021 03:59:21 GMT
- Title: Don't Take It Literally: An Edit-Invariant Sequence Loss for Text
Generation
- Authors: Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Zhen Li, Bowen
Zhou, Shuguang Cui, Zhiting Hu
- Abstract summary: We propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence.
We conduct experiments on three tasks: machine translation with noisy target sequences, unsupervised text style transfer, and non-autoregressive machine translation.
- Score: 109.46348908829697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural text generation models are typically trained by maximizing
log-likelihood with the sequence cross entropy loss, which encourages an exact
token-by-token match between a target sequence and a generated sequence. Such
a training objective is sub-optimal when the target sequence is not perfect, e.g.,
when the target sequence is corrupted with noise, or when only weak sequence
supervision is available. To address this challenge, we propose a novel
Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a
target n-gram with all n-grams in the generated sequence. EISL draws
inspiration from convolutional networks (ConvNets), which are shift-invariant
over images, and is hence robust to shifts of n-grams, tolerating edits in the
target sequences. Moreover, the computation of EISL is essentially a
convolution operation with target n-grams as kernels, which is easy to
implement with existing libraries. To demonstrate the effectiveness of EISL, we
conduct experiments on three tasks: machine translation with noisy target
sequences, unsupervised text style transfer, and non-autoregressive machine
translation. Experimental results show our method significantly outperforms
cross entropy loss on these three tasks.
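The abstract notes that EISL amounts to a convolution of the model's output distribution with one-hot kernels built from target n-grams. The sketch below illustrates that idea in PyTorch; the function name `eisl_ngram_loss`, the log-sum-exp aggregation over generation windows, and the averaging over target n-grams are illustrative assumptions, not the authors' exact formulation (which would also need to handle padding and combinations of n-gram sizes; see the paper for details).

```python
import math
import torch
import torch.nn.functional as F

def eisl_ngram_loss(logits, target, n=2):
    """
    Rough sketch of an edit-invariant n-gram matching loss in the spirit of EISL.
    logits: (B, T_gen, V) model outputs; target: (B, T_tgt) reference token ids.
    Every target n-gram is scored against every n-gram window of the generated
    distribution, which is equivalent to convolving the output log-probabilities
    with one-hot kernels built from the target n-grams.
    """
    B, T_gen, V = logits.shape
    T_tgt = target.size(1)
    logp = F.log_softmax(logits, dim=-1)                      # (B, T_gen, V)

    # log-prob of each target token at every generation step: tok_logp[b, j, i] = log p(target[b, i] | step j)
    tok_logp = torch.gather(
        logp.unsqueeze(2).expand(B, T_gen, T_tgt, V),
        dim=-1,
        index=target.view(B, 1, T_tgt, 1).expand(B, T_gen, T_tgt, 1),
    ).squeeze(-1)                                             # (B, T_gen, T_tgt)

    # scores[b, j, i] = sum_{k<n} log p(target[i+k] | step j+k):
    # the correlation of the output distribution with the one-hot target n-gram kernels
    n_windows_gen = T_gen - n + 1
    n_windows_tgt = T_tgt - n + 1
    scores = torch.zeros(B, n_windows_gen, n_windows_tgt, device=logits.device)
    for k in range(n):
        scores = scores + tok_logp[:, k:k + n_windows_gen, k:k + n_windows_tgt]

    # soft match over all generation windows (assumption: log-mean-exp aggregation),
    # averaged over target n-grams
    per_ngram = torch.logsumexp(scores, dim=1) - math.log(n_windows_gen)
    return -per_ngram.mean()

# Example usage with hypothetical shapes:
logits = torch.randn(2, 12, 100, requires_grad=True)   # batch 2, 12 steps, vocab 100
target = torch.randint(0, 100, (2, 10))                 # noisy reference of length 10
loss = eisl_ngram_loss(logits, target, n=2)
loss.backward()  # differentiable, so it can replace or augment cross entropy
```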
Related papers
- Symbolic Autoencoding for Self-Supervised Sequence Learning [24.71036683224435]
$\Sigma$AE is a self-supervised framework that harnesses the power of abundant non-parallel data alongside limited parallel data.
Our results demonstrate that $\Sigma$AE significantly enhances performance on transduction tasks, even with minimal parallel data.
arXiv Detail & Related papers (2024-02-16T11:04:31Z)
- GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network and a decoder network.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Reducing Sequence Length by Predicting Edit Operations with Large Language Models [50.66922361766939]
This paper proposes predicting edit spans for the source text for local sequence transduction tasks.
We apply instruction tuning for Large Language Models on the supervision data of edit spans.
Experiments show that the proposed method achieves comparable performance to the baseline in four tasks.
arXiv Detail & Related papers (2023-05-19T17:51:05Z)
- Extrapolative Controlled Sequence Generation via Iterative Refinement [22.42501277690634]
We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training.
In this work, we propose Iterative Controlled Extrapolation (ICE) which iteratively makes local edits to a sequence to enable extrapolation.
Results on one natural language task (sentiment analysis) and two protein engineering tasks (ACE2 stability and AAV fitness) show that ICE considerably outperforms state-of-the-art approaches despite its simplicity.
arXiv Detail & Related papers (2023-03-08T13:21:27Z)
- Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation [21.886973310718457]
We propose a novel Sequence-to-Action (S2A) module for Grammatical Error Correction.
The S2A module jointly takes the source and target sentences as input, and is able to automatically generate a token-level action sequence.
Our model consistently outperforms the seq2seq baselines, while being able to significantly alleviate the over-correction problem.
arXiv Detail & Related papers (2022-05-22T17:47:06Z)
- MLE-guided parameter search for task loss minimization in neural sequence modeling [83.83249536279239]
Neural autoregressive sequence models are used to generate sequences in a variety of natural language processing (NLP) tasks.
We propose maximum likelihood guided parameter search (MGS), which samples from a distribution over update directions that is a mixture of random search around the current parameters and around the maximum likelihood gradient.
Our experiments show that MGS is capable of optimizing sequence-level losses, with substantial reductions in repetition and non-termination in sequence completion, and similar improvements to those of minimum risk training in machine translation.
arXiv Detail & Related papers (2020-06-04T22:21:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.