Don't Take It Literally: An Edit-Invariant Sequence Loss for Text
Generation
- URL: http://arxiv.org/abs/2106.15078v1
- Date: Tue, 29 Jun 2021 03:59:21 GMT
- Title: Don't Take It Literally: An Edit-Invariant Sequence Loss for Text
Generation
- Authors: Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Zhen Li, Bowen
Zhou, Shuguang Cui, Zhiting Hu
- Abstract summary: We propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence.
We conduct experiments on three tasks: machine translation with noisy target sequences, unsupervised text style transfer, and non-autoregressive machine translation.
- Score: 109.46348908829697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural text generation models are typically trained by maximizing
log-likelihood with the sequence cross entropy loss, which encourages an exact
token-by-token match between a target sequence and a generated sequence. Such
a training objective is sub-optimal when the target sequence is not perfect, e.g.,
when the target sequence is corrupted with noise, or when only weak sequence
supervision is available. To address this challenge, we propose a novel
Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a
target n-gram with all n-grams in the generated sequence. EISL draws
inspiration from convolutional networks (ConvNets), which are shift-invariant
over images, and is hence robust to shifts of n-grams, tolerating edits in the
target sequences. Moreover, the computation of EISL is essentially a
convolution operation with target n-grams as kernels, which is easy to
implement with existing libraries. To demonstrate the effectiveness of EISL, we
conduct experiments on three tasks: machine translation with noisy target
sequences, unsupervised text style transfer, and non-autoregressive machine
translation. Experimental results show our method significantly outperforms
cross entropy loss on these three tasks.
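The abstract notes that EISL amounts to a convolution of the model's output distribution with one-hot kernels built from target n-grams. The sketch below illustrates that idea in PyTorch; the function name `eisl_ngram_loss`, the log-sum-exp aggregation over generation windows, and the averaging over target n-grams are illustrative assumptions, not the authors' exact formulation (which would also need to handle padding and combinations of n-gram sizes; see the paper for details).

```python
import math
import torch
import torch.nn.functional as F

def eisl_ngram_loss(logits, target, n=2):
    """
    Rough sketch of an edit-invariant n-gram matching loss in the spirit of EISL.
    logits: (B, T_gen, V) model outputs; target: (B, T_tgt) reference token ids.
    Every target n-gram is scored against every n-gram window of the generated
    distribution, which is equivalent to convolving the output log-probabilities
    with one-hot kernels built from the target n-grams.
    """
    B, T_gen, V = logits.shape
    T_tgt = target.size(1)
    logp = F.log_softmax(logits, dim=-1)                      # (B, T_gen, V)

    # log-prob of each target token at every generation step: tok_logp[b, j, i] = log p(target[b, i] | step j)
    tok_logp = torch.gather(
        logp.unsqueeze(2).expand(B, T_gen, T_tgt, V),
        dim=-1,
        index=target.view(B, 1, T_tgt, 1).expand(B, T_gen, T_tgt, 1),
    ).squeeze(-1)                                             # (B, T_gen, T_tgt)

    # scores[b, j, i] = sum_{k<n} log p(target[i+k] | step j+k):
    # the correlation of the output distribution with the one-hot target n-gram kernels
    n_windows_gen = T_gen - n + 1
    n_windows_tgt = T_tgt - n + 1
    scores = torch.zeros(B, n_windows_gen, n_windows_tgt, device=logits.device)
    for k in range(n):
        scores = scores + tok_logp[:, k:k + n_windows_gen, k:k + n_windows_tgt]

    # soft match over all generation windows (assumption: log-mean-exp aggregation),
    # averaged over target n-grams
    per_ngram = torch.logsumexp(scores, dim=1) - math.log(n_windows_gen)
    return -per_ngram.mean()

# Example usage with hypothetical shapes:
logits = torch.randn(2, 12, 100, requires_grad=True)   # batch 2, 12 steps, vocab 100
target = torch.randint(0, 100, (2, 10))                 # noisy reference of length 10
loss = eisl_ngram_loss(logits, target, n=2)
loss.backward()  # differentiable, so it can replace or augment cross entropy
```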
Related papers
- Symbolic Autoencoding for Self-Supervised Sequence Learning [24.71036683224435]
$\Sigma$AE is a self-supervised framework that harnesses the power of abundant non-parallel data alongside limited parallel data.
Our results demonstrate that $\Sigma$AE significantly enhances performance on transduction tasks, even with minimal parallel data.
arXiv Detail & Related papers (2024-02-16T11:04:31Z)
- GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network and a decoder network.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Reducing Sequence Length by Predicting Edit Operations with Large Language Models [50.66922361766939]
This paper proposes predicting edit spans for the source text for local sequence transduction tasks.
We apply instruction tuning for Large Language Models on the supervision data of edit spans.
Experiments show that the proposed method achieves comparable performance to the baseline in four tasks.
arXiv Detail & Related papers (2023-05-19T17:51:05Z)
- Extrapolative Controlled Sequence Generation via Iterative Refinement [22.42501277690634]
We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training.
In this work, we propose Iterative Controlled Extrapolation (ICE) which iteratively makes local edits to a sequence to enable extrapolation.
Results on one natural language task (sentiment analysis) and two protein engineering tasks (ACE2 stability and AAV fitness) show that ICE considerably outperforms state-of-the-art approaches despite its simplicity.
arXiv Detail & Related papers (2023-03-08T13:21:27Z)
- Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation [21.886973310718457]
We propose a novel Sequence-to-Action (S2A) module for Grammatical Error Correction.
The S2A module jointly takes the source and target sentences as input, and is able to automatically generate a token-level action sequence.
Our model consistently outperforms the seq2seq baselines, while being able to significantly alleviate the over-correction problem.
arXiv Detail & Related papers (2022-05-22T17:47:06Z)
- MLE-guided parameter search for task loss minimization in neural sequence modeling [83.83249536279239]
Neural autoregressive sequence models are used to generate sequences in a variety of natural language processing (NLP) tasks.
We propose maximum likelihood guided parameter search (MGS), which samples from a distribution over update directions that is a mixture of random search around the current parameters and around the maximum likelihood gradient.
Our experiments show that MGS is capable of optimizing sequence-level losses, with substantial reductions in repetition and non-termination in sequence completion, and similar improvements to those of minimum risk training in machine translation.
arXiv Detail & Related papers (2020-06-04T22:21:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.