Reducing Sequence Length by Predicting Edit Operations with Large
Language Models
- URL: http://arxiv.org/abs/2305.11862v2
- Date: Sat, 21 Oct 2023 00:57:02 GMT
- Title: Reducing Sequence Length by Predicting Edit Operations with Large
Language Models
- Authors: Masahiro Kaneko, Naoaki Okazaki
- Abstract summary: This paper proposes predicting edit spans of the source text for local sequence transduction tasks.
We apply instruction tuning to Large Language Models on supervision data of edit spans.
Experiments show that the proposed method achieves performance comparable to the baseline in four tasks.
- Score: 50.66922361766939
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable
performance in various tasks and gained significant attention. LLMs are also
used for local sequence transduction tasks such as grammatical error
correction (GEC) and formality style transfer, where most tokens in a source
text are kept unchanged. However, models that generate all target tokens in
such tasks tend to simply copy the input text as is, without making the
needed changes, because the difference between input and output texts is
minimal in the training data. Generating the full target is also inefficient
because, with the Transformer architecture, the computational cost grows
quadratically with the target sequence length. This paper proposes predicting
edit spans of the source text for local sequence transduction tasks. By
representing an edit span as a position in the source text together with the
corrected tokens, we can reduce the length of the target sequence and the
computational cost of inference. We apply instruction tuning to LLMs on
supervision data of edit spans. Experiments show that the proposed method
achieves performance comparable to the baseline in four tasks (paraphrasing,
formality style transfer, GEC, and text simplification) even when the length
of the target text is reduced by as little as 21%. Furthermore, we report
that task-specific fine-tuning with the proposed method achieved
state-of-the-art performance on the four tasks.
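The edit-span representation described above can be made concrete with a
small sketch. The paper's exact span encoding and instruction-tuning format
are not reproduced here; the snippet below merely assumes spans of the form
(source start, source end, replacement tokens) and uses Python's difflib for
illustration, not the authors' implementation.

    # Illustrative sketch (assumed encoding, not the paper's exact format):
    # represent only the differences between source and target as
    # (start, end, replacement) spans over the source tokens.
    from difflib import SequenceMatcher
    from typing import List, Tuple

    EditSpan = Tuple[int, int, List[str]]  # (source start, source end, replacement tokens)

    def extract_edit_spans(source: List[str], target: List[str]) -> List[EditSpan]:
        """Collect the token spans where the target differs from the source."""
        matcher = SequenceMatcher(a=source, b=target, autojunk=False)
        spans = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":  # 'replace', 'delete', or 'insert'
                spans.append((i1, i2, target[j1:j2]))
        return spans

    def apply_edit_spans(source: List[str], spans: List[EditSpan]) -> List[str]:
        """Reconstruct the target by applying edit spans to the source tokens."""
        out, cursor = [], 0
        for start, end, replacement in spans:
            out.extend(source[cursor:start])  # copy unchanged source tokens
            out.extend(replacement)           # insert the corrected tokens
            cursor = end
        out.extend(source[cursor:])
        return out

    if __name__ == "__main__":
        src = "He go to school yesterday .".split()
        tgt = "He went to school yesterday .".split()
        spans = extract_edit_spans(src, tgt)
        print(spans)  # [(1, 2, ['went'])]
        assert apply_edit_spans(src, spans) == tgt

In this example, a model trained to output edit spans would generate a single
short span instead of the full six-token target, which is how the target
sequence length, and hence the quadratic inference cost, is reduced.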
Related papers
- TexIm FAST: Text-to-Image Representation for Semantic Similarity Evaluation using Transformers [2.7651063843287718]
TexIm FAST is a methodology for generating fixed-length text representations through a self-supervised Variational Auto-Encoder (VAE) for semantic evaluation with transformers.
The pictorial representations allow oblivious inference while retaining the linguistic intricacies, and are potent in cross-modal applications.
The efficacy of TexIm FAST has been extensively analyzed for the task of Semantic Textual Similarity (STS) on the MSRPC, CNN/Daily Mail, and XSum datasets.
arXiv Detail & Related papers (2024-06-06T18:28:50Z)
- Successor Features for Efficient Multisubject Controlled Text Generation [48.37713738712319]
We introduce SF-GEN, which is grounded in two primary concepts: successor features (SFs) and language model rectification.
SF-GEN seamlessly integrates the two to enable dynamic steering of text generation with no need to alter the LLM's parameters.
To the best of our knowledge, our research represents the first application of successor features in text generation.
arXiv Detail & Related papers (2023-11-03T00:17:08Z)
- Structural Self-Supervised Objectives for Transformers [3.018656336329545]
This thesis focuses on improving the pre-training of natural language models using unsupervised raw data.
In the first part, we introduce three alternative pre-training objectives to BERT's Masked Language Modeling (MLM).
In the second part, we propose self-supervised pre-training tasks that align structurally with downstream applications.
arXiv Detail & Related papers (2023-09-15T09:30:45Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate text revision tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves competitive and even better performance than state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
- Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation [109.46348908829697]
We propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence.
We conduct experiments on three tasks: machine translation with noisy target sequences, unsupervised text style transfer, and non-autoregressive machine translation.
arXiv Detail & Related papers (2021-06-29T03:59:21Z)
- Zero-shot Learning by Generating Task-specific Adapters [38.452434222367515]
We introduce Hypter, a framework that improves zero-shot transferability by training a hypernetwork to generate task-specific adapters from task descriptions.
This formulation enables learning at task level, and greatly reduces the number of parameters by using light-weight adapters.
arXiv Detail & Related papers (2021-01-02T10:50:23Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
- Seq2Edits: Sequence Transduction Using Span-level Edit Operations [10.785577504399077]
Seq2Edits is an open-vocabulary approach to sequence editing for natural language processing (NLP) tasks.
We evaluate our method on five NLP tasks (text normalization, sentence fusion, sentence splitting & rephrasing, text simplification, and grammatical error correction)
For grammatical error correction, our method speeds up inference by up to 5.2x compared to full sequence models.
arXiv Detail & Related papers (2020-09-23T13:28:38Z)