InstructCMP: Length Control in Sentence Compression through Instruction-based Large Language Models
- URL: http://arxiv.org/abs/2406.11097v2
- Date: Tue, 18 Jun 2024 18:35:52 GMT
- Title: InstructCMP: Length Control in Sentence Compression through Instruction-based Large Language Models
- Authors: Juseon-Do, Jingun Kwon, Hidetaka Kamigaito, Manabu Okumura
- Abstract summary: InstructCMP is an approach to the sentence compression task that can consider the length constraint through instructions.
We show that applying length priming significantly improves the performance of InstructCMP in both zero-shot and fine-tuning settings.
- Score: 27.26285945442178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extractive summarization can produce faithful summaries but often requires additional constraints, such as a desired summary length. Traditional sentence compression models typically do not consider such constraints because of their restricted model abilities, which would require model modifications to cope with them. To bridge this gap, we propose Instruction-based Compression (InstructCMP), an approach to the sentence compression task that can consider the length constraint through instructions by leveraging the zero-shot task-solving abilities of Large Language Models (LLMs). For this purpose, we created new evaluation datasets by transforming traditional sentence compression datasets into an instruction format. Using these datasets, we first reveal that current LLMs still face challenges in accurately controlling the length of a compressed text. To address this issue, we propose an approach named "length priming," which incorporates additional length information into the instructions without external resources. While length priming works effectively in a zero-shot setting, a training dataset with such instructions can further improve the ability of length control. Thus, we additionally created a training dataset in an instruction format and fine-tuned the model on it. Experimental results and analysis show that applying length priming significantly improves the performance of InstructCMP in both zero-shot and fine-tuning settings, without the need for any model modifications.
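To make the idea of length priming concrete, the snippet below builds a compression instruction that states both the source length and the requested number of words to keep. The wording is a hypothetical paraphrase for illustration, not the exact template released with InstructCMP.

```python
def build_length_primed_instruction(source: str, keep_words: int) -> str:
    """Build a length-primed compression instruction (hypothetical wording,
    not the exact InstructCMP template)."""
    src_len = len(source.split())
    return (
        f"The following sentence contains {src_len} words. Compress it by "
        f"deleting {src_len - keep_words} words so that exactly {keep_words} "
        f"words remain, keeping the remaining words in their original order.\n"
        f"Sentence: {source}\n"
        f"Compressed sentence:"
    )

print(build_length_primed_instruction(
    "The quick brown fox jumps over the lazy dog near the old river bank", 7))
```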
Related papers
- Length Controlled Generation for Black-box LLMs [70.57649832433451]
Large language models (LLMs) have demonstrated impressive instruction following capabilities, but struggle to accurately manage the length of generated text.
We propose a novel iterative sampling framework for text length control, integrating the Metropolis-Hastings algorithm with an importance sampling acceleration strategy.
Our framework achieves almost 100% success rates of length control on Llama3.1 for tasks such as length-controlled abstractive summarization.
arXiv Detail & Related papers (2024-12-19T09:07:38Z)
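As a rough illustration of such an iterative sampling loop, the sketch below repeatedly resamples candidates from a black-box generator and accepts them with a Metropolis-Hastings-style rule biased toward a target length. The length-based score, the omission of proposal densities, and the `generate` callable are simplifying assumptions, not the paper's actual formulation.

```python
import math
import random
from typing import Callable

def length_score(text: str, target_len: int) -> float:
    # Illustrative target "energy": closeness of the word count to the target.
    return -abs(len(text.split()) - target_len)

def mh_length_control(generate: Callable[[], str], target_len: int,
                      steps: int = 20, temperature: float = 1.0) -> str:
    """Resample candidates from a black-box generator and keep them under a
    simplified Metropolis-Hastings acceptance rule favouring the target length.
    (Proposal densities are ignored here for simplicity.)"""
    current = generate()
    for _ in range(steps):
        proposal = generate()
        ratio = math.exp((length_score(proposal, target_len)
                          - length_score(current, target_len)) / temperature)
        if random.random() < min(1.0, ratio):
            current = proposal
        if len(current.split()) == target_len:
            break  # exact length reached, stop early
    return current
```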
- Precise Length Control in Large Language Models [1.3654846342364308]
Large Language Models (LLMs) are increasingly used in production systems.
We propose a method to adapt pre-trained decoder-only LLMs for precise control of response length.
arXiv Detail & Related papers (2024-12-16T16:22:27Z)
- Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models [14.175953642749649]
Large language models often struggle to generate responses of a specific length.
We introduce a novel, model-agnostic approach called Ruler to enhance the instruction-following ability of large language models under length-constrained instructions.
arXiv Detail & Related papers (2024-09-27T17:44:58Z)
- Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum [30.46329559544246]
Large language models (LLMs) are commonly trained on datasets consisting of fixed-length token sequences.
Recent attention implementations mask cross-document attention, reducing the effective length of a chunk of tokens.
We introduce dataset decomposition, a novel variable sequence length training technique.
arXiv Detail & Related papers (2024-05-21T22:26:01Z)
- LongAlign: A Recipe for Long Context Alignment of Large Language Models [61.85923382850057]
LongAlign is a recipe of the instruction data, training, and evaluation for long context alignment.
We construct a long instruction-following dataset using Self-Instruct.
We adopt packing and sorted batching strategies to speed up supervised fine-tuning on data with varied length distributions.
arXiv Detail & Related papers (2024-01-31T18:29:39Z)
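As a sketch of what packing varied-length fine-tuning data might look like, the helper below greedily concatenates length-sorted token sequences into chunks of a fixed budget. It is an illustrative assumption rather than LongAlign's released implementation; a real setup would also need per-sequence attention masks and loss weighting.

```python
from typing import List

def pack_sequences(tokenized: List[List[int]], max_len: int) -> List[List[int]]:
    """Greedily pack length-sorted token sequences into chunks of at most
    max_len tokens, so short examples do not waste padding slots."""
    packs: List[List[int]] = []
    current: List[int] = []
    for seq in sorted(tokenized, key=len):
        seq = seq[:max_len]  # truncate over-long sequences (simplification)
        if current and len(current) + len(seq) > max_len:
            packs.append(current)
            current = []
        current.extend(seq)
    if current:
        packs.append(current)
    return packs
```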
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
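The positional change can be illustrated with two hypothetical prompt templates: the conventional instruction-first layout and the instruction-last layout the paper advocates.

```python
def instruction_first(instruction: str, source: str) -> str:
    # Conventional layout: the task instruction precedes the input.
    return f"{instruction}\n{source}"

def instruction_last(instruction: str, source: str) -> str:
    # Layout studied in the paper: the input comes first and the
    # instruction sits right before where generation starts.
    return f"{source}\n{instruction}"

source = "Der schnelle braune Fuchs springt über den faulen Hund."
instruction = "Translate the German sentence into English."
print(instruction_last(instruction, source))
```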
- Prompt-Based Length Controlled Generation with Reinforcement Learning [48.49553921757085]
We propose a prompt-based length control method to achieve high-accuracy length-controlled generation.
We adopt reinforcement learning with the reward signal given by either trainable or rule-based reward models.
Our method significantly improves the accuracy of prompt-based length control for the summarization task on popular datasets such as CNNDM and NYT.
arXiv Detail & Related papers (2023-08-23T09:43:10Z)
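A rule-based reward for length control can be as simple as scoring how far a summary's word count falls from the requested length; the function below is a minimal illustrative assumption, not the reward model used in the paper.

```python
def length_reward(summary: str, target_words: int, tolerance: int = 3) -> float:
    """Rule-based reward: 1.0 inside a tolerance band around the target word
    count, decaying linearly toward 0.0 as the deviation grows."""
    deviation = abs(len(summary.split()) - target_words)
    if deviation <= tolerance:
        return 1.0
    return max(0.0, 1.0 - 0.1 * (deviation - tolerance))

print(length_reward("a compact ten word summary of the original news article", 10))
```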
- Adapting Language Models to Compress Contexts [71.98287002918941]
Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window.
We propose to adapt pre-trained LMs into AutoCompressors, which are capable of compressing long contexts into compact summary vectors.
We fine-tune OPT and Llama-2 models on sequences of up to 30,720 tokens and show that AutoCompressors can utilize long contexts to improve perplexity.
arXiv Detail & Related papers (2023-05-24T06:42:44Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
- Reinforced Abstractive Summarization with Adaptive Length Controlling [12.793451906532223]
Controllable summarization, especially control over the summary length, is an important issue for practical applications.
We propose an Adaptive Length Controlling Optimization (ALCO) method that leverages a two-stage abstractive summarization model.
arXiv Detail & Related papers (2021-12-14T16:48:47Z)
- Length-controllable Abstractive Summarization by Guiding with Summary Prototype [27.094797760775297]
We propose a new length-controllable abstractive summarization model.
Our model generates a summary in two steps.
Experiments with the CNN/Daily Mail dataset and the NEWSROOM dataset show that our model outperformed previous models in length-controlled settings.
arXiv Detail & Related papers (2020-01-21T04:01:58Z)