Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models
- URL: http://arxiv.org/abs/2409.18943v2
- Date: Tue, 1 Oct 2024 09:20:58 GMT
- Title: Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models
- Authors: Jiaming Li, Lei Zhang, Yunshui Li, Ziqiang Liu, Yuelin Bai, Run Luo, Longze Chen, Min Yang
- Abstract summary: Large language models often struggle to generate responses of a specific length.
We introduce a novel, model-agnostic approach called Ruler to enhance the instruction-following ability of large language models under length-constrained instructions.
- Score: 14.175953642749649
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The instruction-following ability of large language models enables humans to interact with AI agents in a natural way. However, when required to generate responses of a specific length, large language models often struggle to meet users' needs due to their inherent difficulty in accurately perceiving numerical constraints. To explore the ability of large language models to control the length of generated responses, we propose the Target Length Generation Task (TLG) and design two metrics, Precise Match (PM) and Flexible Match (FM), to evaluate the model's performance in adhering to specified response lengths. Furthermore, we introduce a novel, model-agnostic approach called Ruler, which employs Meta Length Tokens (MLTs) to enhance the instruction-following ability of large language models under length-constrained instructions. Specifically, Ruler equips LLMs with the ability to generate responses of a specified length based on length constraints within the instructions. Moreover, Ruler can automatically generate an appropriate MLT when length constraints are not explicitly provided, demonstrating excellent versatility and generalization. Comprehensive experiments show the effectiveness of Ruler across different LLMs on the Target Length Generation Task, e.g., an average gain of 27.97 on PM and 29.57 on FM at the All level. In addition, we conduct extensive ablation experiments to further substantiate the efficacy and generalization of Ruler. Our code and data are available at https://github.com/Geaming2002/Ruler.
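As a rough illustration of how the PM and FM metrics could be scored, here is a minimal sketch in Python. The level-to-range mapping and the 10% FM slack are assumptions made for illustration; the exact level ranges and matching rules are defined in the paper.

```python
# Hedged sketch of Precise Match (PM) and Flexible Match (FM) scoring.
# The level-to-range mapping and FM slack below are illustrative assumptions,
# not the paper's exact definitions.

TARGET_RANGES = {
    "Level:0": (0, 10),     # assumed word-count ranges per target level
    "Level:1": (10, 80),
    "Level:2": (80, 200),
}

def word_count(text: str) -> int:
    return len(text.split())

def precise_match(response: str, level: str) -> bool:
    # PM: the response length must fall inside the target level's exact range.
    lo, hi = TARGET_RANGES[level]
    return lo <= word_count(response) <= hi

def flexible_match(response: str, level: str, slack: float = 0.1) -> bool:
    # FM: the same check, but the range boundaries are relaxed by a tolerance.
    lo, hi = TARGET_RANGES[level]
    n = word_count(response)
    return lo * (1 - slack) <= n <= hi * (1 + slack)
```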
Related papers
- Disentangling Length Bias In Preference Learning Via Response-Conditioned Modeling [87.17041933863041]
We introduce a Response-conditioned Bradley-Terry (Rc-BT) model that enhances the reward model's ability to mitigate length bias and follow length instructions.
We also propose the Rc-DPO algorithm to leverage the Rc-BT model for direct policy optimization (DPO) of large language models.
arXiv Detail & Related papers (2025-02-02T14:50:25Z)
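A minimal sketch of a response-conditioned Bradley-Terry objective as described above, assuming a hypothetical `reward_model` callable that scores a (length-conditioned prompt, response) pair; the Rc-BT paper's exact formulation may differ.

```python
import torch.nn.functional as F

def rc_bt_loss(reward_model, prompt_with_length, chosen, rejected):
    """Illustrative response-conditioned Bradley-Terry loss.

    The (chosen, rejected) pair is assumed to differ in whether the
    response satisfies the length instruction in the prompt, so the
    reward model is pushed to credit compliance, not raw length.
    `reward_model` is a hypothetical callable returning reward tensors.
    """
    r_chosen = reward_model(prompt_with_length, chosen)
    r_rejected = reward_model(prompt_with_length, rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```
- Length Controlled Generation for Black-box LLMs [70.57649832433451]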
Large language models (LLMs) have demonstrated impressive instruction following capabilities, but struggle to accurately manage the length of generated text.
We propose a novel iterative sampling framework for text length control, integrating the Metropolis-Hastings algorithm with an importance sampling acceleration strategy.
Our framework achieves almost 100% success rates of length control on Llama3.1 for tasks such as length-controlled abstractive summarization.
arXiv Detail & Related papers (2024-12-19T09:07:38Z)
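A simplified sketch of the Metropolis-Hastings-style resampling loop described above, assuming a hypothetical black-box `generate` function; the paper's actual framework also corrects for the proposal distribution and adds an importance sampling acceleration strategy.

```python
import math
import random

def target_density(length: int, target: int, tau: float = 10.0) -> float:
    # Unnormalized density that peaks when the length hits the target.
    return math.exp(-abs(length - target) / tau)

def mh_length_control(generate, prompt, target_len, steps=20):
    """Accept/reject fresh samples so the kept response drifts toward target_len."""
    current = generate(prompt)
    for _ in range(steps):
        proposal = generate(prompt)  # independent resampling as the proposal
        ratio = (target_density(len(proposal.split()), target_len)
                 / target_density(len(current.split()), target_len))
        if random.random() < min(1.0, ratio):  # simplified acceptance rule
            current = proposal
        if len(current.split()) == target_len:
            break
    return current
```
- Precise Length Control in Large Language Models [1.3654846342364308]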
Large Language Models (LLMs) are increasingly used in production systems.
We propose a method to adapt pre-trained decoder-only LLMs for precise control of response length.
arXiv Detail & Related papers (2024-12-16T16:22:27Z)
- Language Models can Self-Lengthen to Generate Long Texts [74.96074422345806]
This paper introduces an innovative iterative training framework called Self-Lengthen.
It leverages only the intrinsic knowledge and skills of Large Language Models without the need for auxiliary data or proprietary models.
Experiments on benchmarks and human evaluations show that Self-Lengthen outperforms existing methods in long-text generation.
arXiv Detail & Related papers (2024-10-31T13:47:10Z)
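A highly simplified sketch of one Self-Lengthen-style round, assuming a hypothetical `generate` sampler for the model itself; the extension prompt is invented for illustration.

```python
def self_lengthen_round(generate, instructions):
    """One illustrative round: the model extends its own drafts, and the
    longer outputs become fine-tuning data for the next round."""
    training_pairs = []
    for instruction in instructions:
        draft = generate(instruction)
        # Ask the same model to extend its own draft (hypothetical prompt).
        longer = generate(
            f"{instruction}\n\nDraft answer:\n{draft}\n\n"
            "Rewrite the draft as a longer, more detailed response."
        )
        if len(longer.split()) > len(draft.split()):
            training_pairs.append((instruction, longer))
    return training_pairs  # fine-tune on these pairs, then repeat
```
- LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs [4.4965596747053]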
LongGenBench is a novel benchmark designed to rigorously evaluate large language models' ability to generate long text.
It evaluates model performance across four distinct scenarios, three instruction types, and two generation lengths (16K and 32K tokens).
Our evaluation of ten state-of-the-art LLMs reveals that, despite strong results on the Ruler long-context benchmark, all models struggled with long text generation on LongGenBench.
arXiv Detail & Related papers (2024-09-03T17:25:54Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
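A rough sketch of the self-synthesis loop described above, with invented prompt templates and a stand-in quality filter; the paper's multi-stage pipeline is more involved.

```python
def self_guide_synthesis(student_generate, task_prompt, seed_examples, n=100):
    """The student LLM proposes new task inputs, answers them itself, and the
    surviving pairs are used to fine-tune the same student (illustrative)."""
    synthetic = []
    for _ in range(n):
        new_input = student_generate(
            f"{task_prompt}\nExamples:\n{seed_examples}\nPropose one new input:"
        )
        new_output = student_generate(
            f"{task_prompt}\nInput: {new_input}\nOutput:"
        )
        if new_output.strip():  # stand-in for real filtering/validation
            synthetic.append((new_input, new_output))
    return synthetic
```
- InstructCMP: Length Control in Sentence Compression through Instruction-based Large Language Models [27.26285945442178]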
InstructCMP is an approach to the sentence compression task that can account for the length constraint through instructions.
We show that applying length priming significantly improves the performance of InstructCMP in both zero-shot and fine-tuning settings.
arXiv Detail & Related papers (2024-06-16T23:00:47Z)
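A minimal sketch of length priming as described above: the instruction itself carries explicit source and target lengths. The template wording is assumed, not the paper's.

```python
def compression_prompt(sentence: str, target_words: int) -> str:
    """Builds a compression instruction with explicit length priming
    (illustrative template; the paper's exact wording differs)."""
    n_src = len(sentence.split())
    return (
        f"Compress the following {n_src}-word sentence into "
        f"{target_words} words, keeping its key information.\n"
        f"Sentence: {sentence}\nCompressed:"
    )
```
- Prompt-Based Length Controlled Generation with Reinforcement Learning [48.49553921757085]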
We propose a prompt-based length control method to achieve high-accuracy length controlled generation.
We adopt reinforcement learning with the reward signal given by either trainable or rule-based reward models.
Our method significantly improves the accuracy of prompt-based length control for the summarization task on popular datasets such as CNNDM and NYT.
arXiv Detail & Related papers (2023-08-23T09:43:10Z)
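A sketch of what a rule-based length reward for the RL setup above might look like; the tolerance and linear decay are assumptions, not the paper's reward design.

```python
def length_reward(response: str, target_len: int, tolerance: int = 5) -> float:
    """Rule-based reward: full credit near the target length, linear decay
    with deviation (illustrative; the paper's reward models may differ)."""
    deviation = abs(len(response.split()) - target_len)
    if deviation <= tolerance:
        return 1.0
    return max(0.0, 1.0 - deviation / max(target_len, 1))
```
- Augmented Language Models: a Survey [55.965967655575454]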
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools.
We refer to them as Augmented Language Models (ALMs).
The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks.
arXiv Detail & Related papers (2023-02-15T18:25:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.