Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost
- URL: http://arxiv.org/abs/2407.19825v1
- Date: Mon, 29 Jul 2024 09:21:52 GMT
- Title: Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost
- Authors: Sania Nayab, Giulio Rossolini, Giorgio Buttazzo, Nicolamaria Manes, Fabrizio Giacomelli
- Abstract summary: This paper analyzes the impact of output lengths on large language model (LLM) inference pipelines.
It proposes novel metrics to evaluate them in terms of *correct conciseness*.
It also examines the impact of controlling output length through a refined prompt engineering strategy, Constrained-CoT (CCoT).
- Score: 4.299153274884264
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Today's large language models (LLMs) can solve challenging question-answering tasks, and prompt engineering techniques, such as chain-of-thought (CoT), have gained attention for enhancing the explanation and correctness of outputs. Nevertheless, models require significant time to generate answers augmented with lengthy reasoning details. To address this issue, this paper analyzes the impact of output lengths on LLM inference pipelines and proposes novel metrics to evaluate them in terms of *correct conciseness*. It also examines the impact of controlling output length through a refined prompt engineering strategy, Constrained-CoT (CCoT), which encourages the model to limit output length. Experiments on pre-trained LLMs demonstrated the benefit of the proposed metrics and the effectiveness of CCoT across different models. For instance, constraining the reasoning of LLaMA2-70b to 100 words improves the accuracy from 36.01% (CoT) to 41.07% (CCoT) on the GSM8K dataset, while reducing the average output length by 28 words.
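Neither the abstract nor this listing includes code, so the following is a minimal sketch of the two ideas above: a CCoT-style prompt (a chain-of-thought prompt extended with an explicit word budget) and one simple way to score answers for correct conciseness. The prompt wording, the 100-word default, and the hard-cutoff scoring rule are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of Constrained-CoT (CCoT) prompting and a simple
# conciseness-aware accuracy score. The prompt template, the default
# 100-word budget, and the hard-cutoff scoring rule are illustrative
# assumptions; the paper defines its own metrics.

def ccot_prompt(question: str, max_words: int = 100) -> str:
    """Chain-of-thought prompt with an explicit output-length budget."""
    return (
        f"Q: {question}\n"
        f"Let's think step by step and limit the answer "
        f"to at most {max_words} words.\n"
        "A:"
    )

def concise_accuracy(outputs, golds, max_words: int = 100) -> float:
    """Fraction of answers that are both correct and within the budget.

    `outputs` holds (reasoning_text, final_answer) pairs and `golds`
    the reference answers; a hard cutoff on the reasoning length is
    one simple way to fold conciseness into accuracy.
    """
    hits = sum(
        1
        for (reasoning, answer), gold in zip(outputs, golds)
        if len(reasoning.split()) <= max_words and answer == gold
    )
    return hits / len(golds)
```

With this kind of 100-word budget, the abstract reports LLaMA2-70b accuracy on GSM8K rising from 36.01% (CoT) to 41.07% (CCoT) while the average output shrinks by 28 words.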
Related papers
- MetRex: A Benchmark for Verilog Code Metric Reasoning Using LLMs [2.0921175288836746]
Large Language Models (LLMs) have been applied to various hardware design tasks, including Verilog code generation, EDA tool scripting, and RTL bug fixing.
In this paper, we assess the ability of LLMs to reason about post-synthesis metrics of Verilog designs.
We introduce MetRex, a large-scale dataset comprising 25,868 Verilog HDL designs and their corresponding post-synthesis metrics, namely area, delay, and static power.
arXiv Detail & Related papers (2024-11-05T19:52:58Z)
- Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output [49.893971654861424]
We present a light-weight approach for detecting nonfactual outputs from retrieval-augmented generation (RAG).
We compute a factuality score that can be thresholded to yield a binary decision.
Our experiments show high area under the ROC curve (AUC) across a wide range of relevant open source datasets.
arXiv Detail & Related papers (2024-11-01T20:44:59Z)
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models [72.71150585370147]
LongRecipe is an efficient training strategy for extending the context window of large language models.
It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies.
LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and it reduces computational training resources by over 85% compared to full-sequence training.
arXiv Detail & Related papers (2024-08-31T17:19:30Z)
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models [48.35385912526338]
This paper explores the impact of extending input lengths on the capabilities of Large Language Models (LLMs).
We isolate the effect of input length using multiple versions of the same sample, each extended with padding of different lengths, types, and locations (a sketch of this construction follows the entry).
We show that the degradation trend appears in every version of our dataset, although at different intensities.
arXiv Detail & Related papers (2024-02-19T16:04:53Z)
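As referenced in the entry above, here is a rough sketch of how length-controlled variants of a single sample might be constructed; the filler text, word counts, and `pad_sample` helper are hypothetical stand-ins for the paper's actual padding types.

```python
# Hypothetical construction of input-length variants of one sample:
# the task text stays fixed while irrelevant padding varies in
# length and location. Filler text and helper names are illustrative.

FILLER = "The sky was clear and the market was busy that day. "

def pad_sample(task_text: str, pad_words: int, location: str = "before") -> str:
    """Embed the same task in `pad_words` words of filler text."""
    n_filler = len(FILLER.split())
    words = (FILLER * (pad_words // n_filler + 1)).split()[:pad_words]
    if location == "before":
        return " ".join(words + [task_text])
    if location == "after":
        return " ".join([task_text] + words)
    half = len(words) // 2  # "split": half before, half after the task
    return " ".join(words[:half] + [task_text] + words[half:])

# One task, many input lengths and padding locations.
task = "If Ann has 3 apples and buys 2 more, how many does she have?"
variants = [pad_sample(task, n, loc)
            for n in (0, 250, 500, 1000)
            for loc in ("before", "after", "split")]
```

Holding the task fixed while only the padding changes is what lets the paper attribute any accuracy drop to input length alone.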
- CliqueParcel: An Approach For Batching LLM Prompts That Jointly Optimizes Efficiency And Faithfulness [13.554160815699435]
CliqueParcel is designed to improve the efficiency of large language models (LLMs) during inference.
CliqueParcel is tested on eight widely recognized datasets.
This work provides novel insights into inference efficiency and demonstrates promising performance.
arXiv Detail & Related papers (2024-02-17T22:37:17Z)
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
We propose LLMRefine, an inference-time optimization method to refine an LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We first investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
Second, we examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline [22.08897444328099]
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks.
In this paper, we propose an efficient LLM inference pipeline that uses the model's own response-length perception to schedule sequences for batched inference (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-05-22T15:36:06Z)
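The entry above is terse; going by the paper's title, the core idea is to estimate each response's length before decoding and batch requests with similar predictions, so short generations are not padded out to the longest sequence in their batch. The sketch below is a guess at that scheduling step: `predict_length` stands in for whatever estimator the paper uses (possibly the LLM itself, as the title suggests), and the sort-then-batch rule is an illustrative assumption, not the paper's exact pipeline.

```python
# Hypothetical sequence scheduling by predicted response length.
from typing import Callable, List

def schedule_batches(prompts: List[str],
                     predict_length: Callable[[str], int],
                     batch_size: int = 8) -> List[List[str]]:
    """Group prompts with similar predicted output lengths.

    Decoding a batch costs roughly max(predicted lengths) steps,
    so sorting by the estimate before batching cuts the steps
    wasted on sequences that would otherwise finish early.
    """
    ranked = sorted(prompts, key=predict_length)
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]

# Toy usage with a crude proxy estimator (prompt length as stand-in).
batches = schedule_batches(
    ["Define AUC.", "Summarize the history of logic synthesis in detail."],
    predict_length=lambda p: len(p.split()),
    batch_size=2,
)
```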
- Reducing Sequence Length by Predicting Edit Operations with Large Language Models [50.66922361766939]
This paper proposes predicting edit spans of the source text for local sequence transduction tasks.
We apply instruction tuning to Large Language Models on supervision data of edit spans.
Experiments show that the proposed method achieves comparable performance to the baseline in four tasks.
arXiv Detail & Related papers (2023-05-19T17:51:05Z)
- LenAtten: An Effective Length Controlling Unit For Text Summarization [5.554982420311913]
Fixed-length summarization aims at generating summaries with a preset number of words or characters.
Most recent studies incorporate length information with word embeddings as the input to the recurrent decoding unit.
We present an effective length-controlling unit, Length Attention (LenAtten), to break the trade-off between length controllability and summary quality.
arXiv Detail & Related papers (2021-06-01T08:45:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.