The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models
- URL: http://arxiv.org/abs/2401.05618v1
- Date: Thu, 11 Jan 2024 01:52:25 GMT
- Title: The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models
- Authors: Matthew Renze and Erhan Guven
- Abstract summary: CCoT reduced average response length by 48.70% for both GPT-3.5 and GPT-4 while having a negligible impact on problem-solving performance.
Overall, CCoT leads to an average per-token cost reduction of 22.67%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce Concise Chain-of-Thought (CCoT) prompting. We
compared standard CoT and CCoT prompts to see how conciseness impacts response
length and correct-answer accuracy. We evaluated this using GPT-3.5 and GPT-4
with a multiple-choice question-and-answer (MCQA) benchmark. CCoT reduced
average response length by 48.70% for both GPT-3.5 and GPT-4 while having a
negligible impact on problem-solving performance. However, on math problems,
GPT-3.5 with CCoT incurs a performance penalty of 27.69%. Overall, CCoT leads
to an average per-token cost reduction of 22.67%. These results have practical
implications for AI systems engineers using LLMs to solve real-world problems
with CoT prompt-engineering techniques. In addition, these results provide more
general insight for AI researchers studying the emergent behavior of
step-by-step reasoning in LLMs.
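The comparison described in the abstract (standard CoT vs. CCoT prompts on MCQA items, scored by response length and correct-answer accuracy) can be sketched as a small harness. This is a minimal illustration, not the authors' code: the instruction wording in `COT_INSTRUCTION` and `CCOT_INSTRUCTION`, the "Answer: <letter>" convention, and the helper names `build_prompt`, `extract_answer`, `Result`, `summarize`, and `percent_reduction` are all hypothetical. Model calls and tokenization are omitted. The abstract does not say whether its percentages are relative or absolute changes, so this sketch assumes relative change.

```python
from __future__ import annotations

import re
import string
from dataclasses import dataclass
from statistics import mean

# Hypothetical instruction wording; the paper's exact prompts are not reproduced here.
COT_INSTRUCTION = "Think step by step before giving the final answer."
CCOT_INSTRUCTION = "Think step by step, but be concise, before giving the final answer."


def build_prompt(question: str, choices: list[str], concise: bool) -> str:
    """Format one MCQA item with either a standard CoT or a concise CoT instruction."""
    options = "\n".join(
        f"{string.ascii_uppercase[i]}. {choice}" for i, choice in enumerate(choices)
    )
    instruction = CCOT_INSTRUCTION if concise else COT_INSTRUCTION
    return f"{question}\n{options}\n\n{instruction}\nEnd with 'Answer: <letter>'."


def extract_answer(response: str) -> str | None:
    """Return the letter from the last 'Answer: X' in a response, or None if absent."""
    matches = re.findall(r"Answer:\s*([A-Z])", response)
    return matches[-1] if matches else None


@dataclass
class Result:
    response_tokens: int  # length of the model's response, in tokens
    correct: bool  # whether the extracted answer matched the answer key


def summarize(results: list[Result]) -> tuple[float, float]:
    """Return (mean response length in tokens, accuracy) for one prompt condition."""
    avg_len = mean(r.response_tokens for r in results)
    accuracy = mean(1.0 if r.correct else 0.0 for r in results)
    return avg_len, accuracy


def percent_reduction(baseline: float, treatment: float) -> float:
    """Relative reduction of `treatment` versus `baseline`, in percent."""
    return 100.0 * (baseline - treatment) / baseline
```

With made-up numbers (not from the paper): if two CoT responses are 200 and 100 tokens long and the matching CCoT responses are 80 and 40 tokens, `summarize` gives mean lengths of 150 and 60. `percent_reduction(150, 60)` then reports a 60% shorter average response. The same function applied to the two accuracies gives the relative accuracy penalty.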
Related papers
- Benchmarking LLMs for Optimization Modeling and Enhancing Reasoning via Reverse Socratic Synthesis (2024-07-13)
Large language models (LLMs) have exhibited problem-solving ability in mathematical reasoning.
We propose E-OPT, a benchmark for end-to-end optimization problem-solving with human-readable inputs and outputs.
- Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems (2024-04-23)
Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks.
CoT usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors, and step-missing errors.
We propose Deeply Understanding the Problems (DUP) to improve LLMs' math problem-solving ability by addressing semantic misunderstanding errors.
- Constrained C-Test Generation via Mixed-Integer Programming (2024-04-12)
This work proposes a novel method to generate C-Tests, a form of cloze test (a gap-filling exercise) in which only the last part of a word is turned into a gap.
In contrast to previous works that only vary the gap size or gap placement to achieve locally optimal solutions, we propose a mixed-integer programming (MIP) approach.
We publish our code, model, and collected data, consisting of 32 English C-Tests with 20 gaps each (totaling 3,200 individual gap responses), under an open-source license.
- ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting (2024-03-21)
Chain-of-Thought (CoT) prompting can enhance the reasoning capabilities of large language models (LLMs).
Existing CoT approaches usually focus on simpler reasoning tasks and thus result in low-quality and inconsistent CoT prompts.
We introduce CoTGenius, a novel framework designed for the automatic generation of superior CoT prompts.
- Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning (2024-02-28)
Large language models exhibit high-level commonsense reasoning abilities.
However, CoT-like methods turn a considerable number of originally correct answers into wrong ones.
We use attribution tracing and causal tracing methods to probe the internal working mechanism of the model.
- In-Context Principle Learning from Mistakes (2024-02-08)
In-context learning (ICL) has been the standard method of adapting LLMs to downstream tasks by learning from a few input-output examples.
We revisit this paradigm by learning more from the few given input-output examples.
- Applying Large Language Models and Chain-of-Thought for Automatic Scoring (2023-11-30)
This study investigates the application of large language models (LLMs) to the automatic scoring of student-written responses to science assessments.
We focused on overcoming the challenges of accessibility, technical complexity, and lack of explainability that have previously limited the use of artificial intelligence-based automatic scoring tools.
- Stress Testing Chain-of-Thought Prompting for Large Language Models (2023-09-28)
This report examines the effectiveness of Chain-of-Thought (CoT) prompting in improving the multi-step reasoning abilities of large language models (LLMs).
We analyze the impact of three types of CoT prompt perturbations, namely CoT order, CoT values, and CoT operators, on the performance of GPT-3 on various tasks.
- Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs (2023-05-19)
We propose Hint of Thought (HoT), a novel prompting method with explainability and zero-shot generalization.
Our HoT prompting has a significant advantage on zero-shot reasoning tasks compared to existing zero-shot CoT.
- Faithful Chain-of-Thought Reasoning (2023-01-31)
Chain-of-Thought (CoT) prompting boosts language models' (LMs') performance on a range of reasoning tasks.
We propose Faithful CoT, a reasoning framework involving two stages: Translation and Problem Solving.
This two-stage design guarantees that the reasoning chain provides a faithful explanation of the final answer.
This list is automatically generated from the titles and abstracts of the papers on this site.