Stress Testing Chain-of-Thought Prompting for Large Language Models
- URL: http://arxiv.org/abs/2309.16621v1
- Date: Thu, 28 Sep 2023 17:21:33 GMT
- Title: Stress Testing Chain-of-Thought Prompting for Large Language Models
- Authors: Aayush Mishra, Karan Thakkar
- Abstract summary: This report examines the effectiveness of Chain-of-Thought (CoT) prompting in improving the multi-step reasoning abilities of large language models (LLMs)
We analyze the impact of three types of CoT prompt perturbations, namely CoT order, CoT values, and CoT operators on the performance of GPT-3 on various tasks.
- Score: 0.16317061277456998
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This report examines the effectiveness of Chain-of-Thought (CoT) prompting in
improving the multi-step reasoning abilities of large language models (LLMs).
Inspired by previous studies \cite{Min2022RethinkingWork}, we analyze the
impact of three types of CoT prompt perturbations, namely CoT order, CoT
values, and CoT operators on the performance of GPT-3 on various tasks. Our
findings show that incorrect CoT prompting leads to poor performance on
accuracy metrics. Correct values in the CoT is crucial for predicting correct
answers. Moreover, incorrect demonstrations, where the CoT operators or the CoT
order are wrong, do not affect the performance as drastically when compared to
the value based perturbations. This research deepens our understanding of CoT
prompting and opens some new questions regarding the capability of LLMs to
learn reasoning in context.
Related papers
- A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning [48.51969964676017]
Chain-of-Thought (CoT) holds a significant place in augmenting the reasoning performance for large language models.
We propose a Read-and-Control approach for controlling the accuracy of CoT.
arXiv Detail & Related papers (2024-06-18T04:07:13Z) - Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners [19.40385041079461]
Large language models (LLMs) suffer from serious unfaithful chain-of-thought (CoT) issues.
We first study the CoT faithfulness issue at the granularity of CoT steps, identify two reasoning paradigms.
We then conduct a joint analysis of the causal relevance among the context, CoT, and answer during reasoning.
arXiv Detail & Related papers (2024-05-29T09:17:46Z) - ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting [124.69672273754144]
Chain-of-Thought (CoT) prompting can enhance the reasoning capabilities of large language models (LLMs)
Existing CoT approaches usually focus on simpler reasoning tasks and thus result in low-quality and inconsistent CoT prompts.
We introduce CoTGenius, a novel framework designed for the automatic generation of superior CoT prompts.
arXiv Detail & Related papers (2024-03-21T11:34:26Z) - Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task.
LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced.
Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z) - Analyzing Chain-of-Thought Prompting in Large Language Models via
Gradient-based Feature Attributions [10.621564997491808]
Chain-of-thought (CoT) prompting has been shown to empirically improve the accuracy of large language models.
We investigate whether CoT prompting affects the relative importances they assign to particular input tokens.
Our results indicate that while CoT prompting does not increase the magnitude of saliency scores attributed to semantically relevant tokens in the prompt, it increases the robustness of saliency scores to question perturbations and variations in model output.
arXiv Detail & Related papers (2023-07-25T08:51:30Z) - When do you need Chain-of-Thought Prompting for ChatGPT? [87.45382888430643]
Chain-of-Thought (CoT) prompting can effectively elicit complex multi-step reasoning from Large Language Models(LLMs)
It is not clear whether CoT is still effective on more recent instruction finetuned (IFT) LLMs such as ChatGPT.
arXiv Detail & Related papers (2023-04-06T17:47:29Z) - Faithful Chain-of-Thought Reasoning [51.21714389639417]
Chain-of-Thought (CoT) prompting boosts Language Models' (LM) performance on a gamut of reasoning tasks.
We propose Faithful CoT, a reasoning framework involving two stages: Translation and Problem Solving.
This guarantees that the reasoning chain provides a faithful explanation of the final answer.
arXiv Detail & Related papers (2023-01-31T03:04:26Z) - Towards Understanding Chain-of-Thought Prompting: An Empirical Study of
What Matters [82.84696222087396]
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs)
We show that CoT reasoning is possible even with invalid demonstrations.
arXiv Detail & Related papers (2022-12-20T05:20:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.