Related papers: Large Language Models are Contrastive Reasoners

Large Language Models are Contrastive Reasoners

URL: http://arxiv.org/abs/2403.08211v2
Date: Wed, 22 May 2024 21:06:37 GMT
Title: Large Language Models are Contrastive Reasoners
Authors: Liang Yao,
Abstract summary: We show how contrastive prompting significantly improves the ability of large language models to perform complex reasoning. Experiments on various large language models show that zero-shot contrastive prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. Our method not only surpasses zero-shot CoT and few-shot CoT in most arithmetic and commonsense reasoning tasks but also can seamlessly integrate with existing prompting methods.
Score: 8.427805316635318
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Prompting methods play a crucial role in enhancing the capabilities of pre-trained large language models (LLMs). We explore how contrastive prompting (CP) significantly improves the ability of large language models to perform complex reasoning. We demonstrate that LLMs are decent contrastive reasoners by simply adding "Let's give a correct and a wrong answer." before LLMs provide answers. Experiments on various large language models show that zero-shot contrastive prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks without any hand-crafted few-shot examples, such as increasing the accuracy on GSM8K from 35.9% to 88.8% and AQUA-RAT from 41.3% to 62.2% with the state-of-the-art GPT-4 model. Our method not only surpasses zero-shot CoT and few-shot CoT in most arithmetic and commonsense reasoning tasks but also can seamlessly integrate with existing prompting methods, resulting in improved or comparable results when compared to state-of-the-art methods. Our code is available at https://github.com/yao8839836/cp

Related papers

Think or Step-by-Step? UnZIPping the Black Box in Zero-Shot Prompts [5.397565689903148]
We introduce the ZIP score (Zero-shot Importance of Perturbation score), a versatile metric applicable to both open and closed-source models. We show that while both'step-by-step' and 'think' show high ZIP scores, which one is more influential depends on the model and task.
arXiv Detail & Related papers (2025-02-05T18:04:29Z)
Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models [8.370453544530914]
Large Language Models (LLMs) exhibit impressive performance across various domains but still struggle with arithmetic reasoning tasks. Recent work shows the effectiveness of prompt design methods in enhancing reasoning capabilities. We propose a novel and effective Teaching-Inspired Integrated Framework, which emulates the instructional process of a teacher guiding students.
arXiv Detail & Related papers (2024-10-10T16:02:36Z)
Better Zero-Shot Reasoning with Role-Play Prompting [10.90357246745529]
Role-play prompting consistently surpasses the standard zero-shot approach across most datasets. This highlights its potential to augment the reasoning capabilities of large language models.
arXiv Detail & Related papers (2023-08-15T11:08:30Z)
Progressive-Hint Prompting Improves Reasoning in Large Language Models [63.98629132836499]
This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP) It enables automatic multiple interactions between users and Large Language Models (LLMs) by using previously generated answers as hints to progressively guide toward the correct answers. We conducted extensive and comprehensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient.
arXiv Detail & Related papers (2023-04-19T16:29:48Z)
PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PaL) to understand natural language problems. PaL offloads the solution step to a programmatic runtime such as a Python interpreter. We set new state-of-the-art results in all 12 benchmarks.
arXiv Detail & Related papers (2022-11-18T18:56:13Z)
FCM: Forgetful Causal Masking Makes Causal Language Models Better Zero-Shot Learners [139.6321017962092]
We propose a simple technique that significantly boosts the performance of large language models without adding computational cost. Our key observation is that, by performing the next token prediction task with randomly selected past tokens masked out, we can improve the quality of the learned representations. Experimental results show that our method also improves PaLM's zero and few-shot performance on a diverse suite of tasks.
arXiv Detail & Related papers (2022-10-24T17:46:57Z)
Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective [26.41585967095811]
Zero-shot learning aims to train a model on a given task such that it can address new learning tasks without any additional training. Our approach converts zero-shot learning into multiple-choice tasks, avoiding problems in commonly used large-scale generative models such as FLAN. Our approach shows state-of-the-art performance on several benchmarks and produces satisfactory results on tasks such as natural language inference and text classification.
arXiv Detail & Related papers (2022-10-16T17:24:06Z)
Making Large Language Models Better Reasoners with Step-Aware Verifier [49.16750018427259]
DIVERSE (Diverse Verifier on Reasoning Step) is a novel approach that further enhances the reasoning capability of language models. We evaluate DIVERSE on the latest language model code-davinci and show that it achieves new state-of-the-art results on six of eight reasoning benchmarks.
arXiv Detail & Related papers (2022-06-06T03:38:36Z)
Large Language Models are Zero-Shot Reasoners [28.6899375595088]
Chain of thought (CoT) prompting is a technique for eliciting complex multi-step reasoning through step-by-step answer examples. We show that LLMs are decent zero-shot reasoners by simply adding Let's think step by step'' before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performances.
arXiv Detail & Related papers (2022-05-24T09:22:26Z)
PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models [67.3725459417758]
PERFECT is a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handcrafting. We show that manually engineered task prompts can be replaced with task-specific adapters that enable sample-efficient fine-tuning. Experiments on a wide range of few-shot NLP tasks demonstrate that PERFECT, while being simple and efficient, also outperforms existing state-of-the-art few-shot learning methods.
arXiv Detail & Related papers (2022-04-03T22:31:25Z)
Self-Consistency Improves Chain of Thought Reasoning in Language Models [53.45015291520658]
We explore a simple ensemble strategy, self-consistency, that significantly improves the reasoning accuracy of large language models. For arithmetic and commonsense reasoning benchmarks we find that self-consistency yields significant accuracy improvements.
arXiv Detail & Related papers (2022-03-21T17:48:52Z)
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs) We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.