POE: Process of Elimination for Multiple Choice Reasoning
- URL: http://arxiv.org/abs/2310.15575v1
- Date: Tue, 24 Oct 2023 07:38:43 GMT
- Title: POE: Process of Elimination for Multiple Choice Reasoning
- Authors: Chenkai Ma, Xinya Du
- Abstract summary: We argue a similar two-step strategy can make LMs better at multiple choice reasoning tasks.
In the first step, POE scores each option, and eliminates seemingly wrong options.
In the second step, POE masks these wrong options, and makes the final prediction from the remaining options.
- Score: 19.65826015840337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models (LMs) are capable of conducting in-context learning for
multiple choice reasoning tasks, but the options in these tasks are treated
equally. As humans often first eliminate wrong options before picking the final
correct answer, we argue a similar two-step strategy can make LMs better at
these tasks. To this end, we present the Process of Elimination (POE), a
two-step scoring method. In the first step, POE scores each option, and
eliminates seemingly wrong options. In the second step, POE masks these wrong
options, and makes the final prediction from the remaining options. Zero-shot
experiments on 8 reasoning tasks illustrate the effectiveness of POE, and a
following analysis finds our method to be especially performant on logical
reasoning tasks. We further analyze the effect of masks, and show that POE
applies to few-shot settings and large language models (LLMs) like ChatGPT.
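Below is a minimal sketch of the two-step scoring described in the abstract, assuming a Hugging Face causal LM. The average log-likelihood scoring, the below-average elimination threshold, and the "N/A" mask text are illustrative assumptions, not the paper's exact implementation.

```python
# A minimal, POE-style two-step scorer (sketch only): step 1 scores each
# option and eliminates those scoring below the mean; step 2 masks the
# eliminated options and re-scores only the surviving ones.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder model for illustration
tok = AutoTokenizer.from_pretrained(MODEL)
lm = AutoModelForCausalLM.from_pretrained(MODEL)
lm.eval()

def option_score(prompt: str, option: str) -> float:
    """Average log-likelihood of the option tokens given the prompt."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    option_ids = tok(" " + option, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, option_ids], dim=1)
    with torch.no_grad():
        logits = lm(input_ids).logits
    # Log-probabilities for each position's next token.
    logp = torch.log_softmax(logits[0, :-1], dim=-1)
    start = prompt_ids.shape[1] - 1
    target = input_ids[0, prompt_ids.shape[1]:]
    token_logps = logp[start:start + target.shape[0]].gather(1, target.unsqueeze(1))
    return token_logps.mean().item()

def poe_predict(question: str, options: list[str]) -> str:
    # Step 1: score every option and eliminate seemingly wrong ones
    # (here: options scoring below the mean -- an assumed criterion).
    scores = [option_score(question, o) for o in options]
    threshold = sum(scores) / len(scores)
    keep = [o for o, s in zip(options, scores) if s >= threshold]
    # Step 2: mask eliminated options and re-score only the remaining ones.
    masked = [o if o in keep else "N/A" for o in options]
    listing = " ".join(f"({chr(65 + i)}) {o}" for i, o in enumerate(masked))
    final_prompt = f"{question}\nOptions: {listing}\nAnswer:"
    final_scores = {o: option_score(final_prompt, o) for o in keep}
    return max(final_scores, key=final_scores.get)
```

Keeping eliminated options in the listing as "N/A" preserves the option layout while restricting the final prediction to the surviving options.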
Related papers
- Demystifying Multilingual Chain-of-Thought in Process Reward Modeling [71.12193680015622]
We tackle the challenge of extending process reward models (PRMs) to multilingual settings.
We train multilingual PRMs on a dataset spanning seven languages, which is translated from English.
Our results highlight the sensitivity of multilingual PRMs to both the number of training languages and the volume of English data.
arXiv Detail & Related papers (2025-02-18T09:11:44Z)
- Option-ID Based Elimination For Multiple Choice Questions [12.30777266124562]
Multiple choice questions (MCQs) are a popular and important task format for evaluating large language models (LLMs).
Based on common strategies people use when answering MCQs, the process of elimination (PoE) has been proposed as an effective problem-solving method.
This paper proposes a PoE method based on option IDs. Specifically, it eliminates an option by selecting the option ID with the lowest probability, as sketched below.
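A hedged sketch of that option-ID elimination idea: read off the LM's probability for each option-ID token ('A', 'B', ...), drop the least likely ID, and re-pose the question over the remaining options. The prompt template and the single elimination round are assumptions for illustration, reusing a Hugging Face causal LM as above.

```python
# Option-ID based elimination (sketch): score the ID tokens, drop the
# lowest-probability ID, then answer from the remaining options.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def id_probs(question: str, options: list[str]) -> list[float]:
    """Probability of each option-ID token ('A', 'B', ...) right after 'Answer:'."""
    labels = [chr(65 + i) for i in range(len(options))]
    listing = "\n".join(f"{lab}. {opt}" for lab, opt in zip(labels, options))
    prompt = f"{question}\n{listing}\nAnswer:"
    input_ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_logits = lm(input_ids).logits[0, -1]
    probs = torch.softmax(next_logits, dim=-1)
    # Probability of ' A', ' B', ... as the next token.
    return [probs[tok(" " + lab).input_ids[0]].item() for lab in labels]

def eliminate_then_answer(question: str, options: list[str]) -> str:
    probs = id_probs(question, options)
    worst = probs.index(min(probs))                    # least likely option ID
    remaining = [o for i, o in enumerate(options) if i != worst]
    probs2 = id_probs(question, remaining)             # re-score without it
    return remaining[probs2.index(max(probs2))]
```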
arXiv Detail & Related papers (2025-01-25T11:06:37Z)
- MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models [0.0]
This paper introduces MM-PoE, Multiple Choice Reasoning via Process of Elimination using Multi-Modal Models.
The novel methodology is engineered to augment the efficacy of Vision-Language Models (VLMs) in multiple-choice visual reasoning tasks.
Our empirical evaluations, conducted across three benchmark datasets, reveal that MM-PoE significantly improves both zero-shot and few-shot performance.
arXiv Detail & Related papers (2024-12-10T03:13:41Z)
- Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models [0.0]
We formalize a planning-based approach to perform multi-step problem solving with language models.
We demonstrate a superior success rate of 89.4% on the Game of 24 task as compared to existing approaches.
arXiv Detail & Related papers (2024-04-29T18:51:17Z)
- It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning [16.626335975696243]
Chain-of-thought (COT) prompting can help large language models (LLMs) reason toward correct answers, but its efficacy in reasoning toward incorrect answers is unexplored.
We propose PoE with COT, where LLMs must reason toward incorrect options on multiple-choice questions.
We find that the strategy of PoE always underperforms the strategy of choosing the correct answer.
arXiv Detail & Related papers (2023-11-13T18:18:22Z)
- Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models [107.07851578154242]
Language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities.
It is unclear whether LMs perform these tasks by cheating with answers memorized from the pretraining corpus or via a multi-step reasoning mechanism.
We show that our probing approach, MechanisticProbe, can detect the reasoning tree from the model's attention patterns for most examples.
arXiv Detail & Related papers (2023-10-23T01:47:29Z)
- Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs).
This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias".
We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution; a simplified sketch follows below.
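A simplified sketch in the spirit of the PriDe idea summarized above: estimate the model's prior preference for each option ID by cycling the option contents through the ID slots, then divide that prior out of the observed distribution. This is an approximation of the described decomposition, not the paper's exact estimation procedure; `id_prob_fn` is a hypothetical per-ID scoring callable, e.g. the `id_probs` function sketched earlier.

```python
# Permutation-based ID-prior debiasing (simplified sketch).
import numpy as np

def debiased_prediction(id_prob_fn, question: str, options: list[str]) -> int:
    """id_prob_fn(question, options) -> list of per-ID probabilities."""
    n = len(options)
    # Prior over IDs: average ID probabilities while cyclically rotating
    # which option content sits under which ID.
    prior = np.zeros(n)
    for shift in range(n):
        rotated = options[shift:] + options[:shift]
        prior += np.asarray(id_prob_fn(question, rotated))
    prior /= n
    observed = np.asarray(id_prob_fn(question, options))
    debiased = observed / np.clip(prior, 1e-9, None)   # remove the ID prior
    return int(np.argmax(debiased))                    # index of predicted option
```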
arXiv Detail & Related papers (2023-09-07T17:44:56Z)
- Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions [5.187383020960245]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various NLP tasks.
Previous works have shown that these models are sensitive to prompt wording, as well as to few-shot demonstrations and their order.
This paper investigates the sensitivity of LLMs to the order of options in multiple-choice questions.
arXiv Detail & Related papers (2023-05-24T00:27:00Z)
- Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy [60.18632773935895]
Spreading probability mass across multiple surface forms with identical meaning, known as surface form competition (SFC), is thought to cause an underestimation of a model's true performance.
We propose a mathematical formalism for SFC which allows us to quantify and bound its impact for the first time.
We identify a simple method for reducing it: increasing probability mass on the given answer choices by a) including them in the prompt and b) using in-context learning with even just one example, as illustrated below.
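A small illustration of those two mitigations, (a) listing the answer choices in the prompt and (b) adding a single in-context example. The template wording and the demonstration are invented for the sketch.

```python
# Prompt construction that keeps probability mass on the given choices
# (sketch): the choices appear verbatim in the prompt, optionally preceded
# by one in-context demonstration.
from typing import List, Optional, Tuple

def build_prompt(question: str, choices: List[str],
                 demo: Optional[Tuple[str, List[str], str]] = None) -> str:
    parts = []
    if demo is not None:  # one-shot in-context example
        demo_q, demo_choices, demo_answer = demo
        parts.append(
            f"Question: {demo_q}\nChoices: {', '.join(demo_choices)}\nAnswer: {demo_answer}\n"
        )
    # Including the choices steers mass toward the given surface forms.
    parts.append(f"Question: {question}\nChoices: {', '.join(choices)}\nAnswer:")
    return "\n".join(parts)

demo = ("What color is the sky on a clear day?", ["blue", "green", "red"], "blue")
print(build_prompt("Which planet is known as the Red Planet?",
                   ["Mars", "Venus", "Jupiter"], demo))
```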
arXiv Detail & Related papers (2022-12-19T15:51:52Z)
- Large Language Models are Better Reasoners with Self-Verification [48.534270563880845]
Large language models (LLMs) have shown strong reasoning ability in several natural language processing tasks.
LLMs with chain of thought (CoT) prompting require multi-step prompting and multi-token prediction, which is highly sensitive to individual mistakes.
We propose and prove that LLMs also have similar self-verification abilities.
arXiv Detail & Related papers (2021-05-24T17:55:51Z)
- True Few-Shot Learning with Language Models [78.42578316883271]
We evaluate the few-shot ability of LMs when held-out examples are unavailable.
Our findings suggest that prior work significantly overestimated the true few-shot ability of LMs.
arXiv Detail & Related papers (2021-05-24T17:55:51Z)