Option-ID Based Elimination For Multiple Choice Questions
- URL: http://arxiv.org/abs/2501.15175v2
- Date: Sat, 15 Feb 2025 17:04:57 GMT
- Title: Option-ID Based Elimination For Multiple Choice Questions
- Authors: Zhenhao Zhu, Bulou Liu, Qingyao Ai, Yiqun Liu
- Abstract summary: Multiple choice questions (MCQs) are a popular and important task for evaluating large language models (LLMs).
Based on common strategies people use when answering MCQs, the process of elimination (PoE) has been proposed as an effective problem-solving method.
This paper proposes a PoE method based on option IDs. Specifically, the method eliminates options by selecting the option ID with the lowest probability.
- Score: 12.30777266124562
- Abstract: Multiple choice questions (MCQs) are a popular and important task for evaluating large language models (LLMs). Based on common strategies people use when answering MCQs, the process of elimination (PoE) has been proposed as an effective problem-solving method. Existing approaches to PoE generally fall into two categories: one has the LLM directly select the incorrect options, while the other scores the options. However, both approaches incur high computational costs and often perform worse than methods that directly answer the MCQs with the option IDs. To address this issue, this paper proposes a PoE method based on option IDs. Specifically, our method eliminates options by selecting the option ID with the lowest probability. We conduct experiments with 10 different LLMs in zero-shot settings on 7 publicly available datasets. The experimental results demonstrate that our method significantly improves the LLM's performance. Further analysis reveals that the sequential elimination strategy can effectively enhance the LLM's reasoning ability. Additionally, we find that sequential elimination is also applicable to few-shot settings and can be combined with debiasing methods to further improve LLM's performance.
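Concretely, the sequential elimination loop the abstract describes can be sketched as below. The `option_id_logprobs` helper is an assumption, standing in for whatever API exposes the model's next-token log-probabilities over the option-ID tokens ('A', 'B', ...); it is not part of the paper.

```python
from typing import Dict, List

def option_id_logprobs(question: str, options: Dict[str, str]) -> Dict[str, float]:
    """Assumed helper: return the model's next-token log-probability for each
    option-ID token ('A', 'B', ...) given the formatted question and options.
    Wire this to your model's scoring API."""
    raise NotImplementedError

def answer_by_elimination(question: str, options: List[str]) -> str:
    """Sequential option-ID elimination: drop the option whose ID has the
    lowest probability, re-letter the survivors, repeat until one remains."""
    remaining = list(options)
    while len(remaining) > 1:
        ids = [chr(ord("A") + i) for i in range(len(remaining))]
        labeled = dict(zip(ids, remaining))
        scores = option_id_logprobs(question, labeled)
        worst = min(scores, key=scores.get)  # least probable option ID
        remaining = [opt for oid, opt in labeled.items() if oid != worst]
    return remaining[0]
```

Each round removes the option whose ID the model finds least probable and re-scores a progressively smaller question, which is where the reported reasoning gains come from.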
Related papers
- LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering [1.0874597293913013]
Multiple Choice Question Answering (MCQA) is an important problem with numerous real-world applications, such as medicine, law, and education.
We propose a simple yet effective approach that uses Large Language Models for data generation and scoring.
Our method improves accuracy from 28.9% to 39.3%, a gain of more than 10 percentage points over a baseline finetuned directly on 5-shot examples.
arXiv Detail & Related papers (2024-12-13T02:48:36Z)
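The entry above does not specify the generation pipeline; the following is a hypothetical sketch of what "LLM-based data generation and scoring" for distillation could look like, with `teacher_generate` and `teacher_score` as assumed callables rather than anything from the paper.

```python
# Hypothetical pipeline shape, not taken from the paper: a teacher LLM
# generates synthetic MCQA examples and scores them; only confident
# examples are kept for finetuning the student model.
def build_distillation_set(teacher_generate, teacher_score, seeds, threshold=0.8):
    dataset = []
    for seed in seeds:
        example = teacher_generate(seed)          # synthetic MCQA example
        if teacher_score(example) >= threshold:   # keep confident examples only
            dataset.append(example)
    return dataset
```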
- MM-PoE: Multiple Choice Reasoning via Process of Elimination using Multi-Modal Models [0.0]
This paper introduces Multiple Choice Reasoning via Process of Elimination using Multi-Modal models (MM-PoE).
The method is designed to improve the effectiveness of Vision-Language Models (VLMs) in multiple-choice visual reasoning tasks.
Our empirical evaluations, conducted across three benchmark datasets, reveal that MM-PoE significantly improves both zero-shot and few-shot performance.
arXiv Detail & Related papers (2024-12-10T03:13:41Z)
- On Speeding Up Language Model Evaluation [48.51924035873411]
Development of prompt-based methods with Large Language Models (LLMs) requires making numerous decisions.
We propose a novel method to address this challenge.
We show that it can identify the top-performing method using only 5-15% of the typically needed resources.
arXiv Detail & Related papers (2024-07-08T17:48:42Z)
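The entry above does not name the paper's exact selection algorithm; a successive-halving style budget allocator is one standard way to identify a top method with a fraction of the full evaluation cost, sketched here under that assumption (`evaluate` is an assumed per-example scorer).

```python
import random

# Illustrative successive-halving sketch (not necessarily the paper's exact
# algorithm): score all candidate methods on a small sample, keep the best
# half, and double the sample size for the survivors.
def successive_halving(methods, evaluate, examples, rounds=3, seed=0):
    rng = random.Random(seed)
    survivors = list(methods)
    scores = {m: 0.0 for m in survivors}
    batch = max(1, len(examples) // (2 ** rounds))
    for _ in range(rounds):
        sample = rng.sample(examples, min(batch, len(examples)))
        for m in survivors:
            scores[m] += sum(evaluate(m, x) for x in sample)
        survivors.sort(key=lambda m: scores[m], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
        batch *= 2
    return survivors[0]
```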
- Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena [23.264049073539663]
Multiple-choice questions (MCQ) are frequently used to assess large language models (LLMs).
LLMs may inherently favor certain answer choice IDs, such as A/B/C/D, due to inherent biases from a priori unbalanced probabilities over those tokens.
This work aims to tackle these significant difficulties, and establish a new LLM evaluation benchmark through entirely open-style questions.
arXiv Detail & Related papers (2024-06-11T17:59:47Z)
- Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of the in-context exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
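To make the ordering problem the entry highlights concrete: the naive baseline is to score every permutation of a fixed exemplar set on a validation split and keep the best, as in the sketch below (illustrative only, not the EASE algorithm; `validate` is an assumed accuracy function over an ordering).

```python
from itertools import permutations

# Illustrative only, not the EASE algorithm: brute-force the ordering of a
# small exemplar set by scoring each permutation on a validation split.
def best_exemplar_order(exemplars, validate, max_exemplars=4):
    candidates = list(permutations(exemplars[:max_exemplars]))
    return max(candidates, key=validate)
```

Brute force blows up factorially in the number of exemplars, which is precisely the cost that ordering-aware selection methods aim to avoid.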
- Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection [80.63946798650653]
The decision centers on whether to use a large LLM with better performance or a smaller one with reduced costs.
We propose a simpler solution: use only the uncertainty of the small LLM's generations as the decision criterion.
Our experiments reveal this simple solution optimally balances cost and performance, outperforming existing methods on 25 out of 27 experimental setups.
arXiv Detail & Related papers (2024-05-03T14:38:59Z)
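A minimal sketch of the cascade the entry describes: the small model answers unless its own uncertainty (here, mean token log-probability, one common proxy) is too high, in which case the query escalates. The model interfaces and threshold are assumptions, not the paper's exact setup.

```python
# Two-tier cascade sketch: answer cheaply with the small model, escalate to
# the large model only when the small model is insufficiently confident.
def cascade_answer(small_model, large_model, prompt, threshold=-1.0):
    text, token_logprobs = small_model(prompt)   # assumed: (answer, per-token logprobs)
    avg_logprob = sum(token_logprobs) / max(1, len(token_logprobs))
    if avg_logprob >= threshold:                 # confident enough: keep cheap answer
        return text
    return large_model(prompt)                   # escalate to the expensive model
```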
- Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems [76.69936664916061]
We study how the number of LM calls affects the performance of Vote and Filter-Vote.
We find, surprisingly, that across multiple language tasks, the performance of both Vote and Filter-Vote can first increase but then decrease as a function of the number of LM calls.
arXiv Detail & Related papers (2024-03-04T19:12:48Z)
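For reference, the Vote strategy studied here is plain majority voting over repeated calls, as in the sketch below (`model` is an assumed text-in/text-out callable); Filter-Vote additionally filters the sampled answers before voting.

```python
from collections import Counter

# Majority voting ("Vote") over k independent calls to the same model; the
# paper's finding is that accuracy need not grow monotonically with k.
def vote(model, prompt, k=5):
    answers = [model(prompt) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```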
- Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named 'Rephrase and Respond' (RaR), which allows Large Language Models to rephrase and expand questions posed by humans.
RaR serves as a simple yet effective prompting method for improving performance.
We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
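As described, RaR is a two-call prompting pattern: first ask the model to rephrase and expand the question, then answer the rephrased version. A minimal sketch follows; the prompt wording is illustrative, not the paper's exact template, and `llm` is an assumed text-to-text callable.

```python
# Two-call Rephrase-and-Respond sketch with illustrative prompt wording.
def rephrase_and_respond(llm, question):
    rephrased = llm("Rephrase and expand the following question to make it "
                    "clearer, without answering it:\n" + question)
    return llm(rephrased + "\nNow answer the rephrased question above.")
```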
- POE: Process of Elimination for Multiple Choice Reasoning [19.65826015840337]
We argue that a similar two-step strategy, mirroring how humans eliminate wrong answers first, can make LMs better at multiple choice reasoning tasks.
In the first step, POE scores each option, and eliminates seemingly wrong options.
In the second step, POE masks these wrong options, and makes the final prediction from the remaining options.
arXiv Detail & Related papers (2023-10-24T07:38:43Z)
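The two steps listed above map directly onto code: score all options and eliminate the unlikely ones, then predict among the survivors. In the sketch below, `score_option` (per-option plausibility under the LM) and `answer_among` (a second call restricted to the surviving options) are assumed interfaces.

```python
# Two-step POE sketch matching the description above.
def process_of_elimination(score_option, answer_among, question, options, keep=2):
    # Step 1: score every option and eliminate the seemingly wrong ones.
    ranked = sorted(options, key=lambda o: score_option(question, o), reverse=True)
    survivors = ranked[:keep]
    # Step 2: mask the eliminated options and predict among the survivors.
    return answer_among(question, survivors)
```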
- Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs).
This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias".
We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution.
arXiv Detail & Related papers (2023-09-07T17:44:56Z)
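One way to realize the idea the entry describes, sketched below in the spirit of PriDe rather than as its exact estimator: estimate the model's prior over option IDs by averaging its predicted distribution across permutations of the option contents, then divide the prior out of the observed distribution and renormalize. The `predict` callable (returning a probability list over IDs) is assumed.

```python
from itertools import permutations

# Debiasing sketch in the spirit of PriDe, not its exact estimator.
def debiased_prediction(predict, question, options):
    ids = [chr(ord("A") + i) for i in range(len(options))]
    perms = list(permutations(options))
    prior = [0.0] * len(ids)
    for perm in perms:
        probs = predict(question, dict(zip(ids, perm)))  # assumed: p(ID) list
        for i, p in enumerate(probs):
            prior[i] += p / len(perms)
    observed = predict(question, dict(zip(ids, options)))
    adjusted = [o / max(p, 1e-9) for o, p in zip(observed, prior)]
    total = sum(adjusted)
    return {oid: a / total for oid, a in zip(ids, adjusted)}
```

Full permutation averaging costs n! calls per question; the point of an inference-time method like PriDe is to estimate this prior far more cheaply.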
- Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions [5.187383020960245]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various NLP tasks.
Previous works have shown that these models are sensitive to prompt wording and to few-shot demonstrations and their order.
This paper investigates the sensitivity of LLMs to the order of options in multiple-choice questions.
arXiv Detail & Related papers (2023-08-22T14:54:59Z)
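A quick probe for the sensitivity this paper studies: shuffle the option order across trials and measure how often the model keeps choosing the same underlying answer. In the sketch below, `answer_mcq` is an assumed callable returning the chosen option's text given the question and an ordered option list.

```python
import random

# Order-sensitivity probe: fraction of option-order shuffles on which the
# model's chosen answer (by content, not by ID) stays the same.
def order_consistency(answer_mcq, question, options, trials=10, seed=0):
    rng = random.Random(seed)
    baseline = answer_mcq(question, list(options))
    agree = 0
    for _ in range(trials):
        shuffled = list(options)
        rng.shuffle(shuffled)
        agree += answer_mcq(question, shuffled) == baseline
    return agree / trials
```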