Related papers: Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors

Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors

URL: http://arxiv.org/abs/2406.01026v2
Date: Thu, 6 Jun 2024 06:32:45 GMT
Title: Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors
Authors: Mengge Xue, Zhenyu Hu, Liqun Liu, Kuo Liao, Shuang Li, Honglin Han, Meng Zhao, Chengguo Yin,
Abstract summary: Multiple-Choice Questions (MCQs) constitute a critical area of research in the study of Large Language Models (LLMs) We introduce an efficient Supervised Fine-Tuning algorithm for MCQs, termed Point-wise Intelligent Feedback (PIF)
Score: 11.470005425117371
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multiple-Choice Questions (MCQs) constitute a critical area of research in the study of Large Language Models (LLMs). Previous works have investigated the selection bias problem in MCQs within few-shot scenarios, in which the LLM's performance may be influenced by the presentation of answer choices, leaving the selection bias during Supervised Fine-Tuning (SFT) unexplored. In this paper, we reveal that selection bias persists in the SFT phase , primarily due to the LLM's inadequate Multiple Choice Symbol Binding (MCSB) ability. This limitation implies that the model struggles to associate the answer options with their corresponding symbols (e.g., A/B/C/D) effectively. To enhance the model's MCSB capability, we first incorporate option contents into the loss function and subsequently adjust the weights of the option symbols and contents, guiding the model to understand the option content of the current symbol. Based on this, we introduce an efficient SFT algorithm for MCQs, termed Point-wise Intelligent Feedback (PIF). PIF constructs negative instances by randomly combining the incorrect option contents with all candidate symbols, and proposes a point-wise loss to provide feedback on these negative samples into LLMs. Our experimental results demonstrate that PIF significantly reduces the model's selection bias by improving its MCSB capability. Remarkably, PIF exhibits a substantial enhancement in the accuracy for MCQs.

Related papers

Reinforcing Question Answering Agents with Minimalist Policy Gradient Optimization [80.09112808413133]
Mujica is a planner that decomposes questions into acyclic graph of subquestions and a worker that resolves questions via retrieval and reasoning.<n>MyGO is a novel reinforcement learning method that replaces traditional policy updates with gradient Likelihood Maximum Estimation.<n> Empirical results across multiple datasets demonstrate the effectiveness of MujicaMyGO in enhancing multi-hop QA performance.
arXiv Detail & Related papers (2025-05-20T18:33:03Z)
Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering [78.89231943329885]
One of the most widely used tasks to evaluate Large Language Models (LLMs) is Multiple-Choice Question Answering (MCQA) In this work, we shed light on the inconsistencies of MCQA evaluation strategies, which can lead to inaccurate and misleading model comparisons.
arXiv Detail & Related papers (2025-03-19T08:45:03Z)
Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models [16.34646723046073]
Video Language Models (VLMs) are designed to answer complex video-focused questions. Current benchmarks fail to capture the full reasoning capabilities of VLMs due to selection bias. This study is the first focused investigation of selection bias in video-to-text LLM-powered models.
arXiv Detail & Related papers (2024-10-18T07:52:22Z)
Mitigating Selection Bias with Node Pruning and Auxiliary Options [11.835002896308545]
Large language models (LLMs) often show unwarranted preference for certain choice options when responding to multiple-choice questions. Previous solutions utilized debiasing methods to adjust the model's input and/or output. Our work, in contrast, investigates the model's internal representation of the selection bias.
arXiv Detail & Related papers (2024-09-27T15:53:54Z)
Differentiating Choices via Commonality for Multiple-Choice Question Answering [54.04315943420376]
Multiple-choice question answering can provide valuable clues for choosing the right answer. Existing models often rank each choice separately, overlooking the context provided by other choices. We propose a novel model by differentiating choices through identifying and eliminating their commonality, called DCQA.
arXiv Detail & Related papers (2024-08-21T12:05:21Z)
Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications. The quality of these exemplars in the prompt greatly impacts performance. Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model [86.9619638550683]
Vision-language foundation models have exhibited remarkable success across a multitude of downstream tasks due to their scalability on extensive image-text paired data. However, these models display significant limitations when applied to downstream tasks, such as fine-grained image classification, as a result of decision shortcuts''
arXiv Detail & Related papers (2024-03-01T09:01:53Z)
Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs) This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias" We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution.
arXiv Detail & Related papers (2023-09-07T17:44:56Z)
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions [5.187383020960245]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various NLP tasks. Previous works have shown these models are sensitive towards prompt wording, and few-shot demonstrations and their order. This paper investigates sensitivity of LLMs towards the order of options in multiple-choice questions.
arXiv Detail & Related papers (2023-08-22T14:54:59Z)
Leveraging Large Language Models for Multiple Choice Question Answering [6.198523595657983]
We show that a model with high MCSB ability performs much better with the natural approach than with the traditional approach. We show that a model with high MCSB ability performs much better with the natural approach than with the traditional approach.
arXiv Detail & Related papers (2022-10-22T05:04:54Z)
True Few-Shot Learning with Language Models [78.42578316883271]
We evaluate the few-shot ability of LMs when held-out examples are unavailable. Our findings suggest that prior work significantly overestimated the true few-shot ability of LMs.
arXiv Detail & Related papers (2021-05-24T17:55:51Z)
Feature Quantization Improves GAN Training [126.02828112121874]
Feature Quantization (FQ) for the discriminator embeds both true and fake data samples into a shared discrete space. Our method can be easily plugged into existing GAN models, with little computational overhead in training.
arXiv Detail & Related papers (2020-04-05T04:06:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.