Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack
- URL: http://arxiv.org/abs/2505.15323v1
- Date: Wed, 21 May 2025 09:58:38 GMT
- Title: Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack
- Authors: Silvia Cappelletti, Tobia Poppi, Samuele Poppi, Zheng-Xin Yong, Diego Garcia-Olano, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
- Abstract summary: Large Language Models (LLMs) are increasingly evaluated on multiple-choice question answering (MCQA) tasks. We propose a solution: the *prefilling attack*, a structured natural-language prefix (e.g., "*The correct option is:*") prepended to the model output. Our findings suggest that prefilling is a simple, robust, and low-cost method to enhance the reliability of FTP-based evaluation in multiple-choice settings.
- Score: 44.205352310633174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly evaluated on multiple-choice question answering (MCQA) tasks using *first-token probability* (FTP), which selects the answer option whose initial token has the highest likelihood. While efficient, FTP can be fragile: models may assign high probability to unrelated tokens (*misalignment*) or use a valid token merely as part of a generic preamble rather than as a clear answer choice (*misinterpretation*), undermining the reliability of symbolic evaluation. We propose a simple solution: the *prefilling attack*, a structured natural-language prefix (e.g., "*The correct option is:*") prepended to the model output. Originally explored in AI safety, we repurpose prefilling to steer the model to respond with a clean, valid option, without modifying its parameters. Empirically, the FTP with prefilling strategy substantially improves accuracy, calibration, and output consistency across a broad set of LLMs and MCQA benchmarks. It outperforms standard FTP and often matches the performance of open-ended generation approaches that require full decoding and external classifiers, while being significantly more efficient. Our findings suggest that prefilling is a simple, robust, and low-cost method to enhance the reliability of FTP-based evaluation in multiple-choice settings.
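The abstract describes the mechanism in enough detail to sketch it: score only the first generated token, restrict the comparison to the option labels, and optionally prepend a prefill string to steer probability mass onto a clean answer token. Below is a minimal, hypothetical sketch of that selection logic. A stub scorer stands in for a real language model (all names and probabilities are illustrative, not the paper's implementation); the stub mimics the *misinterpretation* failure the abstract describes, where a generic preamble token ("Sure") absorbs most of the mass unless the prefill is applied.

```python
import math

def first_token_probability_answer(score_next_token, prompt, options, prefill=""):
    """Pick the option whose label gets the highest next-token log-probability
    after `prompt` (optionally extended with a prefill prefix)."""
    context = prompt + prefill
    logprobs = score_next_token(context)  # dict: token -> log-probability
    # Compare only the option labels; mass on unrelated tokens is ignored,
    # which is exactly where plain FTP can become unreliable.
    best = max(options, key=lambda opt: logprobs.get(opt, -math.inf))
    return best, {opt: logprobs.get(opt, -math.inf) for opt in options}

def stub_scorer(context):
    """Stand-in for an LM. Without the prefill, most mass sits on a generic
    preamble token ("Sure"); with it, mass shifts to a clean option letter."""
    if context.endswith("The correct option is: "):
        return {"A": math.log(0.1), "B": math.log(0.7), "C": math.log(0.1)}
    return {"Sure": math.log(0.8), "A": math.log(0.1), "B": math.log(0.05)}

options = ["A", "B", "C"]
prompt = "Q: Which planet is largest? A) Mars B) Jupiter C) Venus\nAnswer: "

plain_answer, _ = first_token_probability_answer(stub_scorer, prompt, options)
prefilled_answer, _ = first_token_probability_answer(
    stub_scorer, prompt, options, prefill="The correct option is: ")
# plain_answer -> "A" (distorted by the preamble mass), prefilled_answer -> "B"
```

With a real model, `stub_scorer` would be replaced by a function that returns the model's next-token log-probabilities for the given context; the selection logic itself is unchanged.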
Related papers
- ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection [47.413985185291864]
Supervised fine-tuning (SFT) is a strategy to align Large Language Models with human intent. Traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single reference answer. We propose ProFit, which selectively masks low-probability tokens to prevent surface-level overfitting.
arXiv Detail & Related papers (2026-01-14T05:50:40Z) - Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs [16.357595595062946]
There is no consensus on how to tokenize the space following the colon, often overlooked as a trivial choice. Surprisingly, we are able to recommend one specific strategy -- tokenizing the space together with the answer letter. Our findings underscore the importance of careful evaluation design and highlight the need for standardized, transparent evaluation protocols.
arXiv Detail & Related papers (2025-09-18T14:47:58Z) - Cautious Next Token Prediction [62.74127603725369]
We propose a new training-free decoding strategy, dubbed Cautious Next Token Prediction (CNTP). In the decoding process, if the model has comparatively high prediction entropy at a certain step, we sample multiple trials starting from that step independently and stop when encountering any punctuation. We show that our proposed CNTP approach consistently outperforms existing standard decoding strategies by a clear margin.
arXiv Detail & Related papers (2025-07-03T05:49:18Z) - Probability-Consistent Preference Optimization for Enhanced LLM Reasoning [36.74546743563837]
We propose a novel framework that establishes dual quantitative metrics for preference selection. Our code is publicly available at https://github.com/YunqiaoYang/PCPO.
arXiv Detail & Related papers (2025-05-29T15:20:44Z) - Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction [12.92060812931049]
Minor changes in a prompt can cause significant discrepancies in model performance. We propose Placeholding Parallel Prediction (P3), a novel approach that predicts token probabilities across multiple positions. Experiments show improved accuracy and up to a 98% reduction in the standard deviation across prompts.
arXiv Detail & Related papers (2025-04-04T04:39:51Z) - Language Model Uncertainty Quantification with Attention Chain [9.093726246465117]
The predictive uncertainty of a large language model (LLM) is crucial for judging the reliability of its answers. We propose UQAC, an efficient method that narrows the reasoning space to a tractable size for marginalization. We validate UQAC on multiple reasoning benchmarks with advanced open-source LLMs.
arXiv Detail & Related papers (2025-03-24T21:43:47Z) - Scalable Best-of-N Selection for Large Language Models via Self-Certainty [65.31658824274894]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models. We propose self-certainty, a novel and efficient metric that estimates response quality without requiring external reward models. Our findings establish self-certainty as a practical and efficient way to improve LLM reasoning capabilities.
arXiv Detail & Related papers (2025-02-25T19:08:07Z) - ADePT: Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning [23.511954119467735]
Prompt Tuning (PT) enables the adaptation of Pre-trained Large Language Models (PLMs) to downstream tasks. Decomposed Prompt Tuning (DePT) has demonstrated superior adaptation capabilities. ADePT is composed of a short soft prompt and a shallow token-shared feed-forward neural network.
arXiv Detail & Related papers (2025-01-06T08:20:04Z) - Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles [23.134664392314264]
Tokenization is associated with many poorly understood shortcomings in language models (LMs). This work studies how tokenization impacts model performance by analyzing and comparing models with their byte-level counterparts. We introduce the Byte-Token Representation Lemma, a framework that establishes a mapping between the learned token distribution and its equivalent byte-level distribution.
arXiv Detail & Related papers (2024-10-11T23:30:42Z) - MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering [64.6741991162092]
We present MinPrompt, a minimal data augmentation framework for open-domain question answering.
We transform the raw text into a graph structure to build connections between different factual sentences.
We then apply graph algorithms to identify the minimal set of sentences needed to cover the most information in the raw text.
We generate QA pairs based on the identified sentence subset and train the model on the selected sentences to obtain the final model.
arXiv Detail & Related papers (2023-10-08T04:44:36Z) - Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs).
This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias".
We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution.
arXiv Detail & Related papers (2023-09-07T17:44:56Z) - Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.