Gold Panning: Turning Positional Bias into Signal for Multi-Document LLM Reasoning
- URL: http://arxiv.org/abs/2510.09770v1
- Date: Fri, 10 Oct 2025 18:28:36 GMT
- Title: Gold Panning: Turning Positional Bias into Signal for Multi-Document LLM Reasoning
- Authors: Adam Byerly, Daniel Khashabi
- Abstract summary: We introduce Gold Panning Bandits, a framework that leverages position bias as a diagnostic signal. We identify relevant documents using up to 65% fewer language model queries than random permutation baselines. This work demonstrates that inherent LLM biases can be transformed from liabilities into assets for efficient, inference-time optimization.
- Score: 27.797864796744665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models exhibit a strong position bias in multi-document contexts, systematically prioritizing information based on location rather than relevance. While existing approaches treat this bias as noise to be mitigated, we introduce Gold Panning Bandits, a framework that leverages position bias as a diagnostic signal: by reordering documents and observing shifts in the model's responses, we can efficiently identify the most relevant content. We frame the problem of choosing reorderings as a bipartite matching problem. While an optimal assignment can be computed at each iteration with the Hungarian algorithm in $O(N^3)$ time, we propose a greedy $O(N \log N)$ strategy that achieves comparable performance by prioritizing the placement of the most uncertain documents in the most informative positions. Our approach identifies relevant documents using up to 65\% fewer language model queries than random permutation baselines on knowledge-intensive NLP tasks, substantially reducing computational cost without model retraining. This work demonstrates that inherent LLM biases can be transformed from liabilities into assets for efficient, inference-time optimization.
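The abstract's greedy $O(N \log N)$ strategy can be sketched in a few lines: sort documents by uncertainty, sort positions by informativeness, and pair them off. The sketch below is a reconstruction from the abstract alone; the scoring functions and names are illustrative assumptions, not the paper's actual definitions.

```python
def greedy_placement(uncertainty, informativeness):
    """Pair the most uncertain documents with the most informative positions.

    uncertainty[i]     -- assumed score: how unsure we are about document i's relevance
    informativeness[p] -- assumed score: how strongly position p shifts the model's output

    Returns `order`, where order[p] is the index of the document placed at
    position p. Two sorts dominate the cost, giving O(N log N) overall.
    """
    # Document indices, most uncertain first
    docs = sorted(range(len(uncertainty)), key=lambda i: -uncertainty[i])
    # Position indices, most informative first
    slots = sorted(range(len(informativeness)), key=lambda p: -informativeness[p])
    order = [None] * len(uncertainty)
    for doc, pos in zip(docs, slots):
        order[pos] = doc
    return order

# Example: document 2 is most uncertain, so it lands in position 0,
# the most informative slot.
order = greedy_placement([0.2, 0.5, 0.9], [0.8, 0.1, 0.4])
```

The optimal alternative mentioned in the abstract, a full bipartite assignment via the Hungarian algorithm, would replace the two sorts with something like SciPy's `linear_sum_assignment` over an N-by-N cost matrix, at $O(N^3)$ cost per iteration.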
Related papers
- Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection [52.716143424856185]
We propose LiMA (Less input is More faithful for Attribution), which reformulates the attribution of important regions as an optimization problem for submodular subset selection. LiMA identifies both the most and least important samples while ensuring an optimal attribution boundary that minimizes errors. Our method also outperforms the greedy search in attribution efficiency, being 1.6 times faster.
arXiv Detail & Related papers (2025-04-01T06:58:15Z) - RAZOR: Sharpening Knowledge by Cutting Bias with Unsupervised Text Rewriting [16.633948320306832]
Biases prevalent in manually constructed datasets can introduce spurious correlations between tokens and labels. Existing debiasing methods often rely on prior knowledge of specific dataset biases. We propose RAZOR, a novel, unsupervised, and data-focused debiasing approach based on text rewriting for shortcut mitigation.
arXiv Detail & Related papers (2024-12-10T17:02:58Z) - GS-Matching: Reconsidering Feature Matching task in Point Cloud Registration [7.315456136190114]
We propose a stable matching policy called GS-matching, inspired by the Gale-Shapley algorithm. Our method can perform efficiently and find more non-repetitive inliers under low overlapping conditions.
arXiv Detail & Related papers (2024-12-06T08:47:14Z) - Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context [31.091013417498825]
We propose a simple yet effective method called context repetition (CoRe). This ensures that certain contiguous reasoning segments within supporting documents are presented in the optimal order. Applying CoRe, we improve the F1 score by up to 30%p on multi-hop QA tasks and increase accuracy by up to 70%p on a synthetic task.
arXiv Detail & Related papers (2024-10-09T17:41:53Z) - OD-Stega: LLM-Based Near-Imperceptible Steganography via Optimized Distributions [7.611860976107124]
We consider coverless steganography where a Large Language Model drives an arithmetic coding decoder to generate stego-texts.
An efficient method should embed secret message bits in as few language tokens as possible, while still keeping the stego-text natural and fluent.
arXiv Detail & Related papers (2024-10-06T01:30:45Z) - Scalable Fine-tuning from Multiple Data Sources: A First-Order Approximation Approach [17.79010397902909]
We study the problem of fine-tuning a language model (LM) for a target task by optimally using the information from $n$ auxiliary tasks. This problem has broad applications in NLP, such as targeted instruction tuning and data selection in chain-of-thought fine-tuning. We introduce a new algorithm for estimating model fine-tuning performance without requiring repeated training.
arXiv Detail & Related papers (2024-09-28T21:26:50Z) - On Speeding Up Language Model Evaluation [48.51924035873411]
We propose an $\textit{adaptive}$ approach to explore this space. We lean on multi-armed bandits to sequentially identify the next (method, validation sample)-pair to evaluate. We show that it can identify the top-performing method using only 5-15% of the typical resources.
arXiv Detail & Related papers (2024-07-08T17:48:42Z) - Eliminating Position Bias of Language Models: A Mechanistic Approach [119.34143323054143]
Position bias has proven to be a prevalent issue of modern language models (LMs). Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings. By eliminating position bias, models achieve better performance and reliability in downstream tasks, including LM-as-a-judge, retrieval-augmented QA, molecule generation, and math reasoning.
arXiv Detail & Related papers (2024-07-01T09:06:57Z) - Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance in 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z) - Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs).
This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias".
We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution.
arXiv Detail & Related papers (2023-09-07T17:44:56Z) - You can't pick your neighbors, or can you? When and how to rely on retrieval in the $k$NN-LM [65.74934004876914]
Retrieval-enhanced language models (LMs) condition their predictions on text retrieved from large external datastores.
One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model.
We empirically measure the effectiveness of our approach on two English language modeling datasets.
arXiv Detail & Related papers (2022-10-28T02:57:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.