Related papers: Rank It, Then Ask It: Input Reranking for Maximizing the Performance of LLMs on Symmetric Tasks

Rank It, Then Ask It: Input Reranking for Maximizing the Performance of LLMs on Symmetric Tasks

URL: http://arxiv.org/abs/2412.00546v1
Date: Sat, 30 Nov 2024 17:39:59 GMT
Title: Rank It, Then Ask It: Input Reranking for Maximizing the Performance of LLMs on Symmetric Tasks
Authors: Mohsen Dehghankar, Abolfazl Asudeh,
Abstract summary: Large language models (LLMs) have quickly emerged as practical and versatile tools.<n>We consider the application of LLMs on symmetric tasks where a query is asked on an (unordered) bag of elements.
Score: 9.867695275243879
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have quickly emerged as practical and versatile tools that provide new solutions for a wide range of domains. In this paper, we consider the application of LLMs on symmetric tasks where a query is asked on an (unordered) bag of elements. Examples of such tasks include answering aggregate queries on a database table. In general, when the bag contains a large number of elements, LLMs tend to overlook some elements, leading to challenges in generating accurate responses to the query. LLMs receive their inputs as ordered sequences. However, in this problem, we leverage the fact that the symmetric input is not ordered, and reordering should not affect the LLM's response. Observing that LLMs are less likely to miss elements at certain positions of the input, we introduce the problem of LLM input reranking: to find a ranking of the input that maximizes the LLM's accuracy for the given query without making explicit assumptions about the query. Finding the optimal ranking requires identifying (i) the relevance of each input element for answering the query and (ii) the importance of each rank position for the LLM's attention. We develop algorithms for estimating these values efficiently utilizing a helper LLM. We conduct comprehensive experiments on different synthetic and real datasets to validate our proposal and to evaluate the effectiveness of our proposed algorithms. Our experiments confirm that our reranking approach improves the accuracy of the LLMs on symmetric tasks by up to $99\%$ proximity to the optimum upper bound.

Related papers

On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization [54.965787768076254]
Large Language Models have been recently exploited as judges for complex natural language processing tasks, such as Q&A.<n>We study the effectiveness of LLMs-as-a-judge for two code-related tasks, namely code generation and code summarization.
arXiv Detail & Related papers (2025-07-22T13:40:26Z)
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques.<n>We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds.<n>Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z)
Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales [3.4068099825211986]
Two most common prompts to elicit relevance judgments are pointwise scoring and listwise ranking.<n>The current research community consensus is that listwise ranking yields superior performance.<n>In tension with this hypothesis, we find that the gap between pointwise scoring and listwise ranking shrinks when pointwise scoring is implemented using a sufficiently large ordinal relevance label space.
arXiv Detail & Related papers (2025-05-25T21:41:35Z)
Guiding Retrieval using LLM-based Listwise Rankers [15.3583908068962]
We propose an adaptation of an existing adaptive retrieval method that supports the listwise setting. Specifically, our proposed algorithm merges results both from the initial ranking and feedback documents. We demonstrate that our method can improve nDCG@10 by up to 13.23% and recall by 28.02%--all while keeping the total number of LLM inferences constant and overheads due to the adaptive process minimal.
arXiv Detail & Related papers (2025-01-15T22:23:53Z)
PickLLM: Context-Aware RL-Assisted Large Language Model Routing [0.5325390073522079]
PickLLM is a lightweight framework that relies on Reinforcement Learning (RL) to route on-the-fly queries to available models. We demonstrate the speed of convergence for different learning rates and improvement in hard metrics such as cost per querying session and overall response latency.
arXiv Detail & Related papers (2024-12-12T06:27:12Z)
Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach. This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets. We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
arXiv Detail & Related papers (2024-11-07T10:31:31Z)
SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models [8.558834738072363]
Large language models (LLMs) have gained increased popularity due to their remarkable success across various tasks. However, individual LLMs have limitations when applied to complex tasks because of such factors as training biases, model sizes, and the datasets used. We introduce SelectLLM, a novel algorithm that directs input queries to the most suitable subset of LLMs from a large pool.
arXiv Detail & Related papers (2024-08-16T06:11:21Z)
SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs [85.54906813106683]
We propose a simple yet effective framework to enhance open-domain question answering (ODQA) with large language models (LLMs) SuRe helps LLMs predict more accurate answers for a given question, which are well-supported by the summarized retrieval (SuRe) Experimental results on diverse ODQA benchmarks demonstrate the superiority of SuRe, with improvements of up to 4.6% in exact match (EM) and 4.0% in F1 score over standard prompting approaches.
arXiv Detail & Related papers (2024-04-17T01:15:54Z)
Optimizing LLM Queries in Relational Workloads [58.254894049950366]
We show how to optimize Large Language Models (LLMs) inference for analytical workloads that invoke LLMs within relational queries. We implement these optimizations in Apache Spark, with vLLM as the model serving backend. We achieve up to 4.4x improvement in end-to-end latency on a benchmark of diverse LLM-based queries on real datasets.
arXiv Detail & Related papers (2024-03-09T07:01:44Z)
Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems [76.69936664916061]
We study how the number of LM calls affects the performance of Vote and Filter-Vote. We find, surprisingly, that across multiple language tasks, the performance of both Vote and Filter-Vote can first increase but then decrease as a function of the number of LM calls.
arXiv Detail & Related papers (2024-03-04T19:12:48Z)
Query-OPT: Optimizing Inference of Large Language Models via Multi-Query Instructions in Meeting Summarization [7.674972936853123]
We investigate whether combining the queries for the same input context in a single prompt to minimize repeated calls can be successfully used in meeting summarization. We observe that 100% reliability in generating the response in the expected format is usually limited to certain closed-source LLMs.
arXiv Detail & Related papers (2024-02-29T19:00:47Z)
PiCO: Peer Review in LLMs based on the Consistency Optimization [19.130941716491716]
We use peer-review mechanisms to measure large language models (LLMs) automatically. We formalize it as a constrained optimization problem, intending to maximize the consistency of each LLM's capabilities and scores. We propose three metrics called PEN, CIN, and LIS to evaluate the gap in aligning human rankings.
arXiv Detail & Related papers (2024-02-02T18:49:26Z)
Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation [28.89786334298637]
We develop a novel method to optimize LLMs using ranking metrics. Rather than a traditional full ordering, we advocate for a partial ordering. We test our system's improved response generation ability using benchmark datasets.
arXiv Detail & Related papers (2023-11-15T17:27:14Z)
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named Rephrase and Respond' (RaR) which allows Large Language Models to rephrase and expand questions posed by humans. RaR serves as a simple yet effective prompting method for improving performance. We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text. This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion. We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z)
Large Language Models are Strong Zero-Shot Retriever [89.16756291653371]
We propose a simple method that applies a large language model (LLM) to large-scale retrieval in zero-shot scenarios. Our method, the Language language model as Retriever (LameR), is built upon no other neural models but an LLM.
arXiv Detail & Related papers (2023-04-27T14:45:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.