EcoRank: Budget-Constrained Text Re-ranking Using Large Language Models
- URL: http://arxiv.org/abs/2402.10866v2
- Date: Tue, 28 May 2024 02:34:57 GMT
- Title: EcoRank: Budget-Constrained Text Re-ranking Using Large Language Models
- Authors: Muhammad Shihab Rashid, Jannat Ara Meem, Yue Dong, Vagelis Hristidis
- Abstract summary: We study how to maximize the re-ranking performance given a budget.
We propose a suite of budget-constrained methods to perform text re-ranking.
- Score: 6.109188517569139
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have achieved state-of-the-art performance in text re-ranking. This process includes queries and candidate passages in the prompts, utilizing pointwise, listwise, and pairwise prompting strategies. A limitation of these ranking strategies with LLMs is their cost: the process can become expensive due to API charges, which are based on the number of input and output tokens. We study how to maximize the re-ranking performance given a budget, by navigating the vast search spaces of prompt choices, LLM APIs, and budget splits. We propose a suite of budget-constrained methods to perform text re-ranking using a set of LLM APIs. Our most efficient method, called EcoRank, is a two-layered pipeline that jointly optimizes decisions regarding budget allocation across prompt strategies and LLM APIs. Our experimental results on four popular QA and passage reranking datasets show that EcoRank outperforms other budget-aware supervised and unsupervised baselines.
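To make the two-layered idea concrete, below is a minimal Python sketch of a budget-split pipeline: a cheap pointwise pass filters candidates, then a pricier pairwise pass refines the survivors until the token budget runs out. The stub models, the whitespace token counter, and the 50/50 split are illustrative assumptions, not EcoRank's actual configuration.

```python
# Minimal sketch (not the paper's implementation) of a two-layer,
# budget-split re-ranking pipeline in the spirit of EcoRank.

def estimate_tokens(*texts):
    # Crude whitespace proxy for API token counts.
    return sum(len(t.split()) for t in texts)

class StubCheapLLM:
    def is_relevant(self, query, passage):        # pointwise yes/no prompt
        return any(w in passage.lower() for w in query.lower().split())

class StubStrongLLM:
    def prefers_second(self, query, a, b):        # pairwise comparison prompt
        overlap = lambda p: sum(w in p.lower() for w in query.lower().split())
        return overlap(b) > overlap(a)

def rerank_with_budget(query, passages, budget_tokens,
                       cheap_llm, strong_llm, split=0.5):
    cheap_budget = int(budget_tokens * split)     # assumed 50/50 budget split
    spent, kept = 0, []

    # Layer 1: pointwise filtering with the cheap model.
    for p in passages:
        cost = estimate_tokens(query, p)
        if spent + cost > cheap_budget:
            break
        spent += cost
        if cheap_llm.is_relevant(query, p):
            kept.append(p)

    # Layer 2: pairwise bubble passes with the strong model,
    # continuing until the overall budget is exhausted.
    for i in range(len(kept) - 1, 0, -1):
        cost = estimate_tokens(query, kept[i - 1], kept[i])
        if spent + cost > budget_tokens:
            break
        spent += cost
        if strong_llm.prefers_second(query, kept[i - 1], kept[i]):
            kept[i - 1], kept[i] = kept[i], kept[i - 1]

    # Unfiltered passages keep their original order after the re-ranked ones.
    return kept + [p for p in passages if p not in kept]
```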
Related papers
- LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing [70.35888047551643]
We present LaRA, a novel benchmark specifically designed to rigorously compare RAG and LC LLMs.
LaRA encompasses 2,326 test cases across four practical QA task categories and three types of naturally occurring long texts.
We find that the optimal choice between RAG and LC depends on a complex interplay of factors, including the model's parameter size, long-text capabilities, context length, task type, and the characteristics of the retrieved chunks.
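As a toy illustration only (not part of LaRA), a router over these factors might look like the following; every threshold here is invented.

```python
# Toy RAG-vs-long-context (LC) router driven by the factors the benchmark
# identifies. All thresholds are invented for illustration.

def choose_strategy(model_max_context, doc_tokens, task_type, model_params_b):
    if doc_tokens > model_max_context:
        return "RAG"                      # document cannot fit in context
    if model_params_b >= 70 and task_type in {"reasoning", "comparison"}:
        return "LC"                       # large models exploit full context
    if task_type == "needle_lookup":
        return "RAG"                      # retrieval suffices for lookups
    return "LC" if doc_tokens < model_max_context // 2 else "RAG"

print(choose_strategy(128_000, 300_000, "needle_lookup", 8))   # -> RAG
```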
arXiv Detail & Related papers (2025-02-14T08:04:22Z)
- Universal Model Routing for Efficient LLM Inference [72.65083061619752]
We consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time.
We propose a new approach to this problem that relies on representing each LLM as a feature vector, derived based on predictions on a set of representative prompts.
We prove that these strategies are estimates of a theoretically optimal routing rule, and provide an excess risk bound to quantify their errors.
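A minimal sketch of the routing idea, assuming a toy Jaccard similarity in place of a real query representation: each LLM is summarized by its score vector on fixed probe prompts, and a query is routed to the LLM that does best on the most similar probe.

```python
# Feature-vector routing sketch: a new LLM only needs its score vector on
# the probe prompts to become routable. The similarity function is a
# bag-of-words stand-in for a real embedding model.

def similarity(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))    # Jaccard overlap

def route(query, probe_prompts, llm_profiles):
    """llm_profiles: {llm_name: [score on probe_prompts[i], ...]}."""
    nearest = max(range(len(probe_prompts)),
                  key=lambda i: similarity(query, probe_prompts[i]))
    return max(llm_profiles, key=lambda name: llm_profiles[name][nearest])

probes = ["translate this sentence", "solve this math problem"]
profiles = {"small-cheap": [0.9, 0.3], "big-expensive": [0.95, 0.9]}
print(route("solve this math problem: 17 * 24", probes, profiles))
# -> big-expensive
```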
arXiv Detail & Related papers (2025-02-12T20:30:28Z)
- LLM Program Optimization via Retrieval Augmented Search [71.40092732256252]
We propose a blackbox adaptation method called Retrieval Augmented Search (RAS) that performs beam search over candidate optimizations.
We show that RAS performs 1.8$\times$ better than prior state-of-the-art blackbox adaptation strategies.
We also propose a method called AEGIS for improving interpretability by decomposing training examples into "atomic edits"
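Below is a generic beam-search sketch over candidate program edits in the spirit of RAS; the edit proposer and the length-based score are placeholders, and the retrieval of similar "atomic edits" that guides the real method is omitted.

```python
# Beam search over candidate program optimizations (illustrative sketch).

def beam_search_optimize(program, propose_edits, score, width=4, depth=3):
    """propose_edits(program) -> list of edited programs;
    score(program) -> higher is better (e.g., measured speedup)."""
    beam = [(score(program), program)]
    for _ in range(depth):
        candidates = []
        for _, prog in beam:
            for nxt in propose_edits(prog):
                candidates.append((score(nxt), nxt))
        beam = sorted(beam + candidates, key=lambda x: x[0], reverse=True)[:width]
    return beam[0][1]                      # best program found

# Toy usage: "optimize" a program string by removing redundant operations.
prog = "x = x + 0 ; y = y * 1 ; z = x + y"
edits = lambda p: [p.replace(" + 0", "", 1), p.replace(" * 1", "", 1)]
best = beam_search_optimize(prog, edits, score=lambda p: -len(p))
print(best)   # -> "x = x ; y = y ; z = x + y"
```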
arXiv Detail & Related papers (2025-01-31T06:34:47Z)
- Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models [40.21540137079309]
Long-context large language models (LLMs) enable the full ranking of all candidate passages within a single inference.
We show that full ranking with long-context LLMs can deliver superior performance in the supervised fine-tuning setting.
We propose a new complete listwise label construction approach and a novel importance-aware learning objective for full ranking.
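A minimal sketch of what single-pass full ranking looks like in practice, with an assumed prompt format and a defensive parser for the returned permutation:

```python
# Full-ranking prompt sketch: every passage is labeled and included in one
# listwise prompt, instead of iterating a sliding window. Prompt wording
# and parsing are illustrative assumptions.
import re

def build_full_ranking_prompt(query, passages):
    lines = [f"Query: {query}",
             "Rank ALL passages below from most to least relevant.",
             "Answer with identifiers only, e.g. [3] > [1] > [2]."]
    for i, p in enumerate(passages, 1):
        lines.append(f"[{i}] {p}")
    return "\n".join(lines)

def parse_permutation(answer, n):
    order = [int(m) for m in re.findall(r"\[(\d+)\]", answer)]
    seen, ranked = set(), []
    for i in order:
        if 1 <= i <= n and i not in seen:
            seen.add(i)
            ranked.append(i)
    # Append anything the model forgot, keeping original order.
    return ranked + [i for i in range(1, n + 1) if i not in seen]

print(parse_permutation("[2] > [3] > [1]", 4))   # -> [2, 3, 1, 4]
```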
arXiv Detail & Related papers (2024-12-19T06:44:59Z)
- PickLLM: Context-Aware RL-Assisted Large Language Model Routing [0.5325390073522079]
PickLLM is a lightweight framework that relies on Reinforcement Learning (RL) to route on-the-fly queries to available models.
We demonstrate the speed of convergence for different learning rates and improvements in hard metrics such as cost per querying session and overall response latency.
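A minimal epsilon-greedy sketch of such RL-assisted routing; the reward weights and model names are illustrative assumptions, not PickLLM's settings.

```python
# Epsilon-greedy bandit router: per-model value estimates are learned from
# a reward that trades off answer quality, cost, and latency.
import random

class EpsilonGreedyRouter:
    def __init__(self, models, epsilon=0.1, lr=0.1):
        self.q = {m: 0.0 for m in models}     # running value estimates
        self.epsilon, self.lr = epsilon, lr

    def pick(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.q))      # explore
        return max(self.q, key=self.q.get)          # exploit

    def update(self, model, quality, cost, latency, w_cost=0.5, w_lat=0.1):
        reward = quality - w_cost * cost - w_lat * latency
        self.q[model] += self.lr * (reward - self.q[model])

router = EpsilonGreedyRouter(["gpt-small", "gpt-large"])
m = router.pick()
# ...query model m, then observe quality/cost/latency...
router.update(m, quality=0.8, cost=0.2, latency=1.5)
```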
arXiv Detail & Related papers (2024-12-12T06:27:12Z)
- Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach.
This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets.
We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
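A sketch of the contrast: if the LLM is asked for an absolute, calibrated relevance score per passage (stubbed here with lexical overlap), one global sort replaces the iterative sliding window.

```python
# Global-score reranking sketch. `llm_score` stands in for a real API call
# such as "Rate the relevance of the passage to the query on a 0-100 scale."

def llm_score(query, passage):
    qs = set(query.lower().split())
    return 100 * len(qs & set(passage.lower().split())) / max(1, len(qs))

def global_rerank(query, passages):
    # Every passage gets an absolute score, so a single sort suffices;
    # no iterative sliding window over the candidate list is needed.
    return sorted(passages, key=lambda p: llm_score(query, p), reverse=True)

docs = ["cats sleep a lot", "budget constrained llm reranking", "llm cost"]
print(global_rerank("llm reranking under a budget", docs)[0])
# -> "budget constrained llm reranking"
```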
arXiv Detail & Related papers (2024-11-07T10:31:31Z)
- MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs [21.689490112983677]
We introduce MetaLLM, a framework that dynamically routes each query to the optimal large language model (LLM) for classification tasks.
By framing the selection problem as a multi-armed bandit, MetaLLM balances prediction accuracy and cost efficiency under uncertainty.
Our experiments, conducted on popular LLM platforms, showcase MetaLLM's efficacy in real-world scenarios.
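A minimal sketch of the bandit view, here with a UCB1 policy; the specific algorithm and the accuracy-minus-cost reward shaping are assumptions rather than MetaLLM's exact design.

```python
# Multi-armed bandit over LLMs: each arm is a model, and the reward is
# prediction accuracy minus a cost penalty.
import math

class UCBModelSelector:
    def __init__(self, models, cost_weight=0.3):
        self.models = models
        self.n = {m: 0 for m in models}       # pulls per arm
        self.mean = {m: 0.0 for m in models}  # mean observed reward
        self.t, self.cost_weight = 0, cost_weight

    def pick(self):
        self.t += 1
        for m in self.models:                 # play each arm once first
            if self.n[m] == 0:
                return m
        return max(self.models, key=lambda m: self.mean[m]
                   + math.sqrt(2 * math.log(self.t) / self.n[m]))

    def update(self, model, correct, cost):
        reward = float(correct) - self.cost_weight * cost
        self.n[model] += 1
        self.mean[model] += (reward - self.mean[model]) / self.n[model]

sel = UCBModelSelector(["cheap", "mid", "premium"])
m = sel.pick()                  # query model m on the classification task
sel.update(m, correct=True, cost=0.1)
```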
arXiv Detail & Related papers (2024-07-15T15:45:07Z)
- ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search [35.44746116088232]
Federated search will become increasingly pivotal in the context of Retrieval-Augmented Generation pipelines.
Current SOTA resource selection methodologies rely on feature-based learning approaches.
We propose ReSLLM to drive the selection of resources in federated search in a zero-shot setting.
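A sketch of what zero-shot, prompt-based resource selection can look like; the prompt wording and answer parsing are illustrative assumptions.

```python
# Zero-shot resource selection sketch: the prompt lists each federated
# search engine with a short description, and the model replies with the
# identifiers of the resources worth querying.
import re

def build_selection_prompt(query, resources):
    lines = [f"Query: {query}",
             "Select the search engines most likely to contain relevant "
             "results. Answer with identifiers only, e.g. [1], [4]."]
    for i, (name, desc) in enumerate(resources, 1):
        lines.append(f"[{i}] {name}: {desc}")
    return "\n".join(lines)

def parse_selection(answer, resources):
    ids = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return [resources[i - 1][0] for i in sorted(ids)
            if 1 <= i <= len(resources)]

resources = [("PubMed", "biomedical literature"),
             ("StackExchange", "programming Q&A"),
             ("ArXiv", "scientific preprints")]
prompt = build_selection_prompt("symptoms of vitamin D deficiency", resources)
# `prompt` would be sent to the LLM; parse its (here hard-coded) reply:
print(parse_selection("[1], [3]", resources))   # -> ['PubMed', 'ArXiv']
```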
arXiv Detail & Related papers (2024-01-31T07:58:54Z)
- Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance the arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
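A minimal sketch of the offline idea: fit a reward model on logged (query, prompt, success) triples and select prompts by predicted reward, with no LLM calls at selection time. The nearest-neighbor reward model below is an illustrative stand-in for the learned one.

```python
# Query-dependent prompt selection from offline demonstration logs.

def features(text):
    return set(text.lower().split())

def predicted_reward(query, prompt, logs):
    # Average success of this prompt on the most similar logged queries.
    sims = sorted(logs, key=lambda r: -len(features(query) & features(r[0])))
    top = [r for r in sims if r[1] == prompt][:3]
    return sum(r[2] for r in top) / len(top) if top else 0.0

def select_prompt(query, candidate_prompts, logs):
    return max(candidate_prompts,
               key=lambda p: predicted_reward(query, p, logs))

logs = [("add 17 and 24", "Let's think step by step.", 1),
        ("add 3 and 5", "Answer directly.", 1),
        ("multiply 12 by 7", "Answer directly.", 0)]
print(select_prompt("add 9 and 16",
                    ["Let's think step by step.", "Answer directly."], logs))
# -> "Let's think step by step."
```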
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
- OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
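A sketch of the batching idea: several inputs share one instruction and one API call, so the instruction tokens are paid for only once. Prompt wording and parsing are assumptions.

```python
# Batched zero-shot classification sketch in the spirit of OverPrompt.

def build_batched_prompt(instruction, inputs):
    lines = [instruction,
             "Answer each item on its own line as '<number>. <label>'."]
    for i, text in enumerate(inputs, 1):
        lines.append(f"{i}. {text}")
    return "\n".join(lines)

def parse_batched_answer(answer, n):
    labels = {}
    for line in answer.splitlines():
        head, _, label = line.partition(".")
        if head.strip().isdigit():
            labels[int(head)] = label.strip()
    return [labels.get(i, "") for i in range(1, n + 1)]

prompt = build_batched_prompt(
    "Classify the sentiment of each review as positive or negative.",
    ["Loved it!", "Terrible battery life.", "Works as advertised."])
# One API call answers all three items:
print(parse_batched_answer("1. positive\n2. negative\n3. positive", 3))
```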
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
- Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline [22.08897444328099]
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks.
In this paper, we propose an efficient LLM inference pipeline that uses the LLM's own response-length perception to schedule sequences with similar predicted lengths into the same batch.
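A minimal sketch of the scheduling idea, with a stub in place of the LLM's own length estimate: queries with similar predicted response lengths are batched together so short responses are not padded out to the longest sequence in the batch.

```python
# Length-aware batch scheduling sketch.

def predict_length(query):
    # Stand-in for prompting the LLM itself, e.g.
    # "Estimate how many words your answer will take."
    return 10 if "?" in query else 80

def schedule_batches(queries, batch_size=4):
    ordered = sorted(queries, key=predict_length)   # group similar lengths
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]

queries = ["What year was arXiv founded?", "Summarize this paper in detail",
           "Is 7 prime?", "Explain beam search"]
for batch in schedule_batches(queries, batch_size=2):
    print(batch)   # short-answer queries end up batched together
```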
arXiv Detail & Related papers (2023-05-22T15:36:06Z)