Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching
- URL: http://arxiv.org/abs/2505.24643v1
- Date: Fri, 30 May 2025 14:29:55 GMT
- Title: Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching
- Authors: Juan Wisznia, Cecilia Bolaños, Juan Tollo, Giovanni Marraffini, Agustín Gianolini, Noe Hsueh, Luciano Del Corro
- Abstract summary: We introduce a novel framework for analyzing sorting algorithms in pairwise ranking prompting (PRP). While classical metrics based on comparison counts have traditionally been used to gauge efficiency, our analysis reveals that expensive LLM inferences overturn these predictions.
- Score: 0.32248482136498424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel framework for analyzing sorting algorithms in pairwise ranking prompting (PRP), re-centering the cost model around LLM inferences rather than traditional pairwise comparisons. While classical metrics based on comparison counts have traditionally been used to gauge efficiency, our analysis reveals that expensive LLM inferences overturn these predictions; accordingly, our framework encourages strategies such as batching and caching to mitigate inference costs. We show that algorithms optimal in the classical setting can lose efficiency when LLM inferences dominate the cost under certain optimizations.
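As a toy illustration of the inference-centric cost model (a minimal sketch, not the authors' implementation), the snippet below counts LLM inference calls separately from classical pairwise comparisons when comparisons are cached and batched; the `mock_llm_compare_batch` stub, the all-pairs ranking strategy, and the batch size of 4 are assumptions made for the example.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


class InferenceCostedComparator:
    """Toy cost model: count LLM inference calls instead of raw comparisons.

    `llm_compare_batch` is a hypothetical stub standing in for a real
    pairwise-ranking-prompting (PRP) call that resolves a batch of
    (a, b) pairs in a single inference.
    """

    def __init__(self, llm_compare_batch: Callable[[List[Tuple[str, str]]], List[bool]],
                 batch_size: int = 8):
        self.llm_compare_batch = llm_compare_batch
        self.batch_size = batch_size
        self.cache: Dict[Tuple[str, str], bool] = {}  # (a, b) -> "a precedes b"
        self.comparisons = 0   # classical metric: pairwise comparisons requested
        self.inferences = 0    # proposed metric: LLM calls actually issued

    def compare_pairs(self, pairs: List[Tuple[str, str]]) -> List[bool]:
        """Resolve many pairs at once: cache hits are free, misses are batched."""
        self.comparisons += len(pairs)
        misses = [p for p in pairs if p not in self.cache]
        for i in range(0, len(misses), self.batch_size):
            chunk = misses[i:i + self.batch_size]
            self.inferences += 1  # one LLM call per batch, regardless of chunk size
            for pair, result in zip(chunk, self.llm_compare_batch(chunk)):
                self.cache[pair] = result
        return [self.cache[p] for p in pairs]


def mock_llm_compare_batch(pairs: List[Tuple[str, str]]) -> List[bool]:
    # Stand-in for a PRP call: pretend longer strings are more relevant.
    return [len(a) >= len(b) for a, b in pairs]


if __name__ == "__main__":
    cmp = InferenceCostedComparator(mock_llm_compare_batch, batch_size=4)
    docs = ["dd", "a", "ccc", "bbbb", "eeeee"]
    # All-pairs ranking: every pair is requested up front, so batching
    # amortizes 10 comparisons into 3 LLM calls (batch_size=4).
    pairs = list(combinations(docs, 2))
    outcomes = cmp.compare_pairs(pairs)
    wins = {d: 0 for d in docs}
    for (a, b), a_first in zip(pairs, outcomes):
        wins[a if a_first else b] += 1
    ranked = sorted(docs, key=lambda d: -wins[d])
    print(ranked, "comparisons:", cmp.comparisons, "LLM calls:", cmp.inferences)
```

With 5 documents, the all-pairs strategy requests 10 comparisons but issues only 3 LLM calls; this gap between comparison counts and inference counts is what the inference-centric cost model is meant to capture.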
Related papers
- Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning [55.33984461046492]
Policy-based methods currently dominate reinforcement learning pipelines for large language model (LLM) reasoning. We introduce Trajectory Bellman Residual Minimization (TBRM), an algorithm that naturally adapts the value-based Bellman residual minimization idea to LLMs. We prove convergence to the near-optimal KL-regularized policy from arbitrary off-policy data via an improved change-of-trajectory-measure analysis.
arXiv Detail & Related papers (2025-05-21T09:41:53Z) - Visualising Policy-Reward Interplay to Inform Zeroth-Order Preference Optimisation of Large Language Models [0.36326779753373206]
Zeroth-Order (ZO) optimisation uses function evaluations instead of gradients, reducing memory usage, but suffers from slow convergence in high-dimensional models. We introduce ZOPrO, a novel ZO algorithm designed for preference optimisation in Large Language Models. We demonstrate that our method consistently enhances reward signals while achieving convergence times comparable to first-order methods.
arXiv Detail & Related papers (2025-03-05T12:49:48Z) - Meta-Learning Objectives for Preference Optimization [39.15940594751445]
We show that it is possible to gain insight into the efficacy of preference optimization (PO) algorithms on simpler benchmarks. We propose a novel family of PO algorithms based on mirror descent, which we call Mirror Preference Optimization (MPO).
arXiv Detail & Related papers (2024-11-10T19:11:48Z) - LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z) - Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - LLM as a Complementary Optimizer to Gradient Descent: A Case Study in Prompt Tuning [69.95292905263393]
In this paper, we show that gradient-based optimizers and high-level LLM-based optimizers are complementary to each other and can effectively collaborate in a combined optimization framework.
arXiv Detail & Related papers (2024-05-30T06:24:14Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - Data-Driven Performance Guarantees for Classical and Learned Optimizers [2.0403774954994858]
We introduce a data-driven approach to analyze the performance of continuous optimization algorithms.
We study classical and learned optimizers that solve families of parametric optimization problems.
arXiv Detail & Related papers (2024-04-22T02:06:35Z) - Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym [2.511157007295545]
Black-box optimization (BBO) algorithms are popular in machine learning (ML).
We compare BBO tools for ML with more classical heuristics on the COCO benchmarking suite and OpenAI Gym.
Some algorithms from the BBO community perform surprisingly well on ML tasks.
arXiv Detail & Related papers (2023-09-29T18:33:10Z) - Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
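Relating to the Stochastic Optimization Forests entry above, the following is a hedged toy sketch (not the paper's algorithm) contrasting the standard variance-reduction split criterion with a decision-aware one that scores each candidate split by the downstream cost of the children's best decisions; the newsvendor-style cost, the constants C_O and C_U, and the single 1-D feature are illustrative assumptions.

```python
import numpy as np

# Illustrative newsvendor-style downstream problem (an assumption, not the
# paper's exact setup): each node picks one order quantity z and pays an
# overage cost C_O per unit of (z - y) and an underage cost C_U per unit of (y - z).
C_O, C_U = 1.0, 4.0


def decision_cost(y: np.ndarray) -> float:
    """Cost of the node's best single decision for the samples routed to it.
    For this cost, the optimal z is the C_U / (C_O + C_U) quantile of y."""
    z = np.quantile(y, C_U / (C_O + C_U))
    return float(np.mean(C_O * np.maximum(z - y, 0) + C_U * np.maximum(y - z, 0)))


def prediction_cost(y: np.ndarray) -> float:
    """Standard regression-tree criterion: within-node variance."""
    return float(np.var(y))


def best_split(x: np.ndarray, y: np.ndarray, criterion) -> tuple:
    """Scan thresholds on a 1-D feature; keep the split whose children
    minimize the sample-weighted criterion."""
    best_t, best_score = None, np.inf
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        score = (left.size * criterion(left) + right.size * criterion(right)) / y.size
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(size=500)
    # Demand whose spread (not just its mean) shifts with x: the two criteria
    # can then favor different thresholds.
    y = rng.normal(loc=2.0, scale=np.where(x > 0.5, 2.0, 0.2), size=500)
    print("variance split:      ", best_split(x, y, prediction_cost))
    print("decision-aware split:", best_split(x, y, decision_cost))
```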