Access Paths for Efficient Ordering with Large Language Models
- URL: http://arxiv.org/abs/2509.00303v1
- Date: Sat, 30 Aug 2025 01:44:36 GMT
- Title: Access Paths for Efficient Ordering with Large Language Models
- Authors: Fuheng Zhao, Jiayue Chen, Yiming Pan, Tahseen Rabbani, Divyakant Agrawal, Amr El Abbadi
- Abstract summary: We present the LLM ORDER BY operator as a logical abstraction and study its physical implementations within a unified evaluation framework. We introduce three new designs: an agreement-based batch-size policy, a majority voting mechanism for pairwise sorting, and a two-way external merge sort adapted for LLMs.
- Score: 7.826046892571884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the LLM ORDER BY operator as a logical abstraction and study its physical implementations within a unified evaluation framework. Our experiments show that no single approach is universally optimal, with effectiveness depending on query characteristics and data. We introduce three new designs: an agreement-based batch-size policy, a majority voting mechanism for pairwise sorting, and a two-way external merge sort adapted for LLMs. Extensive experiments show that our agreement-based procedure is effective at determining batch size for value-based methods, that the majority-voting mechanism consistently strengthens pairwise comparisons on GPT-4o, and that external merge sort achieves high accuracy-efficiency trade-offs across datasets and models. We further observe a log-linear scaling between compute cost and ordering quality, offering the first step toward principled cost models for LLM-powered data systems.
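Two of the designs named in the abstract lend themselves to a short sketch. The following minimal Python illustration (not the paper's implementation; all function names are hypothetical) shows a majority-voting pairwise comparator driving a two-way merge sort, with a seeded noisy oracle standing in for a real LLM call:

```python
import random

def majority_vote_compare(a, b, ask_llm, votes=3):
    """Pairwise comparison hardened by majority voting: query the
    (possibly noisy) LLM comparator an odd number of times and take
    the most common verdict."""
    tally = sum(1 if ask_llm(a, b) else 0 for _ in range(votes))
    return tally * 2 > votes  # True if a should precede b

def merge(left, right, cmp):
    """Two-way merge step; each head-to-head decision costs one
    comparator call."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if cmp(left[i], right[j]):
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def llm_merge_sort(items, cmp):
    """Two-way merge sort driven by an LLM-style comparator: sort
    small runs, then merge runs pairwise."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    return merge(llm_merge_sort(items[:mid], cmp),
                 llm_merge_sort(items[mid:], cmp),
                 cmp)

# Stand-in for a real LLM call: an oracle that orders integers
# ascending but flips its answer 10% of the time.
rng = random.Random(0)
def noisy_llm(a, b):
    verdict = a <= b
    return verdict if rng.random() > 0.1 else not verdict

cmp = lambda a, b: majority_vote_compare(a, b, noisy_llm, votes=5)
print(llm_merge_sort([5, 3, 8, 1, 9, 2], cmp))
```

With an unreliable comparator, repeated voting sharply reduces the chance that any single merge decision is wrong, which is the intuition behind strengthening pairwise sorting this way.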
Related papers
- Nearly Optimal Active Preference Learning and Its Application to LLM Alignment [68.56793807995417]
Aligning large language models depends on high-quality datasets of human preference labels. Many existing approaches adopt classical experimental design criteria such as G- or D-optimality. In this work, we identify a simple intuition specific to preference learning that calls into question the suitability of these existing design objectives.
arXiv Detail & Related papers (2026-02-02T03:21:29Z) - Nonparametric LLM Evaluation from Preference Data [86.96268870461472]
We propose a nonparametric statistical framework, DMLEval, for comparing and ranking large language models (LLMs) from preference data. Our framework provides practitioners with powerful, state-of-the-art methods for comparing or ranking LLMs.
arXiv Detail & Related papers (2026-01-29T15:00:07Z) - A Systematic Study of Model Merging Techniques in Large Language Models [43.5967188676583]
Model merging combines multiple fine-tuned checkpoints into a single model without additional training. We present a large-scale, systematic evaluation of six state-of-the-art merging methods. Results show that the oldest and simplest method, Task Arithmetic, is the only approach that reliably yields performance gains on LLMs.
arXiv Detail & Related papers (2025-11-26T14:28:11Z) - TACOS: Open Tagging and Comparative Scoring for Instruction Fine-Tuning Data Selection [9.020110377060153]
We present TACOS, an innovative method that integrates Open Tagging and Comparative Scoring for IFT data selection. To capture data diversity, we leverage LLMs to assign open-domain tags to human queries. We suggest a comparative scoring method that allows the relative quality evaluation of samples within a cluster, avoiding the inconsistent criteria seen in singleton-based evaluations.
arXiv Detail & Related papers (2025-07-04T15:46:07Z) - SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling [58.05959902776133]
We introduce Single-Pass Annotation with Reference-Guided Evaluation (SPARE), a novel structured framework that enables efficient per-step annotation. We demonstrate SPARE's effectiveness across four diverse datasets spanning mathematical reasoning (GSM8K, MATH), multi-hop question answering (MuSiQue-Ans), and spatial reasoning (SpaRP). On ProcessBench, SPARE demonstrates data-efficient out-of-distribution generalization, using only ~16% of training samples compared to human-labeled and other synthetically trained baselines.
arXiv Detail & Related papers (2025-06-18T14:37:59Z) - GORACS: Group-level Optimal Transport-guided Coreset Selection for LLM-based Recommender Systems [17.1208625827132]
Large language models (LLMs) have shown great potential in recommender systems. GORACS is a novel Group-level Optimal tRAnsport-guided Coreset Selection framework for LLM-based recommender systems.
arXiv Detail & Related papers (2025-06-04T14:46:18Z) - Efficient Evaluation of Large Language Models via Collaborative Filtering [25.734508624520164]
Benchmarks have been proposed to measure and compare the capabilities of different Large Language Models (LLMs). However, evaluating LLMs is costly due to the large number of test instances and their slow inference speed. We propose a two-stage method to efficiently estimate a model's real performance on a given benchmark.
arXiv Detail & Related papers (2025-04-05T07:46:30Z) - Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align large language models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z) - Federated Fine-Tuning of LLMs: Framework Comparison and Research Directions [59.5243730853157]
Federated learning (FL) provides a privacy-preserving solution for fine-tuning pre-trained large language models (LLMs) using distributed private datasets. This article conducts a comparative analysis of three advanced federated LLM (FedLLM) frameworks that integrate knowledge distillation (KD) and split learning (SL) to mitigate these issues.
arXiv Detail & Related papers (2025-01-08T11:37:06Z) - An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models [8.558834738072363]
Large language models (LLMs) have been widely adopted due to their remarkable performance across various applications. These individual LLMs show limitations in generalization and performance on complex tasks due to inherent training biases, model size constraints, and the quality or diversity of pre-training datasets. We introduce SelectLLM, which efficiently directs input queries to the most suitable subset of LLMs from a large pool.
arXiv Detail & Related papers (2024-08-16T06:11:21Z) - MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation.
Within each loop, the MLLM-DataEngine first analyzes the weaknesses of the model based on the evaluation results.
For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data.
For quality, we resort to GPT-4 to generate high-quality data with each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z) - From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
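The IFD metric above can be read, loosely, as a perplexity ratio: how hard the answer remains for the model even when the instruction is given. The sketch below is one plausible Python rendering under that assumption (the paper's exact formulation may differ), with toy numbers in place of real per-token log-likelihoods from a language model:

```python
import math

def perplexity(neg_log_likelihoods):
    """Perplexity from per-token negative log-likelihoods."""
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

def ifd_score(nll_answer_given_instruction, nll_answer_alone):
    """Hypothetical IFD reading: ratio of the conditioned to the
    unconditioned answer perplexity. A score near 1 means the
    instruction barely helps the model generate the answer, marking
    the sample as a difficult, informative 'cherry'."""
    return (perplexity(nll_answer_given_instruction)
            / perplexity(nll_answer_alone))

# Toy numbers standing in for per-token NLLs from a real LM.
with_instr = [0.9, 1.1, 0.8]   # answer is still hard given the instruction
alone      = [1.0, 1.2, 0.9]   # ...and roughly as hard without it
print(round(ifd_score(with_instr, alone), 3))  # → 0.905
```

A score this close to 1 would flag the sample as one the instruction does little to ease, the kind of discrepancy the metric is described as surfacing.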
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of the information above and is not responsible for any consequences of its use.