AutoIndexer: A Reinforcement Learning-Enhanced Index Advisor Towards Scaling Workloads
- URL: http://arxiv.org/abs/2507.23084v1
- Date: Wed, 30 Jul 2025 20:38:13 GMT
- Title: AutoIndexer: A Reinforcement Learning-Enhanced Index Advisor Towards Scaling Workloads
- Authors: Taiyi Wang, Eiko Yoneki,
- Abstract summary: AutoIndexer is a framework that combines workload compression, query optimization, and specialized RL models to scale index selection effectively.<n>It substantially lowers search complexity without sacrificing much index quality.<n>On average, it outperforms state-of-the-art RL-based index advisors by approximately 20% in workload cost savings.
- Score: 0.46040036610482665
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Efficiently selecting indexes is fundamental to database performance optimization, particularly for systems handling large-scale analytical workloads. While deep reinforcement learning (DRL) has shown promise in automating index selection through its ability to learn from experience, few works address how these RL-based index advisors can adapt to scaling workloads due to exponentially growing action spaces and heavy trial and error. To address these challenges, we introduce AutoIndexer, a framework that combines workload compression, query optimization, and specialized RL models to scale index selection effectively. By operating on compressed workloads, AutoIndexer substantially lowers search complexity without sacrificing much index quality. Extensive evaluations show that it reduces end-to-end query execution time by up to 95% versus non-indexed baselines. On average, it outperforms state-of-the-art RL-based index advisors by approximately 20% in workload cost savings while cutting tuning time by over 50%. These results affirm AutoIndexer's practicality for large and diverse workloads.
Related papers
- Acting Less is Reasoning More! Teaching Model to Act Efficiently [87.28134636548705]
Tool-integrated reasoning augments large language models with the ability to invoke external tools to solve tasks.<n>Current approaches typically optimize only for final correctness without considering the efficiency or necessity of external tool use.<n>We propose a framework that encourages models to produce accurate answers with minimal tool calls.<n>Our approach reduces tool calls by up to 68.3% and improves tool productivity by up to 215.4%, while maintaining comparable answer accuracy.
arXiv Detail & Related papers (2025-04-21T05:40:05Z) - Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [50.419872452397684]
Search-R1 is an extension of reinforcement learning for reasoning frameworks.<n>It generates search queries during step-by-step reasoning with real-time retrieval.<n>It improves performance by 41% (Qwen2.5-7B) and 20% (Qwen2.5-3B) over various RAG baselines.
arXiv Detail & Related papers (2025-03-12T16:26:39Z) - Semi-Parametric Retrieval via Binary Bag-of-Tokens Index [71.78109794895065]
SemI-parametric Disentangled Retrieval (SiDR) is a bi-encoder retrieval framework that decouples retrieval index from neural parameters.<n>SiDR supports a non-parametric tokenization index for search, achieving BM25-like indexing complexity with significantly better effectiveness.
arXiv Detail & Related papers (2024-05-03T08:34:13Z) - IA2: Leveraging Instance-Aware Index Advisor with Reinforcement Learning for Diverse Workloads [0.46040036610482665]
Instance-Aware Index Advisor (IA2) is a novel deep reinforcement learning (DRL)-based approach for optimizing index selection in databases.
IA2 introduces the Twin Delayed Deep Deterministic Policy Gradient - Temporal Difference State-Wise Action Refinery (TD3-TD-SWAR) model.
arXiv Detail & Related papers (2024-04-08T13:40:26Z) - Accelerating String-Key Learned Index Structures via Memoization-based Incremental Training [16.93830041971135]
Learned indexes use machine learning models to learn the mappings between keys and their corresponding positions in key-value indexes.
They require frequent retrainings of their models to incorporate the changes introduced by update queries.
We develop an algorithm- hardware co-designed string-key learned index system, dubbed SIA.
arXiv Detail & Related papers (2024-03-18T04:44:00Z) - Efficient Architecture Search via Bi-level Data Pruning [70.29970746807882]
This work pioneers an exploration into the critical role of dataset characteristics for DARTS bi-level optimization.
We introduce a new progressive data pruning strategy that utilizes supernet prediction dynamics as the metric.
Comprehensive evaluations on the NAS-Bench-201 search space, DARTS search space, and MobileNet-like search space validate that BDP reduces search costs by over 50%.
arXiv Detail & Related papers (2023-12-21T02:48:44Z) - Towards General and Efficient Online Tuning for Spark [55.30868031221838]
We present a general and efficient Spark tuning framework that can deal with the three issues simultaneously.
We have implemented this framework as an independent cloud service, and applied it to the data platform in Tencent.
arXiv Detail & Related papers (2023-09-05T02:16:45Z) - ML-Powered Index Tuning: An Overview of Recent Progress and Open
Challenges [5.675806178685878]
Scale and complexity of workloads in modern cloud services have brought into sharper focus a critical challenge in automated index tuning.
This paper directs attention to these challenges within automated index tuning and explores ways in which machine learning (ML) techniques provide new opportunities in their mitigation.
arXiv Detail & Related papers (2023-08-25T19:20:28Z) - JoinGym: An Efficient Query Optimization Environment for Reinforcement
Learning [58.71541261221863]
Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost.
We present JoinGym, a query optimization environment for bushy reinforcement learning (RL)
Under the hood, JoinGym simulates a query plan's cost by looking up intermediate result cardinalities from a pre-computed dataset.
arXiv Detail & Related papers (2023-07-21T17:00:06Z) - CARMI: A Cache-Aware Learned Index with a Cost-based Construction
Algorithm [1.9798034349981157]
We propose a cache-aware learned index (CARMI) design to improve the efficiency of the Recursive Model Index (RMI) framework.
We formulate the problem of finding the optimal design of a learned index as an optimization problem and propose a dynamic programming algorithm for solving it.
Experiments show that our index construction strategy can construct indexes with significantly better performance compared to baselines.
arXiv Detail & Related papers (2021-03-01T09:20:53Z) - The Case for Learned Spatial Indexes [62.88514422115702]
We use techniques proposed from a state-of-the art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that (i) machine learned search within a partition is faster by 11.79% to 39.51% than binary search when using filtering on one dimension.
We also refine using machine learned indexes is 1.23x to 1.83x times faster than closest competitor which filters on two dimensions.
arXiv Detail & Related papers (2020-08-24T12:09:55Z) - Tsunami: A Learned Multi-dimensional Index for Correlated Data and
Skewed Workloads [29.223401893397714]
We introduce Tsunami, which achieves up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes.
arXiv Detail & Related papers (2020-06-23T19:25:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.