DobLIX: A Dual-Objective Learned Index for Log-Structured Merge Trees
- URL: http://arxiv.org/abs/2502.05369v1
- Date: Fri, 07 Feb 2025 22:48:14 GMT
- Title: DobLIX: A Dual-Objective Learned Index for Log-Structured Merge Trees
- Authors: Alireza Heidari, Amirhossein Ahmadi, Wei Zhang,
- Abstract summary: DobLIX is a dual-objective learned index specifically designed for Log-Structured Merge (LSM) tree-based key-value stores. We show that DobLIX reduces indexing overhead and improves throughput by 1.19 to 2.21 times compared to state-of-the-art methods within RocksDB.
- Score: 4.077820670802213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce DobLIX, a dual-objective learned index specifically designed for Log-Structured Merge (LSM) tree-based key-value stores. Although traditional learned indexes focus exclusively on optimizing index lookups, they often overlook the impact of data access from storage, resulting in performance bottlenecks. DobLIX addresses this by incorporating a second objective, data access optimization, into the learned index training process. This dual-objective approach ensures that both index lookup efficiency and data access costs are minimized, leading to significant improvements in read performance while maintaining write efficiency in real-world LSM-tree systems. Additionally, DobLIX features a reinforcement learning agent that dynamically tunes the system parameters, allowing it to adapt to varying workloads in real-time. Experimental results using real-world datasets demonstrate that DobLIX reduces indexing overhead and improves throughput by 1.19 to 2.21 times compared to state-of-the-art methods within RocksDB, a widely used LSM-tree-based storage engine.
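The dual-objective idea from the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (DobLIX's actual model, training procedure, and block layout differ); `BLOCK_SIZE`, `alpha`, and the cost form are illustrative assumptions. The point is that a candidate linear index segment is scored both by its worst-case lookup error and by the number of storage blocks that error window forces a lookup to read:

```python
# Hedged sketch: scoring a linear index segment under two objectives,
# lookup accuracy plus storage blocks read per lookup. BLOCK_SIZE and
# the weighted-sum cost are assumptions for illustration only.

BLOCK_SIZE = 16  # keys per storage block (assumed)

def fit_segment(keys):
    """Least-squares line mapping sorted keys -> positions 0..n-1."""
    n = len(keys)
    xs, ys = keys, range(n)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs) or 1.0
    slope = cov / var
    return slope, my - slope * mx

def dual_cost(keys, alpha=0.5):
    """alpha * lookup error + (1 - alpha) * blocks touched per lookup."""
    slope, intercept = fit_segment(keys)
    max_err = max(abs((slope * k + intercept) - i) for i, k in enumerate(keys))
    # A lookup scans positions pred - max_err .. pred + max_err;
    # that window spans roughly this many storage blocks:
    blocks = (2 * max_err) // BLOCK_SIZE + 1
    return alpha * max_err + (1 - alpha) * blocks
```

A single-objective index would minimize only `max_err`; weighting in `blocks` prefers segments whose error window stays inside few storage blocks, which is the data-access cost the paper argues traditional learned indexes ignore.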
Related papers
- LLMIdxAdvis: Resource-Efficient Index Advisor Utilizing Large Language Model [24.579793425796193]
We propose a resource-efficient index advisor that uses large language models (LLMs) without extensive fine-tuning.
LLMIdxAdvis frames index recommendation as a sequence-to-sequence task, taking the target workload, storage constraint, and corresponding database environment as input.
Experiments on 3 OLAP and 2 real-world benchmarks reveal that LLMIdxAdvis delivers competitive index recommendation with reduced runtime.
arXiv Detail & Related papers (2025-03-10T22:01:24Z)
- HuixiangDou2: A Robustly Optimized GraphRAG Approach [11.91228019623924]
Graph-based Retrieval-Augmented Generation (GraphRAG) addresses this by structuring knowledge as a graph for dynamic retrieval.
We introduce HuixiangDou2, a robustly optimized GraphRAG framework.
Specifically, we leverage the effectiveness of dual-level retrieval and optimize its performance in a 32k context.
arXiv Detail & Related papers (2025-03-09T06:20:24Z)
- SEKI: Self-Evolution and Knowledge Inspiration based Neural Architecture Search via Large Language Models [11.670056503731905]
We introduce SEKI, a novel large language model (LLM)-based neural architecture search (NAS) method.
Inspired by the chain-of-thought (CoT) paradigm in modern LLMs, SEKI operates in two key stages: self-evolution and knowledge distillation.
arXiv Detail & Related papers (2025-02-27T09:17:49Z)
- More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives [50.772462704559345]
We introduce DrICL, a novel optimization method that enhances model performance through Differentiated Learning and advantage-based Reweighting objectives.
Globally, DrICL utilizes differentiated learning to optimize the NLL objective, ensuring that many-shot performance surpasses zero-shot levels.
We develop the Many-Shot ICL Benchmark (ICL-50), a large-scale benchmark of 50 tasks covering shot numbers from 1 to 350 within sequences of up to 8,000 tokens, for fine-tuning purposes.
arXiv Detail & Related papers (2025-01-07T14:57:08Z)
- Efficient $k$-NN Search in IoT Data: Overlap Optimization in Tree-Based Indexing Structures [0.6990493129893112]
The proliferation of interconnected devices in the Internet of Things (IoT) has led to an exponential increase in data.
Efficient retrieval of this heterogeneous data demands a robust indexing mechanism for effective organization.
We propose three innovative techniques designed to quantify and strategically reduce data space partition overlap.
arXiv Detail & Related papers (2024-08-28T16:16:55Z)
- Efficiency Unleashed: Inference Acceleration for LLM-based Recommender Systems with Speculative Decoding [61.45448947483328]
We introduce Lossless Acceleration via Speculative Decoding for LLM-based Recommender Systems (LASER).
LASER features a Customized Retrieval Pool to enhance retrieval efficiency and Relaxed Verification to improve the acceptance rate of draft tokens.
LASER achieves a 3-5x speedup on public datasets and saves about 67% of computational resources during the online A/B test.
arXiv Detail & Related papers (2024-08-11T02:31:13Z)
- LearnedKV: Integrating LSM and Learned Index for Superior Performance on SSD [0.6774462529828165]
We introduce LearnedKV, a novel tiered key-value store that seamlessly integrates a Log-Structured Merge (LSM) tree with a Learned Index.
Our results show that LearnedKV outperforms state-of-the-art solutions by up to 1.32x in read requests and 1.31x in write performance.
arXiv Detail & Related papers (2024-06-27T05:08:09Z)
- Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a novel framework that utilizes large language models (LLMs) to identify effective feature generation rules.
We use decision trees to convey this reasoning information, as they can be easily represented in natural language.
OCTree consistently enhances the performance of various prediction models across diverse benchmarks.
arXiv Detail & Related papers (2024-06-12T08:31:34Z)
- Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition [51.66383337087724]
The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR.
Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models.
We propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure.
arXiv Detail & Related papers (2023-12-31T09:24:21Z)
- Efficient Architecture Search via Bi-level Data Pruning [70.29970746807882]
This work pioneers an exploration into the critical role of dataset characteristics for DARTS bi-level optimization.
We introduce a new progressive bi-level data pruning (BDP) strategy that utilizes supernet prediction dynamics as the metric.
Comprehensive evaluations on the NAS-Bench-201 search space, DARTS search space, and MobileNet-like search space validate that BDP reduces search costs by over 50%.
arXiv Detail & Related papers (2023-12-21T02:48:44Z)
- EHI: End-to-end Learning of Hierarchical Index for Efficient Dense Retrieval [18.15717995719973]
End-to-end Hierarchical Indexing (EHI) is a novel method for embedding-based retrieval.
EHI outperforms existing state-of-the-art methods by +1.45% in MRR@10 on MS MARCO (Dev) and +8.2% in nDCG@10 on TREC DL19.
arXiv Detail & Related papers (2023-10-13T06:53:02Z)
- A Learned Index for Exact Similarity Search in Metric Spaces [25.330353637669386]
LIMS uses data clustering and pivot-based data transformation techniques to build learned indexes.
Machine learning models are developed to approximate the position of each data record on the disk.
Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes.
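The pivot-based transformation behind this line of work can be sketched as follows. This is a minimal illustration, not LIMS itself (LIMS adds clustering and per-cluster learned models, omitted here; the Euclidean metric and `build_index`/`range_query` names are assumptions). Distances to a fixed pivot map metric objects onto a line; by the triangle inequality, any object within radius `r` of a query `q` must have a pivot distance in `[d(pivot, q) - r, d(pivot, q) + r]`, so a 1-D range scan yields candidates for exact verification:

```python
# Hedged sketch of a pivot-based metric transform (not LIMS itself):
# map objects to pivot distances, then answer exact range queries via
# a 1-D filter justified by the triangle inequality.
import math
from bisect import bisect_left, bisect_right

def dist(a, b):
    """Euclidean distance in the plane (an assumed example metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def build_index(points, pivot):
    """Sort objects by distance to a fixed pivot (their 1-D proxy key)."""
    entries = sorted((dist(p, pivot), p) for p in points)
    keys = [d for d, _ in entries]
    return keys, entries

def range_query(index, pivot, q, r):
    """Exact range query: triangle-inequality filter, then verify."""
    keys, entries = index
    dq = dist(q, pivot)
    lo = bisect_left(keys, dq - r)   # candidates cannot lie outside
    hi = bisect_right(keys, dq + r)  # [dq - r, dq + r] in pivot space
    return [p for _, p in entries[lo:hi] if dist(p, q) <= r]
```

In LIMS the sorted 1-D keys would themselves be approximated by a learned model predicting each record's position on disk, rather than stored exactly as here.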
arXiv Detail & Related papers (2022-04-21T11:24:55Z)
- A Pluggable Learned Index Method via Sampling and Gap Insertion [48.900186573181735]
Database indexes facilitate data retrieval and benefit broad applications in real-world systems.
Recently, a new family of indexes, called learned indexes, has been proposed to learn hidden yet useful data distributions.
We study two general and pluggable techniques to enhance the learning efficiency and learning effectiveness for learned indexes.
arXiv Detail & Related papers (2021-01-04T07:17:23Z)
- COAX: Correlation-Aware Indexing on Multidimensional Data with Soft Functional Dependencies [3.670422696827525]
We present COAX, a learned index for multidimensional data that learns the correlations between attributes of the dataset.
We show experimentally that by predicting correlated attributes in the data, we can improve the query execution time and reduce the memory overhead of the index.
arXiv Detail & Related papers (2020-06-29T21:22:15Z)
- RadixSpline: A Single-Pass Learned Index [84.84747738666263]
We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data.
RS achieves competitive results on all datasets, despite the fact that it only has two parameters.
arXiv Detail & Related papers (2020-04-30T01:56:54Z)
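The single-pass build described above can be sketched with a greedy error corridor. This is a simplified illustration, not the actual RadixSpline code, which additionally builds a radix table over the spline points; keys are assumed sorted and distinct:

```python
def build_spline(keys, max_error=2):
    """One pass over sorted, distinct keys: keep the current linear
    segment while some slope fits every position to within max_error;
    otherwise emit the previous key as a spline knot and restart."""
    knots = [(keys[0], 0)]
    base_key, base_pos = keys[0], 0
    upper, lower = float("inf"), float("-inf")  # feasible slope corridor
    for pos in range(1, len(keys)):
        dx = keys[pos] - base_key
        lo = (pos - max_error - base_pos) / dx
        hi = (pos + max_error - base_pos) / dx
        if lo > upper or hi < lower:
            # corridor emptied: close the segment at the previous key
            knots.append((keys[pos - 1], pos - 1))
            base_key, base_pos = keys[pos - 1], pos - 1
            upper, lower = float("inf"), float("-inf")
            dx = keys[pos] - base_key
            lo = (pos - max_error - base_pos) / dx
            hi = (pos + max_error - base_pos) / dx
        upper, lower = min(upper, hi), max(lower, lo)
    if knots[-1][0] != keys[-1]:
        knots.append((keys[-1], len(keys) - 1))
    return knots
```

Uniformly distributed keys collapse to a single segment (two knots), which is why such an index can be competitive with only a handful of parameters; skewed key distributions produce proportionally more knots.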
This list is automatically generated from the titles and abstracts of the papers in this site.