From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees
- URL: http://arxiv.org/abs/2005.14213v2
- Date: Fri, 30 Oct 2020 18:09:20 GMT
- Title: From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees
- Authors: Yifan Dai, Yien Xu, Aishwarya Ganesan, Ramnatthan Alagappan, Brian
Kroth, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau
- Abstract summary: BOURBON is a log-structured merge (LSM) tree that utilizes machine learning to provide fast lookups.
We show that BOURBON improves lookup performance by 1.23x-1.78x as compared to state-of-the-art production LSMs.
- Score: 1.9003569830436575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce BOURBON, a log-structured merge (LSM) tree that utilizes machine
learning to provide fast lookups. We base the design and implementation of
BOURBON on empirically-grounded principles that we derive through careful
analysis of LSM design. BOURBON employs greedy piecewise linear regression to
learn key distributions, enabling fast lookup with minimal computation, and
applies a cost-benefit strategy to decide when learning will be worthwhile.
Through a series of experiments on both synthetic and real-world datasets, we
show that BOURBON improves lookup performance by 1.23x-1.78x as compared to
state-of-the-art production LSMs.
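The core idea in the abstract can be illustrated with a small sketch: greedy piecewise linear regression fits line segments over the sorted keys so that every predicted position is within a fixed error bound, and a lookup only probes that small window. This is a minimal illustration under an assumed error bound `DELTA`, not BOURBON's actual implementation:

```python
# Minimal sketch of greedy piecewise linear regression over sorted keys.
# Each segment guarantees that the model's predicted position for any key
# it covers is within +/- DELTA of the true position, so a lookup only
# searches a small window instead of the whole array.
import bisect

DELTA = 4  # maximum allowed position error per segment (assumed parameter)

def build_segments(keys):
    """Greedily fit segments (start_key, start_pos, slope) over sorted keys."""
    segments = []
    i, n = 0, len(keys)
    while i < n:
        lo, hi = 0.0, float("inf")  # cone of slopes keeping all errors <= DELTA
        j = i + 1
        while j < n:
            dx = keys[j] - keys[i]
            # slopes that place position j within [j - DELTA, j + DELTA]
            new_lo = max(lo, (j - i - DELTA) / dx)
            new_hi = min(hi, (j - i + DELTA) / dx)
            if new_lo > new_hi:  # no single line fits any more: close segment
                break
            lo, hi = new_lo, new_hi
            j += 1
        slope = lo if hi == float("inf") else (lo + hi) / 2
        segments.append((keys[i], i, slope))
        i = j
    return segments

def lookup(keys, segments, key):
    """Predict a position with the model, then search a small window."""
    starts = [seg[0] for seg in segments]
    s = max(bisect.bisect_right(starts, key) - 1, 0)
    start_key, start_pos, slope = segments[s]
    guess = start_pos + round(slope * (key - start_key))
    lo = max(0, guess - DELTA - 1)          # +/- 1 absorbs rounding
    hi = min(len(keys), guess + DELTA + 2)
    k = bisect.bisect_left(keys, key, lo, hi)
    return k if k < len(keys) and keys[k] == key else -1
```

On a uniformly distributed key space this degenerates to a single segment, which is exactly the case where a learned model beats a multi-level index search.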
Related papers
- DobLIX: A Dual-Objective Learned Index for Log-Structured Merge Trees
DobLIX is a dual-objective learned index specifically designed for Log-Structured Merge (LSM) tree-based key-value stores.
We show that DobLIX reduces indexing overhead and improves throughput by 1.19 to 2.21 times compared to state-of-the-art methods within RocksDB.
arXiv Detail & Related papers (2025-02-07T22:48:14Z)
- Zero-Shot Decision Tree Construction via Large Language Models
We introduce an algorithm for constructing decision trees using large language models (LLMs) in a zero-shot manner based on Classification and Regression Trees (CART) principles.
Our approach leverages LLMs to perform operations essential for decision tree construction, including attribute discretization, probability calculation, and Gini index computation.
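The Gini index computation mentioned above is the standard CART split criterion; a small, self-contained sketch of it (not the paper's code) looks like this:

```python
# Standard CART Gini impurity, the quantity the paper has the LLM compute
# when scoring candidate splits.
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted Gini of a binary split, used to rank candidate splits."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```

A pure split (`split_gini(["a", "a"], ["b", "b"])`) scores 0.0, which is what CART-style construction minimizes.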
arXiv Detail & Related papers (2025-01-27T17:48:48Z)
- Clear Minds Think Alike: What Makes LLM Fine-tuning Robust? A Study of Token Perplexity
We show that fine-tuning with LLM-generated data improves target task performance and reduces out-of-domain degradation.
This is the first mechanistic explanation for the superior OOD robustness conferred by LLM-generated training data.
arXiv Detail & Related papers (2025-01-24T08:18:56Z)
- LearnedKV: Integrating LSM and Learned Index for Superior Performance on SSD
We introduce LearnedKV, a novel tiered key-value store that seamlessly integrates a Log-Structured Merge (LSM) tree with a Learned Index.
Our results show that LearnedKV outperforms state-of-the-art solutions by up to 1.32x in read requests and 1.31x in write performance.
arXiv Detail & Related papers (2024-06-27T05:08:09Z)
- Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning
We propose a novel framework that utilizes large language models (LLMs) to identify effective feature generation rules.
We use decision trees to convey this reasoning information, as they can be easily represented in natural language.
OCTree consistently enhances the performance of various prediction models across diverse benchmarks.
arXiv Detail & Related papers (2024-06-12T08:31:34Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
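On a single preference pair, the DPO objective referenced above reduces to a short formula; this is an illustrative sketch assuming the sequence log-probabilities have already been computed (the `beta` value is a typical hyperparameter, not the paper's setting):

```python
# DPO loss on one (preferred, dispreferred) pair:
# -log sigmoid(beta * [(log pi(y_w) - log ref(y_w)) - (log pi(y_l) - log ref(y_l))])
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Return the DPO loss given policy and reference log-probabilities."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference the margin is zero and the loss is log 2; increasing the preferred completion's log-probability lowers it, which is the gradient signal the step-level preference data feeds.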
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- REAL: Representation Enhanced Analytic Learning for Exemplar-free Class-incremental Learning
We propose representation enhanced analytic learning (REAL) for exemplar-free class-incremental learning (EFCIL).
REAL constructs a dual-stream base pretraining (DS-BPT) and a representation enhancing distillation (RED) process to enhance the representation of the extractor.
Our method addresses the issue of insufficient discriminability in representations of unseen data caused by a frozen backbone in the existing AL-based CIL.
arXiv Detail & Related papers (2024-03-20T11:48:10Z)
- Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition
The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR.
Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models.
We propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure.
arXiv Detail & Related papers (2023-12-31T09:24:21Z)
- Joint Learning of Label and Environment Causal Independence for Graph Out-of-Distribution Generalization
We propose to incorporate label and environment causal independence (LECI) to fully make use of label and environment information.
LECI significantly outperforms prior methods on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-06-01T19:33:30Z)
- Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
We provide a theoretical framework for Reinforcement Learning with Human Feedback (RLHF).
We show that when training a policy based on the learned reward model, MLE fails while a pessimistic MLE provides policies with improved performance under certain coverage assumptions.
arXiv Detail & Related papers (2023-01-26T18:07:21Z)
- Principal Geodesic Analysis of Merge Trees (and Persistence Diagrams)
We introduce an efficient, iterative algorithm which exploits shared-memory parallelism, as well as an analytic expression of the fitting energy gradient.
We show the utility of our contributions by extending to merge trees two typical PCA applications.
We present a dimensionality reduction framework exploiting the first two directions of the MT-PGA basis to generate two-dimensional layouts.
arXiv Detail & Related papers (2022-07-22T09:17:22Z)
- Learning to branch with Tree MDPs
We propose to learn branching rules from scratch via Reinforcement Learning (RL).
We propose tree Markov Decision Processes, or tree MDPs, a generalization of temporal MDPs that provides a more suitable framework for learning to branch.
We demonstrate through computational experiments that tree MDPs improve the learning convergence, and offer a promising framework for tackling the learning-to-branch problem in MILPs.
arXiv Detail & Related papers (2022-05-23T07:57:32Z)
- SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval
The SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches.
We modify the pooling mechanism, benchmark a model solely based on document expansion, and introduce models trained with distillation.
Overall, SPLADE is considerably improved, with more than 9% gains on NDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
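The "highly sparse representations" above are term-to-weight maps scored by a dot product, which is what lets an inverted index serve them like BM25. A toy sketch with hand-written weights (the model would produce them in practice):

```python
# SPLADE-style scoring sketch: queries and documents are sparse
# term -> weight maps; relevance is their dot product over shared terms.
# Weights below are illustrative, not model outputs.
def sparse_score(query_vec, doc_vec):
    """Dot product over the (few) terms the two sparse vectors share."""
    if len(doc_vec) < len(query_vec):
        query_vec, doc_vec = doc_vec, query_vec  # iterate the smaller map
    return sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())

query = {"lsm": 1.2, "index": 0.8}  # toy query expansion weights
docs = {
    "d1": {"lsm": 0.9, "tree": 0.4, "index": 0.5},
    "d2": {"btree": 1.0, "index": 0.3},
}
ranking = sorted(docs, key=lambda d: sparse_score(query, docs[d]), reverse=True)
```

Because most weights are exactly zero, only postings for the query's nonzero terms are ever touched at retrieval time.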
arXiv Detail & Related papers (2021-09-21T10:43:42Z)
- Principled Exploration via Optimistic Bootstrapping and Backward Induction
We propose a principled exploration method for Deep Reinforcement Learning (DRL) through Optimistic Bootstrapping and Backward Induction (OB2I).
OB2I constructs a general-purpose UCB-bonus through non-parametric bootstrap in DRL.
We build theoretical connections between the proposed UCB-bonus and the LSVI-UCB in a linear setting.
arXiv Detail & Related papers (2021-05-13T01:15:44Z)
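The bootstrap-based UCB bonus in the last entry can be sketched as follows: the disagreement (standard deviation) among bootstrapped Q-estimates acts as the exploration bonus. The toy callable "heads" stand in for a deep ensemble; this is an assumption-laden illustration, not the paper's algorithm:

```python
# UCB-style action selection from a bootstrapped Q-ensemble:
# score(a) = mean_k Q_k(s, a) + c * std_k Q_k(s, a).
import statistics

def ucb_action(q_heads, state, c=1.0):
    """Pick the action maximizing mean plus c times ensemble std."""
    n_actions = len(q_heads[0](state))
    best, best_score = 0, float("-inf")
    for a in range(n_actions):
        estimates = [head(state)[a] for head in q_heads]
        score = statistics.mean(estimates) + c * statistics.pstdev(estimates)
        if score > best_score:
            best, best_score = a, score
    return best
```

With `c = 0` this collapses to greedy action selection; a positive `c` prefers actions the ensemble disagrees on, i.e., under-explored ones.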
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.