Towards a General Framework for ML-based Self-tuning Databases
- URL: http://arxiv.org/abs/2011.07921v2
- Date: Tue, 27 Apr 2021 15:57:04 GMT
- Title: Towards a General Framework for ML-based Self-tuning Databases
- Authors: Thomas Schmied, Diego Didona, Andreas Döring, Thomas Parnell, and Nikolas Ioannou
- Abstract summary: State-of-the-art approaches include Bayesian optimization (BO) and reinforcement learning (RL)
We describe our experience when applying these methods to a database not yet studied in this context: FoundationDB.
We show that while BO and RL methods can improve the throughput of FoundationDB by up to 38%, random search is a highly competitive baseline.
- Score: 3.3437858804655383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) methods have recently emerged as an effective way to
perform automated parameter tuning of databases. State-of-the-art approaches
include Bayesian optimization (BO) and reinforcement learning (RL). In this
work, we describe our experience when applying these methods to a database not
yet studied in this context: FoundationDB. Firstly, we describe the challenges
we faced, such as unknown valid ranges of configuration parameters and
combinations of parameter values that result in invalid runs, and how we
mitigated them. While these issues are typically overlooked, we argue that they
are a crucial barrier to the adoption of ML self-tuning techniques in
databases, and thus deserve more attention from the research community.
Secondly, we present experimental results obtained when tuning FoundationDB
using ML methods. Unlike prior work in this domain, we also compare with the
simplest of baselines: random search. Our results show that, while BO and RL
methods can improve the throughput of FoundationDB by up to 38%, random search
is a highly competitive baseline, finding a configuration that is only 4% worse
than the vastly more complex ML methods. We conclude that future work in this
area may want to focus more on randomized, model-free optimization algorithms.
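To make the random-search baseline concrete, the sketch below shows a minimal tuning loop of the kind the abstract describes: sample knob values uniformly at random, run a benchmark, discard invalid runs, and keep the best-performing configuration. The parameter names, ranges, and the benchmark stub are illustrative assumptions, not FoundationDB's actual knobs or the authors' experimental harness.

```python
# Minimal sketch of a random-search knob tuner, assuming hypothetical
# parameter names and ranges (not FoundationDB's real knob set).
import random

# Hypothetical search space: (low, high) integer range per knob.
SEARCH_SPACE = {
    "cache_memory_mb": (256, 8192),
    "log_server_threads": (1, 16),
    "batch_size": (16, 1024),
}

def sample_config(rng):
    """Draw one configuration uniformly at random from the search space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}

def run_benchmark(config):
    """Placeholder for a real benchmark run against the database.

    Returns measured throughput in ops/s, or None for an invalid run
    (e.g. a knob combination with which the database fails to start).
    """
    if config["log_server_threads"] > 12 and config["batch_size"] < 32:
        return None  # simulate an invalid parameter combination
    return 10_000 + 2 * config["cache_memory_mb"] + random.gauss(0, 500)

def random_search(budget=50, seed=0):
    """Evaluate `budget` random configurations and keep the best valid one."""
    rng = random.Random(seed)
    best_config, best_tput = None, float("-inf")
    for _ in range(budget):
        config = sample_config(rng)
        tput = run_benchmark(config)
        if tput is None:  # skip invalid runs instead of aborting the tuner
            continue
        if tput > best_tput:
            best_config, best_tput = config, tput
    return best_config, best_tput

if __name__ == "__main__":
    cfg, tput = random_search()
    print(f"best throughput {tput:.0f} ops/s with {cfg}")
```

A BO or RL tuner would replace sample_config with a model-guided proposal step, but the invalid-run handling shown here is needed either way, which is one of the practical barriers the abstract highlights.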
Related papers
- Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach [65.6966065843227]
Iterative Reweight-then-Optimize (IRO) is a framework that performs RL-style alignment of a frozen base model without touching its parameters. At test time, the value functions are used to guide the base model generation via a search-based optimization process. Notably, users can apply IRO to align a model on their own dataset, similar to OpenAI's reinforcement fine-tuning (RFT).
arXiv Detail & Related papers (2025-06-21T21:49:02Z) - Path Database Guidance for Motion Planning [1.4078050092809555]
We present a new method, Path Database Guidance (PDG), which innovates on existing work in two ways.
First, we use the database to compute a heuristic for determining which nodes of a search tree to expand.
Second, in contrast to other methods that treat the database as a single fixed prior, our database updates as we search the implicitly defined robot configuration space.
arXiv Detail & Related papers (2025-04-07T23:00:31Z) - Scaling Test-Time Compute Without Verification or RL is Suboptimal [70.28430200655919]
We show that finetuning LLMs with verifier-based (VB) methods based on RL or search is far superior to verifier-free (VF) approaches based on distilling or cloning search traces, given a fixed compute/data budget.
We corroborate our theory empirically on both didactic and math reasoning problems with 3/8B-sized pre-trained LLMs, where we find verification is crucial for scaling test-time compute.
arXiv Detail & Related papers (2025-02-17T18:43:24Z) - Is Large Language Model Good at Database Knob Tuning? A Comprehensive Experimental Evaluation [28.753219581544617]
This study harnesses large language models (LLMs) as experienced DBAs for knob-tuning tasks with carefully designed prompts.
We conduct experiments to compare LLM-driven approaches against traditional methods across the subtasks.
Our findings reveal that LLMs not only match or surpass traditional methods but also exhibit notable interpretability.
arXiv Detail & Related papers (2024-08-05T03:26:01Z) - Retrieval with Learned Similarities [2.729516456192901]
State-of-the-art retrieval algorithms have migrated to learned similarities.
We show that Mixture-of-Logits (MoL) can be realized empirically to achieve superior performance on diverse retrieval scenarios.
arXiv Detail & Related papers (2024-07-22T08:19:34Z) - Training Task Experts through Retrieval Based Distillation [55.46054242512261]
We present Retrieval Based Distillation (ReBase), a method that first retrieves data from rich online sources and then transforms them into domain-specific data.
Our method significantly improves performance by up to 7.8% on SQuAD, 1.37% on MNLI, and 1.94% on BigBench-Hard.
arXiv Detail & Related papers (2024-07-07T18:27:59Z) - MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z) - Optimal Data Generation in Multi-Dimensional Parameter Spaces, using
Bayesian Optimization [0.0]
We propose a novel approach for constructing a minimal yet highly informative database for training machine learning models.
We mimic the underlying relation between the output and input parameters using Gaussian process regression (GPR).
Given the predicted standard deviation by GPR, we select data points using Bayesian optimization to obtain an efficient database for training ML models.
arXiv Detail & Related papers (2023-12-04T16:36:29Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary
Data [100.33096338195723]
We focus on Few-shot Learning with Auxiliary Data (FLAD).
FLAD assumes access to auxiliary data during few-shot learning in hopes of improving generalization.
We propose two algorithms -- EXP3-FLAD and UCB1-FLAD -- and compare them with prior FLAD methods that either explore or exploit.
arXiv Detail & Related papers (2023-02-01T18:59:36Z) - Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z) - Probabilistic Case-based Reasoning for Open-World Knowledge Graph
Completion [59.549664231655726]
A case-based reasoning (CBR) system solves a new problem by retrieving 'cases' that are similar to the given problem.
In this paper, we demonstrate that such a system is achievable for reasoning in knowledge-bases (KBs).
Our approach predicts attributes for an entity by gathering reasoning paths from similar entities in the KB.
arXiv Detail & Related papers (2020-10-07T17:48:12Z) - Bayesian Meta-Prior Learning Using Empirical Bayes [3.666114237131823]
We propose a hierarchical Empirical Bayes approach that addresses the absence of informative priors, and the inability to control parameter learning rates.
Our method learns empirical meta-priors from the data itself and uses them to decouple the learning rates of first-order and second-order features.
Our findings are promising, as optimizing over sparse data is often a challenge.
arXiv Detail & Related papers (2020-02-04T05:08:17Z)