Data-driven Algorithm Design
- URL: http://arxiv.org/abs/2011.07177v1
- Date: Sat, 14 Nov 2020 00:51:57 GMT
- Title: Data-driven Algorithm Design
- Authors: Maria-Florina Balcan
- Abstract summary: Data driven algorithm design is an important aspect of modern data science and algorithm design.
A small tweak to the parameters can cause a cascade of changes in the algorithm's behavior.
We provide strong computational and statistical performance guarantees for batch and online scenarios.
- Score: 21.39493074700162
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data driven algorithm design is an important aspect of modern data science
and algorithm design. Rather than using off the shelf algorithms that only have
worst case performance guarantees, practitioners often optimize over large
families of parametrized algorithms and tune the parameters of these algorithms
using a training set of problem instances from their domain to determine a
configuration with high expected performance over future instances. However,
most of this work comes with no performance guarantees. The challenge is that
for many combinatorial problems of significant importance including
partitioning, subset selection, and alignment problems, a small tweak to the
parameters can cause a cascade of changes in the algorithm's behavior, so the
algorithm's performance is a discontinuous function of its parameters.
In this chapter, we survey recent work that helps put data-driven
combinatorial algorithm design on firm foundations. We provide strong
computational and statistical performance guarantees, both for the batch and
online scenarios where a collection of typical problem instances from the given
application are presented either all at once or in an online fashion,
respectively.
Related papers
- Equitable and Fair Performance Evaluation of Whale Optimization
Algorithm [4.0814527055582746]
It is essential that all algorithms are exhaustively, somewhat, and intelligently evaluated.
evaluating the effectiveness of optimization algorithms equitably and fairly is not an easy process for various reasons.
arXiv Detail & Related papers (2023-09-04T06:32:02Z) - Minimalistic Predictions to Schedule Jobs with Online Precedence
Constraints [117.8317521974783]
We consider non-clairvoyant scheduling with online precedence constraints.
An algorithm is oblivious to any job dependencies and learns about a job only if all of its predecessors have been completed.
arXiv Detail & Related papers (2023-01-30T13:17:15Z) - Accelerating ERM for data-driven algorithm design using output-sensitive techniques [26.32088674030797]
We study techniques to develop efficient learning algorithms for data-driven algorithm design.
Our approach involves two novel ingredients -- an output-sensitive algorithm for enumerating polytopes induced by a set of hyperplanes.
We illustrate our techniques by giving algorithms for pricing problems, linkage-based clustering and dynamic-programming based sequence alignment.
arXiv Detail & Related papers (2022-04-07T17:27:18Z) - Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms.
We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically.
Our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets.
arXiv Detail & Related papers (2021-08-11T20:21:30Z) - A Two-stage Framework and Reinforcement Learning-based Optimization
Algorithms for Complex Scheduling Problems [54.61091936472494]
We develop a two-stage framework, in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined together.
The scheduling problem is solved in two stages, including a finite Markov decision process (MDP) and a mixed-integer programming process, respectively.
Results show that the proposed algorithms could stably and efficiently obtain satisfactory scheduling schemes for agile Earth observation satellite scheduling problems.
arXiv Detail & Related papers (2021-03-10T03:16:12Z) - Leveraging Benchmarking Data for Informed One-Shot Dynamic Algorithm
Selection [0.9281671380673306]
A key challenge in the application of evolutionary algorithms is the selection of an algorithm instance that best suits the problem at hand.
We analyze in this work how such prior performance data can be used to infer informed dynamic algorithm selection schemes for the solution of pseudo-Boolean optimization problems.
arXiv Detail & Related papers (2021-02-12T12:27:02Z) - Particle Swarm Optimization: Fundamental Study and its Application to
Optimization and to Jetty Scheduling Problems [0.0]
The advantages of evolutionary algorithms with respect to traditional methods have been greatly discussed in the literature.
While particle swarms share such advantages, they outperform evolutionary algorithms in that they require lower computational cost and easier implementation.
This paper does not intend to study their tuning, general-purpose settings are taken from previous studies, and virtually the same algorithm is used to optimize a variety of notably different problems.
arXiv Detail & Related papers (2021-01-25T02:06:30Z) - Benchmarking Simulation-Based Inference [5.3898004059026325]
Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods.
We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms.
We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency.
arXiv Detail & Related papers (2021-01-12T18:31:22Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Refined bounds for algorithm configuration: The knife-edge of dual class
approximability [94.83809668933021]
We investigate how large should a training set be to ensure that a parameter's average metrics performance over the training set is close to its expected, future performance.
We show that if this approximation holds under the L-infinity norm, we can provide strong sample complexity bounds.
We empirically evaluate our bounds in the context of integer programming, one of the most powerful tools in computer science.
arXiv Detail & Related papers (2020-06-21T15:32:21Z) - Extreme Algorithm Selection With Dyadic Feature Representation [78.13985819417974]
We propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms.
We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation.
arXiv Detail & Related papers (2020-01-29T09:40:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.