Data-driven Algorithm Design
- URL: http://arxiv.org/abs/2011.07177v1
- Date: Sat, 14 Nov 2020 00:51:57 GMT
- Title: Data-driven Algorithm Design
- Authors: Maria-Florina Balcan
- Abstract summary: Data driven algorithm design is an important aspect of modern data science and algorithm design.
A small tweak to the parameters can cause a cascade of changes in the algorithm's behavior.
We provide strong computational and statistical performance guarantees for batch and online scenarios.
- Score: 21.39493074700162
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data driven algorithm design is an important aspect of modern data science
and algorithm design. Rather than using off the shelf algorithms that only have
worst case performance guarantees, practitioners often optimize over large
families of parametrized algorithms and tune the parameters of these algorithms
using a training set of problem instances from their domain to determine a
configuration with high expected performance over future instances. However,
most of this work comes with no performance guarantees. The challenge is that
for many combinatorial problems of significant importance including
partitioning, subset selection, and alignment problems, a small tweak to the
parameters can cause a cascade of changes in the algorithm's behavior, so the
algorithm's performance is a discontinuous function of its parameters.
In this chapter, we survey recent work that helps put data-driven
combinatorial algorithm design on firm foundations. We provide strong
computational and statistical performance guarantees, both for the batch and
online scenarios where a collection of typical problem instances from the given
application are presented either all at once or in an online fashion,
respectively.
Related papers
- Feature-based Evolutionary Diversity Optimization of Discriminating Instances for Chance-constrained Optimization Problems [9.617143859697322]
We evolve benchmarking instances for chance-constrained optimization problems that contain components characterized by their expected values and variances.
Our method successfully generates diverse instances based on different features while effectively distinguishing the performance between a pair of algorithms.
arXiv Detail & Related papers (2025-01-24T06:55:54Z) - Benchmarking Randomized Optimization Algorithms on Binary, Permutation, and Combinatorial Problem Landscapes [8.337399973715396]
We evaluate the performance of four randomized optimization algorithms: Randomized Hill Climbing (RHC), Simulated Annealing (SA), Genetic Algorithms (GA), and MIMIC.
We compare these algorithms using a set of benchmark fitness functions that highlight the specific challenges and requirements of each problem category.
Our study analyzes each algorithm's effectiveness based on key performance metrics, including solution quality, convergence speed, computational cost, and robustness.
arXiv Detail & Related papers (2025-01-21T23:13:01Z) - Equitable and Fair Performance Evaluation of Whale Optimization
Algorithm [4.0814527055582746]
It is essential that all algorithms are exhaustively, somewhat, and intelligently evaluated.
evaluating the effectiveness of optimization algorithms equitably and fairly is not an easy process for various reasons.
arXiv Detail & Related papers (2023-09-04T06:32:02Z) - Minimalistic Predictions to Schedule Jobs with Online Precedence
Constraints [117.8317521974783]
We consider non-clairvoyant scheduling with online precedence constraints.
An algorithm is oblivious to any job dependencies and learns about a job only if all of its predecessors have been completed.
arXiv Detail & Related papers (2023-01-30T13:17:15Z) - Accelerating ERM for data-driven algorithm design using output-sensitive techniques [26.32088674030797]
We study techniques to develop efficient learning algorithms for data-driven algorithm design.
Our approach involves two novel ingredients -- an output-sensitive algorithm for enumerating polytopes induced by a set of hyperplanes.
We illustrate our techniques by giving algorithms for pricing problems, linkage-based clustering and dynamic-programming based sequence alignment.
arXiv Detail & Related papers (2022-04-07T17:27:18Z) - Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms.
We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically.
Our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets.
arXiv Detail & Related papers (2021-08-11T20:21:30Z) - A Two-stage Framework and Reinforcement Learning-based Optimization
Algorithms for Complex Scheduling Problems [54.61091936472494]
We develop a two-stage framework, in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined together.
The scheduling problem is solved in two stages, including a finite Markov decision process (MDP) and a mixed-integer programming process, respectively.
Results show that the proposed algorithms could stably and efficiently obtain satisfactory scheduling schemes for agile Earth observation satellite scheduling problems.
arXiv Detail & Related papers (2021-03-10T03:16:12Z) - Leveraging Benchmarking Data for Informed One-Shot Dynamic Algorithm
Selection [0.9281671380673306]
A key challenge in the application of evolutionary algorithms is the selection of an algorithm instance that best suits the problem at hand.
We analyze in this work how such prior performance data can be used to infer informed dynamic algorithm selection schemes for the solution of pseudo-Boolean optimization problems.
arXiv Detail & Related papers (2021-02-12T12:27:02Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Refined bounds for algorithm configuration: The knife-edge of dual class
approximability [94.83809668933021]
We investigate how large should a training set be to ensure that a parameter's average metrics performance over the training set is close to its expected, future performance.
We show that if this approximation holds under the L-infinity norm, we can provide strong sample complexity bounds.
We empirically evaluate our bounds in the context of integer programming, one of the most powerful tools in computer science.
arXiv Detail & Related papers (2020-06-21T15:32:21Z) - Extreme Algorithm Selection With Dyadic Feature Representation [78.13985819417974]
We propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms.
We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation.
arXiv Detail & Related papers (2020-01-29T09:40:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.