Near-Optimal Data Source Selection for Bayesian Learning
- URL: http://arxiv.org/abs/2011.10712v2
- Date: Sun, 2 May 2021 15:46:30 GMT
- Title: Near-Optimal Data Source Selection for Bayesian Learning
- Authors: Lintao Ye, Aritra Mitra and Shreyas Sundaram
- Abstract summary: We study a fundamental problem in Bayesian learning, where the goal is to select a set of data sources with minimum cost while achieving a certain learning performance based on the data streams provided by the selected data sources.
We show that the data source selection problem can be transformed into an instance of the submodular set covering problem studied in the literature, and provide a standard greedy algorithm to solve the data source selection problem with provable performance guarantees.
- Score: 1.625213292350038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a fundamental problem in Bayesian learning, where the goal is to
select a set of data sources with minimum cost while achieving a certain
learning performance based on the data streams provided by the selected data
sources. First, we show that the data source selection problem for Bayesian
learning is NP-hard. We then show that the data source selection problem can be
transformed into an instance of the submodular set covering problem studied in
the literature, and provide a standard greedy algorithm to solve the data
source selection problem with provable performance guarantees. Next, we propose
a fast greedy algorithm that improves the running times of the standard greedy
algorithm, while achieving performance guarantees that are comparable to those
of the standard greedy algorithm. The fast greedy algorithm can also be applied
to solve the general submodular set covering problem with performance
guarantees. Finally, we validate the theoretical results using numerical
examples, and show that the greedy algorithms work well in practice.
Related papers
- Meta-Learning from Learning Curves for Budget-Limited Algorithm Selection [11.409496019407067]
In a budget-limited scenario, it is crucial to carefully select an algorithm candidate and allocate a budget for training it.
We propose a novel framework in which an agent must select in the process of learning the most promising algorithm without waiting until it is fully trained.
arXiv Detail & Related papers (2024-10-10T08:09:58Z) - Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance in 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z) - An Efficient Off-Policy Reinforcement Learning Algorithm for the
Continuous-Time LQR Problem [2.512827436728378]
An off-policy reinforcement learning algorithm is designed to solve the continuous-time LQR problem using only input-state data measured from the system.
We show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration.
arXiv Detail & Related papers (2023-03-31T06:30:23Z) - Optimal Data Selection: An Online Distributed View [61.31708750038692]
We develop algorithms for the online and distributed version of the problem.
We show that our selection methods outperform random selection by $5-20%$.
In learning tasks on ImageNet and MNIST, we show that our selection methods outperform random selection by $5-20%$.
arXiv Detail & Related papers (2022-01-25T18:56:16Z) - Sample Selection for Fair and Robust Training [28.94276265328868]
We propose a sample selection-based algorithm for fair and robust training.
We show that our algorithm obtains fairness and robustness better than or comparable to the state-of-the-art technique.
arXiv Detail & Related papers (2021-10-27T07:17:29Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms.
We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically.
Our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets.
arXiv Detail & Related papers (2021-08-11T20:21:30Z) - Low Budget Active Learning via Wasserstein Distance: An Integer
Programming Approach [81.19737119343438]
Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label.
We propose a new integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool.
Our strategy requires high-quality latent features which we obtain by unsupervised learning on the unlabeled pool.
arXiv Detail & Related papers (2021-06-05T21:25:03Z) - Run2Survive: A Decision-theoretic Approach to Algorithm Selection based
on Survival Analysis [75.64261155172856]
survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive.
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.