Related papers: Near-Optimal Data Source Selection for Bayesian Learning

Near-Optimal Data Source Selection for Bayesian Learning

URL: http://arxiv.org/abs/2011.10712v2
Date: Sun, 2 May 2021 15:46:30 GMT
Title: Near-Optimal Data Source Selection for Bayesian Learning
Authors: Lintao Ye, Aritra Mitra and Shreyas Sundaram
Abstract summary: We study a fundamental problem in Bayesian learning, where the goal is to select a set of data sources with minimum cost while achieving a certain learning performance based on the data streams provided by the selected data sources. We show that the data source selection problem can be transformed into an instance of the submodular set covering problem studied in the literature, and provide a standard greedy algorithm to solve the data source selection problem with provable performance guarantees.
Score: 1.625213292350038
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study a fundamental problem in Bayesian learning, where the goal is to select a set of data sources with minimum cost while achieving a certain learning performance based on the data streams provided by the selected data sources. First, we show that the data source selection problem for Bayesian learning is NP-hard. We then show that the data source selection problem can be transformed into an instance of the submodular set covering problem studied in the literature, and provide a standard greedy algorithm to solve the data source selection problem with provable performance guarantees. Next, we propose a fast greedy algorithm that improves the running times of the standard greedy algorithm, while achieving performance guarantees that are comparable to those of the standard greedy algorithm. The fast greedy algorithm can also be applied to solve the general submodular set covering problem with performance guarantees. Finally, we validate the theoretical results using numerical examples, and show that the greedy algorithms work well in practice.

Related papers

Meta-Learning from Learning Curves for Budget-Limited Algorithm Selection [11.409496019407067]
In a budget-limited scenario, it is crucial to carefully select an algorithm candidate and allocate a budget for training it. We propose a novel framework in which an agent must select in the process of learning the most promising algorithm without waiting until it is fully trained.
arXiv Detail & Related papers (2024-10-10T08:09:58Z)
Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition. Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance in 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z)
An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem [2.512827436728378]
An off-policy reinforcement learning algorithm is designed to solve the continuous-time LQR problem using only input-state data measured from the system. We show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration.
arXiv Detail & Related papers (2023-03-31T06:30:23Z)
Optimal Data Selection: An Online Distributed View [61.31708750038692]
We develop algorithms for the online and distributed version of the problem. We show that our selection methods outperform random selection by $5-20%$. In learning tasks on ImageNet and MNIST, we show that our selection methods outperform random selection by $5-20%$.
arXiv Detail & Related papers (2022-01-25T18:56:16Z)
Sample Selection for Fair and Robust Training [28.94276265328868]
We propose a sample selection-based algorithm for fair and robust training. We show that our algorithm obtains fairness and robustness better than or comparable to the state-of-the-art technique.
arXiv Detail & Related papers (2021-10-27T07:17:29Z)
Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms. For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime. In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem. We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms. We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically. Our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets.
arXiv Detail & Related papers (2021-08-11T20:21:30Z)
Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach [81.19737119343438]
Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label. We propose a new integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool. Our strategy requires high-quality latent features which we obtain by unsupervised learning on the unlabeled pool.
arXiv Detail & Related papers (2021-06-05T21:25:03Z)
Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis [75.64261155172856]
survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime. We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive. In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.