Can We Do Better Than Random Start? The Power of Data Outsourcing
- URL: http://arxiv.org/abs/2205.08098v1
- Date: Tue, 17 May 2022 05:34:36 GMT
- Title: Can We Do Better Than Random Start? The Power of Data Outsourcing
- Authors: Yi Chen, Jing Dong, Xin T. Tong
- Abstract summary: Many organizations have access to abundant data but lack the computational power to process the data.
We propose simulation-based algorithms that can utilize a small amount of outsourced data to find good initial points.
- Score: 9.677679780556103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many organizations have access to abundant data but lack the computational
power to process the data. While they can outsource the computational task to
other facilities, there are various constraints on the amount of data that can
be shared. It is natural to ask what data outsourcing can accomplish under such
constraints. We address this question from a machine learning perspective. When
training a model with optimization algorithms, the quality of the results often
relies heavily on the points where the algorithms are initialized. Random start
is one of the most popular methods to tackle this issue, but it can be
computationally expensive and not feasible for organizations lacking computing
resources. Based on three different scenarios, we propose simulation-based
algorithms that can utilize a small amount of outsourced data to find good
initial points accordingly. Under suitable regularity conditions, we provide
theoretical guarantees showing the algorithms can find good initial points with
high probability. We also conduct numerical experiments to demonstrate that our
algorithms perform significantly better than the random start approach.
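The general recipe is easy to illustrate. Below is a minimal sketch, with a toy non-convex objective and names of my own choosing: cheap candidate points are scored on the small shared subsample alone, and local optimization starts from the best-scoring candidate. This is only a schematic of the idea, not the paper's three scenario-specific algorithms.
```python
# Sketch: pick an initial point using a small "outsourced" subsample,
# then run a local optimizer from it, instead of a purely random start.
import numpy as np

rng = np.random.default_rng(0)

def loss(w, X, y):
    # Non-convex toy objective: squared error of a sine-feature model.
    return np.mean((np.sin(X @ w) - y) ** 2)

def grad(w, X, y, eps=1e-6):
    # Numerical gradient; adequate for a low-dimensional sketch.
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w); e[i] = eps
        g[i] = (loss(w + e, X, y) - loss(w - e, X, y)) / (2 * eps)
    return g

def descend(w, X, y, lr=0.1, steps=200):
    for _ in range(steps):
        w = w - lr * grad(w, X, y)
    return w

# Full data set (held by the data owner) and a small outsourced subsample.
X = rng.normal(size=(5000, 2)); w_true = np.array([2.0, -1.0])
y = np.sin(X @ w_true) + 0.05 * rng.normal(size=5000)
X_out, y_out = X[:100], y[:100]          # the shared 2% of the data

# Score many cheap candidates on the subsample; keep the best as the start.
candidates = rng.uniform(-4, 4, size=(50, 2))
w0 = min(candidates, key=lambda w: loss(w, X_out, y_out))

print("random start loss  :", loss(descend(candidates[0], X, y), X, y))
print("selected start loss:", loss(descend(w0, X, y), X, y))
```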
Related papers
- The Limits of Assumption-free Tests for Algorithm Performance [6.7171902258864655]
How well does an algorithm perform at a given modeling task, and which algorithm performs best?
We make a distinction between two questions: how good is an algorithm $A$ at the problem of learning from a training set of size $n$, versus, how good is a particular fitted model produced by running $A$ on a particular training data set of size $n$?
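A hedged sketch of that distinction, using synthetic data and ridge regression as the algorithm $A$: the conditional error of one fitted model is a single hold-out estimate, while the error of the algorithm at sample size $n$ averages that estimate over many independent training sets.
```python
# Two different quantities: error of ONE fitted model vs. average error
# of the ALGORITHM over training sets of size n.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5)); beta = rng.normal(size=5)
y = X @ beta + rng.normal(size=2000)
X_test, y_test = X[1500:], y[1500:]

def fitted_model_error(train_idx):
    m = Ridge().fit(X[train_idx], y[train_idx])
    return mean_squared_error(y_test, m.predict(X_test))

n = 200
# Question 1: hold-out error of one particular fitted model.
one_model = fitted_model_error(np.arange(n))
# Question 2: expected error of A at sample size n, approximated by
# averaging over independent training sets drawn from the pool.
algo_error = np.mean([fitted_model_error(rng.choice(1500, n, replace=False))
                      for _ in range(30)])
print(one_model, algo_error)
```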
arXiv Detail & Related papers (2024-02-12T03:19:30Z)
A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
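For intuition, here is an illustrative greedy selection rule, not the paper's 3-approximation algorithm: each next point is chosen by a weighted mix of its distance to the current subset (the k-center term) and a per-point uncertainty score.
```python
# Greedy farthest-point selection blended with uncertainty sampling.
import numpy as np

def weighted_kcenter(X, uncertainty, k, lam=0.5):
    chosen = [int(np.argmax(uncertainty))]          # seed with most uncertain
    d = np.linalg.norm(X - X[chosen[0]], axis=1)    # dist to nearest center
    while len(chosen) < k:
        score = lam * d + (1 - lam) * uncertainty   # combined objective
        score[chosen] = -np.inf                     # never re-pick a point
        nxt = int(np.argmax(score))
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return chosen

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 8))
u = rng.uniform(size=500)          # e.g. per-example model uncertainty
print(weighted_kcenter(X, u, k=10))
```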
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
Leveraging Data Mining Algorithms to Recommend Source Code Changes [7.959841510571622]
This paper proposes an automatic method for recommending source code changes using four data mining algorithms.
We compare the algorithms in terms of performance (Precision, Recall and F-measure) and execution time.
Apriori seems appropriate for large-scale projects, whereas Eclat appears to be suitable for small-scale projects.
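The underlying idea can be sketched with frequent-pair mining over co-changed files; this is a toy Apriori-style pass, and the file names are hypothetical.
```python
# Mine file pairs that frequently change together, then recommend
# co-changed files when one of them is edited again.
from itertools import combinations
from collections import Counter

commits = [
    {"parser.c", "parser.h", "tests/parser_test.c"},
    {"parser.c", "parser.h"},
    {"lexer.c", "lexer.h"},
    {"parser.c", "tests/parser_test.c"},
]
min_support = 2

pair_counts = Counter(p for c in commits for p in combinations(sorted(c), 2))
frequent = {p for p, n in pair_counts.items() if n >= min_support}

def recommend(changed_file):
    # Files that frequently co-changed with the edited file.
    return {b if a == changed_file else a
            for a, b in frequent if changed_file in (a, b)}

print(recommend("parser.c"))
```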
arXiv Detail & Related papers (2023-04-29T18:38:23Z)
A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
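The pairwise-ordering task can be made concrete in a few lines; the similarity scores and expertise values below are invented for illustration.
```python
# How often does a similarity score mis-order a pair of papers relative
# to a reviewer's self-reported expertise?
from itertools import combinations

expertise  = {"paperA": 5, "paperB": 3, "paperC": 1}           # self-reported
similarity = {"paperA": 0.62, "paperB": 0.71, "paperC": 0.30}  # computed

pairs = [(p, q) for p, q in combinations(expertise, 2)
         if expertise[p] != expertise[q]]
errors = sum((expertise[p] > expertise[q]) != (similarity[p] > similarity[q])
             for p, q in pairs)
print(f"pairwise error rate: {errors / len(pairs):.0%}")
```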
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
Efficient and Accurate Learning of Mixtures of Plackett-Luce Models [5.216020588360421]
Mixture models of Plackett-Luce (PL) are an active research area of both theoretical and practical significance.
We propose an algorithm that can provide a provably accurate initial estimate and an EM algorithm that maximizes the true log-likelihood function efficiently.
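For reference, a single-component Plackett-Luce log-likelihood looks as follows; this is the building block only, not the paper's mixture model, initialization, or EM updates.
```python
# Plackett-Luce: a ranking is built by repeatedly picking the next item
# with probability proportional to its latent score.
import numpy as np

def pl_loglik(ranking, scores):
    # ranking: item indices from most to least preferred.
    ll = 0.0
    remaining = list(ranking)
    for item in ranking:
        denom = sum(scores[j] for j in remaining)
        ll += np.log(scores[item] / denom)
        remaining.remove(item)
    return ll

scores = np.array([3.0, 1.0, 0.5])        # latent item utilities
print(pl_loglik([0, 1, 2], scores))       # a likely ranking
print(pl_loglik([2, 1, 0], scores))       # an unlikely ranking
```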
arXiv Detail & Related papers (2023-02-10T16:00:40Z)
Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
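A bare-bones sketch of the setting, with UCB1 adapted to runtime minimization and runtimes censored at a timeout; this illustrates the problem, not the paper's adapted bandit algorithms.
```python
# Each arm is a solver; the loss is runtime capped at a timeout, so
# censored observations still yield bounded losses.
import math, random

random.seed(3)
TIMEOUT = 10.0
true_mean_runtime = [4.0, 7.0, 2.5]            # hypothetical solvers

counts = [0] * 3
mean_loss = [0.0] * 3

for t in range(1, 501):
    # Lower-confidence bound: lower (faster) is better, so subtract the bonus.
    lcb = [mean_loss[a] - math.sqrt(2 * math.log(t) / counts[a])
           if counts[a] else -math.inf for a in range(3)]
    arm = min(range(3), key=lambda a: lcb[a])
    runtime = min(random.expovariate(1 / true_mean_runtime[arm]), TIMEOUT)
    counts[arm] += 1
    mean_loss[arm] += (runtime - mean_loss[arm]) / counts[arm]

print(counts)   # the fastest solver should dominate
```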
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
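The role of the learned estimator is easiest to see in plain A*, where the heuristic is a pluggable function; in the paper a deep network plays that role inside a simplified memory-bounded A*. The sketch below is generic A* on a grid, with Manhattan distance standing in for the network.
```python
# Generic A* with a pluggable heuristic; a trained network would replace
# `manhattan` below.
import heapq

def a_star(start, goal, neighbors, heuristic):
    frontier = [(heuristic(start), 0, start)]
    best_g = {start: 0}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        for nxt, cost in neighbors(node):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + heuristic(nxt), ng, nxt))
    return None

# 4-connected 10x10 grid demo.
def neighbors(p):
    x, y = p
    return [((x+dx, y+dy), 1) for dx, dy in [(1,0),(-1,0),(0,1),(0,-1)]
            if 0 <= x+dx < 10 and 0 <= y+dy < 10]

manhattan = lambda p: abs(p[0] - 9) + abs(p[1] - 9)
print(a_star((0, 0), (9, 9), neighbors, manhattan))   # -> 18
```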
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
Optimization for Supervised Machine Learning: Randomized Algorithms for Data and Parameters [10.279748604797911]
Key problems in machine learning and data science are routinely modeled as optimization problems and solved via optimization algorithms.
With the increase of the volume of data and the size and complexity of the statistical models used to formulate these often ill-conditioned optimization tasks, there is a need for new efficient algorithms able to cope with these challenges.
In this thesis, we deal with each of these sources of difficulty in a different way. To efficiently address the big data issue, we develop new methods which in each iteration examine a small random subset of the training data only.
To handle the big model issue, we develop methods which in each iteration update a small random subset of the model parameters only.
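Both randomization ideas fit in a short sketch for least squares: minibatch SGD touches a small random subset of the data per step, while randomized coordinate descent updates a small random subset of the parameters per step. This is a toy problem with step sizes of my own choosing, not the thesis's methods.
```python
# Randomize over DATA (minibatch SGD) vs. over PARAMETERS (coordinate descent).
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(10000, 50)); w_star = rng.normal(size=50)
y = X @ w_star + 0.01 * rng.normal(size=10000)

def sgd(w, lr=0.01, batch=32, steps=2000):
    for _ in range(steps):
        idx = rng.choice(len(X), batch, replace=False)   # random data subset
        g = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w = w - lr * g
    return w

def rcd(w, lr=0.5, block=5, steps=2000):
    for _ in range(steps):
        j = rng.choice(50, block, replace=False)         # random coord subset
        g = X[:, j].T @ (X @ w - y) / len(X)
        w[j] = w[j] - lr * g
    return w

print(np.linalg.norm(sgd(np.zeros(50)) - w_star))
print(np.linalg.norm(rcd(np.zeros(50)) - w_star))
```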
arXiv Detail & Related papers (2020-08-26T21:15:18Z)
Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis [75.64261155172856]
Survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive.
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
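As a minimal illustration of why censoring-aware tools matter, a Kaplan-Meier estimate treats timeouts as right-censored runs rather than discarding them; Run2Survive itself fits distributional runtime models per algorithm, so this is only the textbook estimator.
```python
# Kaplan-Meier survival estimate over censored solver runtimes.
# (runtime, finished?) pairs; runs hitting the 10s timeout are censored.
runs = [(1.2, True), (3.5, True), (10.0, False), (2.1, True),
        (10.0, False), (5.7, True), (0.9, True), (10.0, False)]

times = sorted({t for t, done in runs if done})
at_risk = lambda t: sum(rt >= t for rt, _ in runs)       # still running at t
deaths  = lambda t: sum(rt == t and done for rt, done in runs)

surv = 1.0
for t in times:
    surv *= 1 - deaths(t) / at_risk(t)
    print(f"P(runtime > {t:4.1f}s) = {surv:.3f}")
```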
arXiv Detail & Related papers (2020-07-06T15:20:17Z)
FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
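The CTA pattern is a few lines in the simplest case, sketched below for least squares with equal-weight averaging; this is bare-bones FedAvg, not FedPD.
```python
# "Computation then aggregation": local gradient steps on each device,
# then the server averages the resulting models.
import numpy as np

rng = np.random.default_rng(5)
w_true = np.array([1.0, -2.0])
# Non-IID device data: each device sees a different input distribution.
Xs = [rng.normal(loc=m, size=(200, 2)) for m in (-2, 0, 2)]
devices = [(X, X @ w_true + 0.1 * rng.normal(size=200)) for X in Xs]

def local_update(w, X, y, lr=0.01, steps=20):
    for _ in range(steps):                    # computation on-device
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

w = np.zeros(2)
for round_ in range(50):
    locals_ = [local_update(w.copy(), X, y) for X, y in devices]
    w = np.mean(locals_, axis=0)              # aggregation at the server
print(w)   # should approach w_true
```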
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
How to Solve Fair $k$-Center in Massive Data Models [5.3283669037198615]
We design new streaming and distributed algorithms for the fair $k$-center problem.
Our main contributions are: (a) the first distributed algorithm; and (b) a two-pass streaming algorithm with a provable approximation guarantee.
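For flavor, an offline greedy variant that enforces a per-group quota of centers is sketched below; it is illustrative only and is neither the paper's distributed algorithm nor its two-pass streaming algorithm.
```python
# Farthest-point k-center that respects per-group quotas, so each
# demographic group contributes a fixed number of centers.
import numpy as np

def fair_kcenter(X, groups, quota):
    left = dict(quota)                       # remaining picks per group
    centers = [0]; left[groups[0]] -= 1
    d = np.linalg.norm(X - X[0], axis=1)     # dist to nearest chosen center
    while sum(left.values()) > 0:
        order = np.argsort(-d)               # farthest points first
        nxt = next(i for i in order if left[groups[i]] > 0)
        centers.append(int(nxt)); left[groups[nxt]] -= 1
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return centers

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 2))
groups = rng.integers(0, 2, size=300)        # two demographic groups
print(fair_kcenter(X, groups, quota={0: 3, 1: 3}))
```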
arXiv Detail & Related papers (2020-02-18T16:11:40Z)