Optimization in Machine Learning: A Distribution Space Approach
- URL: http://arxiv.org/abs/2004.08620v1
- Date: Sat, 18 Apr 2020 13:38:06 GMT
- Title: Optimization in Machine Learning: A Distribution Space Approach
- Authors: Yongqiang Cai, Qianxiao Li, Zuowei Shen
- Abstract summary: We present the viewpoint that optimization problems in machine learning can often be interpreted as minimizing a convex functional over a function space.
We repose such problems via a suitable relaxation as convex optimization problems in the space of distributions.
We develop a numerical algorithm based on mixture distributions to perform approximate optimization directly in distribution space.
- Score: 16.038814087205143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the viewpoint that optimization problems encountered in machine
learning can often be interpreted as minimizing a convex functional over a
function space, but with a non-convex constraint set introduced by model
parameterization. This observation allows us to repose such problems via a
suitable relaxation as convex optimization problems in the space of
distributions over the training parameters. We derive some simple relationships
between the distribution-space problem and the original problem, e.g. a
distribution-space solution is at least as good as a solution in the original
space. Moreover, we develop a numerical algorithm based on mixture
distributions to perform approximate optimization directly in distribution
space. Consistency of this approximation is established and the numerical
efficacy of the proposed algorithm is illustrated on simple examples. In both
theory and practice, this formulation provides an alternative approach to
large-scale optimization in machine learning.
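As a minimal illustration of the distribution-space relaxation (a sketch of the idea, not the authors' code; the tanh feature map, the toy sine target, and all variable names are assumptions made here for concreteness), the snippet below fixes a pool of candidate parameters and optimizes only a discrete mixture distribution over them, so the squared loss becomes convex in the mixture weights even though it is non-convex in the underlying parameters.

```python
# Sketch: optimize a mixture distribution over a fixed pool of parameters
# instead of a single parameter vector. Convex in the weights w.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem.
x = np.linspace(-3.0, 3.0, 200)
y = np.sin(2.0 * x)

# Pool of atoms: theta_i = (a_i, b_i) defines the basis function tanh(a_i * x + b_i).
n_atoms = 50
a = rng.normal(0.0, 2.0, n_atoms)
b = rng.normal(0.0, 2.0, n_atoms)
Phi = np.tanh(np.outer(x, a) + b)          # Phi[j, i] = f_{theta_i}(x_j)

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

# Projected gradient descent on the convex squared loss over the mixture weights.
scale = 3.0                                 # total mass carried by the mixture
w = np.full(n_atoms, 1.0 / n_atoms)         # start from the uniform distribution
lr = 1e-3
for _ in range(5000):
    resid = scale * (Phi @ w) - y
    w = project_simplex(w - lr * scale * (Phi.T @ resid) / x.size)

print("mean squared error:", np.mean((scale * (Phi @ w) - y) ** 2))
```

The paper's algorithm works with general mixture distributions rather than a fixed pool of atoms, but the key point, convexity of the relaxed problem in the distribution, is already visible in this restricted setting.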
Related papers
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Information Theoretical Importance Sampling Clustering [18.248246885248733]
A current assumption of most clustering methods is that the training data and future data are taken from the same distribution.
We propose an information theoretical importance sampling based approach for clustering problems (ITISC).
Experiment results on synthetic datasets and a real-world load forecasting problem validate the effectiveness of the proposed model.
arXiv Detail & Related papers (2023-02-09T03:18:53Z) - Consistent Approximations in Composite Optimization [0.0]
We develop a framework for consistent approximations of optimization problems.
The framework is developed for a broad class of optimization problems.
Applications to extended nonlinear programming illustrate the analysis.
arXiv Detail & Related papers (2022-01-13T23:57:08Z) - Distributed and Stochastic Optimization Methods with Gradient
Compression and Local Steps [0.0]
We propose theoretical frameworks for the analysis of distributed methods with error compensation and local updates.
We develop more than 20 new optimization methods, including the first linearly converging Error-Compensated SGD and the first distributed Local-SGD methods.
arXiv Detail & Related papers (2021-12-20T16:12:54Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-convex optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to stationarity yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z) - Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms are not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z) - Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
arXiv Detail & Related papers (2020-12-09T20:19:32Z) - Communication-efficient distributed eigenspace estimation [31.69089186688224]
We develop a communication-efficient distributed algorithm for computing the leading invariant subspace of a data matrix.
Our algorithm uses a novel alignment scheme that minimizes the Procrustean distance between local solutions and a reference solution (a generic Procrustes-alignment sketch appears after this list).
We show that our algorithm achieves a similar error rate to that of a centralized estimator.
arXiv Detail & Related papers (2020-09-05T02:11:22Z) - A Multi-Agent Primal-Dual Strategy for Composite Optimization over
Distributed Features [52.856801164425086]
We study multi-agent sharing optimization problems with the objective function being the sum of smooth local functions plus a convex (possibly non-smooth) coupling function.
arXiv Detail & Related papers (2020-06-15T19:40:24Z) - Distributed Averaging Methods for Randomized Second Order Optimization [54.51566432934556]
We consider distributed optimization problems where forming the Hessian is computationally challenging and communication is a bottleneck.
We develop unbiased parameter averaging methods for randomized second order optimization that employ sampling and sketching of the Hessian.
We also extend the framework of second order averaging methods to introduce an unbiased distributed optimization framework for heterogeneous computing systems.
arXiv Detail & Related papers (2020-02-16T09:01:18Z)
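The communication-efficient eigenspace-estimation entry above describes aligning local solutions to a reference by minimizing a Procrustes distance before combining them. The sketch below is a generic illustration of that step under assumed synthetic data, not the paper's algorithm: each local basis is rotated by the closed-form orthogonal-Procrustes solution (an SVD), and the aligned bases are then averaged and re-orthonormalized.

```python
# Sketch: average local subspace estimates after orthogonal-Procrustes alignment.
import numpy as np

rng = np.random.default_rng(1)
n, k, n_workers = 100, 5, 4

# Ground-truth leading eigenspace with closely spaced eigenvalues, so local
# eigenvector bases can differ by sign flips and small rotations.
U_true, _ = np.linalg.qr(rng.normal(size=(n, k)))
A = (U_true * np.linspace(2.0, 1.5, k)) @ U_true.T

def local_estimate(noise=0.01):
    """Top-k eigenvectors of a noisy local copy of A."""
    E = rng.normal(size=(n, n))
    C = A + noise * (E + E.T) / 2.0
    _, V = np.linalg.eigh(C)
    return V[:, -k:]

locals_ = [local_estimate() for _ in range(n_workers)]
U_ref = locals_[0]                           # use the first solution as reference

def procrustes_align(U, U_ref):
    """Rotate U by the orthogonal Q minimizing ||U Q - U_ref||_F (closed form via SVD)."""
    P, _, Rt = np.linalg.svd(U.T @ U_ref)
    return U @ (P @ Rt)

def subspace_error(U):
    return np.linalg.norm(U @ U.T - U_true @ U_true.T)

# Averaging without alignment can let mismatched bases partially cancel;
# aligning first preserves the shared subspace.
U_naive, _ = np.linalg.qr(np.mean(locals_, axis=0))
U_aligned, _ = np.linalg.qr(np.mean([procrustes_align(U, U_ref) for U in locals_], axis=0))
print("error without alignment:", subspace_error(U_naive))
print("error with alignment:   ", subspace_error(U_aligned))
```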
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.