Optimization in Machine Learning: A Distribution Space Approach
- URL: http://arxiv.org/abs/2004.08620v1
- Date: Sat, 18 Apr 2020 13:38:06 GMT
- Title: Optimization in Machine Learning: A Distribution Space Approach
- Authors: Yongqiang Cai, Qianxiao Li, Zuowei Shen
- Abstract summary: We present the viewpoint that optimization problems in machine learning can often be interpreted as minimizing a convex functional over a function space.
We repose such problems via a suitable relaxation as convex optimization problems in the space of distributions.
We develop a numerical algorithm based on mixture distributions to perform approximate optimization directly in distribution space.
- Score: 16.038814087205143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the viewpoint that optimization problems encountered in machine
learning can often be interpreted as minimizing a convex functional over a
function space, but with a non-convex constraint set introduced by model
parameterization. This observation allows us to repose such problems via a
suitable relaxation as convex optimization problems in the space of
distributions over the training parameters. We derive some simple relationships
between the distribution-space problem and the original problem, e.g. a
distribution-space solution is at least as good as a solution in the original
space. Moreover, we develop a numerical algorithm based on mixture
distributions to perform approximate optimization directly in distribution
space. Consistency of this approximation is established and the numerical
efficacy of the proposed algorithm is illustrated on simple examples. In both
theory and practice, this formulation provides an alternative approach to
large-scale optimization in machine learning.
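As a minimal illustration of the distribution-space relaxation (a sketch of the idea, not the authors' code; the tanh feature map, the toy sine target, and all variable names are assumptions made here for concreteness), the snippet below fixes a pool of candidate parameters and optimizes only a discrete mixture distribution over them, so the squared loss becomes convex in the mixture weights even though it is non-convex in the underlying parameters.

```python
# Sketch: optimize a mixture distribution over a fixed pool of parameters
# instead of a single parameter vector. Convex in the weights w.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem.
x = np.linspace(-3.0, 3.0, 200)
y = np.sin(2.0 * x)

# Pool of atoms: theta_i = (a_i, b_i) defines the basis function tanh(a_i * x + b_i).
n_atoms = 50
a = rng.normal(0.0, 2.0, n_atoms)
b = rng.normal(0.0, 2.0, n_atoms)
Phi = np.tanh(np.outer(x, a) + b)          # Phi[j, i] = f_{theta_i}(x_j)

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

# Projected gradient descent on the convex squared loss over the mixture weights.
scale = 3.0                                 # total mass carried by the mixture
w = np.full(n_atoms, 1.0 / n_atoms)         # start from the uniform distribution
lr = 1e-3
for _ in range(5000):
    resid = scale * (Phi @ w) - y
    w = project_simplex(w - lr * scale * (Phi.T @ resid) / x.size)

print("mean squared error:", np.mean((scale * (Phi @ w) - y) ** 2))
```

The paper's algorithm works with general mixture distributions rather than a fixed pool of atoms, but the key point, convexity of the relaxed problem in the distribution, is already visible in this restricted setting.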
Related papers
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Information Theoretical Importance Sampling Clustering [18.248246885248733]
A current assumption of most clustering methods is that the training data and future data are taken from the same distribution.
We propose an information theoretical importance sampling based approach for clustering problems (ITISC).
Experiment results on synthetic datasets and a real-world load forecasting problem validate the effectiveness of the proposed model.
arXiv Detail & Related papers (2023-02-09T03:18:53Z) - Consistent Approximations in Composite Optimization [0.0]
We develop a framework for consistent approximations of optimization problems.
The framework is developed for a broad class of optimization problems.
Applications to extended nonlinear programming illustrate the analysis.
arXiv Detail & Related papers (2022-01-13T23:57:08Z) - Distributed and Stochastic Optimization Methods with Gradient
Compression and Local Steps [0.0]
We propose theoretical frameworks for the analysis of distributed methods with error compensation and local updates.
We develop more than 20 new optimization methods, including the first linearly converging Error-Compensated SGD and the first distributed Local-SGD methods.
arXiv Detail & Related papers (2021-12-20T16:12:54Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-convex optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to stationarity yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z) - Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms are not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z) - Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
arXiv Detail & Related papers (2020-12-09T20:19:32Z) - Communication-efficient distributed eigenspace estimation [31.69089186688224]
We develop a communication-efficient distributed algorithm for computing the leading invariant subspace of a data matrix.
Our algorithm uses a novel alignment scheme that minimizes the Procrustean distance between local solutions and a reference solution (a generic Procrustes-alignment sketch appears after this list).
We show that our algorithm achieves a similar error rate to that of a centralized estimator.
arXiv Detail & Related papers (2020-09-05T02:11:22Z) - A Multi-Agent Primal-Dual Strategy for Composite Optimization over
Distributed Features [52.856801164425086]
We study multi-agent sharing optimization problems with the objective function being the sum of smooth local functions plus a convex (possibly non-smooth) coupling function.
arXiv Detail & Related papers (2020-06-15T19:40:24Z) - Distributed Averaging Methods for Randomized Second Order Optimization [54.51566432934556]
We consider distributed optimization problems where forming the Hessian is computationally challenging and communication is a bottleneck.
We develop unbiased parameter averaging methods for randomized second order optimization that employ sampling and sketching of the Hessian.
We also extend the framework of second order averaging methods to introduce an unbiased distributed optimization framework for heterogeneous computing systems.
arXiv Detail & Related papers (2020-02-16T09:01:18Z)
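The communication-efficient eigenspace-estimation entry above describes aligning local solutions to a reference by minimizing a Procrustes distance before combining them. The sketch below is a generic illustration of that step under assumed synthetic data, not the paper's algorithm: each local basis is rotated by the closed-form orthogonal-Procrustes solution (an SVD), and the aligned bases are then averaged and re-orthonormalized.

```python
# Sketch: average local subspace estimates after orthogonal-Procrustes alignment.
import numpy as np

rng = np.random.default_rng(1)
n, k, n_workers = 100, 5, 4

# Ground-truth leading eigenspace with closely spaced eigenvalues, so local
# eigenvector bases can differ by sign flips and small rotations.
U_true, _ = np.linalg.qr(rng.normal(size=(n, k)))
A = (U_true * np.linspace(2.0, 1.5, k)) @ U_true.T

def local_estimate(noise=0.01):
    """Top-k eigenvectors of a noisy local copy of A."""
    E = rng.normal(size=(n, n))
    C = A + noise * (E + E.T) / 2.0
    _, V = np.linalg.eigh(C)
    return V[:, -k:]

locals_ = [local_estimate() for _ in range(n_workers)]
U_ref = locals_[0]                           # use the first solution as reference

def procrustes_align(U, U_ref):
    """Rotate U by the orthogonal Q minimizing ||U Q - U_ref||_F (closed form via SVD)."""
    P, _, Rt = np.linalg.svd(U.T @ U_ref)
    return U @ (P @ Rt)

def subspace_error(U):
    return np.linalg.norm(U @ U.T - U_true @ U_true.T)

# Averaging without alignment can let mismatched bases partially cancel;
# aligning first preserves the shared subspace.
U_naive, _ = np.linalg.qr(np.mean(locals_, axis=0))
U_aligned, _ = np.linalg.qr(np.mean([procrustes_align(U, U_ref) for U in locals_], axis=0))
print("error without alignment:", subspace_error(U_naive))
print("error with alignment:   ", subspace_error(U_aligned))
```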
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.