Effective Proximal Methods for Non-convex Non-smooth Regularized
Learning
- URL: http://arxiv.org/abs/2009.06562v3
- Date: Wed, 21 Oct 2020 14:43:05 GMT
- Title: Effective Proximal Methods for Non-convex Non-smooth Regularized
Learning
- Authors: Guannan Liang, Qianqian Tong, Jiahao Ding, Miao Pan and Jinbo Bi
- Abstract summary: We show that the independent sampling scheme tends to improve performance over the commonly-used uniform sampling scheme.
Our new analysis also derives a tighter bound on convergence speed for uniform sampling than the best one available so far.
- Score: 27.775096437736973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse learning is a very important tool for mining useful information and
patterns from high dimensional data. Non-convex non-smooth regularized learning
problems play essential roles in sparse learning and have drawn extensive
attention recently. We design a family of stochastic proximal gradient methods
by applying arbitrary sampling to solve the empirical risk minimization problem
with a non-convex and non-smooth regularizer. These methods draw mini-batches
of training examples according to an arbitrary probability distribution when
computing stochastic gradients. A unified analytic approach is developed to
examine the convergence and computational complexity of these methods, allowing
us to compare the different sampling schemes. We show that the independent
sampling scheme tends to improve performance over the commonly-used uniform
sampling scheme. Our new analysis also derives a tighter bound on convergence
speed for the uniform sampling than the best one available so far. Empirical
evaluations demonstrate that the proposed algorithms converge faster than the
state of the art.
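As a rough sketch of the core idea (not the authors' exact algorithm), a stochastic proximal gradient step with mini-batches drawn from an arbitrary sampling distribution might look as follows. The soft-thresholding prox is a convex stand-in for the non-convex regularizers (e.g. MCP, SCAD) the paper targets, and all function and parameter names are illustrative:

```python
import numpy as np

def soft_threshold(z, lam):
    # Prox of the l1 norm; a convex stand-in for the non-convex
    # regularizers (e.g. MCP, SCAD) the paper actually addresses.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def prox_sgd_arbitrary_sampling(X, y, probs, lam=0.1, lr=0.01,
                                iters=300, batch=8, seed=0):
    """Stochastic proximal gradient for 0.5*mean((Xw - y)^2) + lam*||w||_1,
    drawing mini-batches (with replacement) from the distribution `probs`."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        idx = rng.choice(n, size=batch, p=probs)   # arbitrary sampling
        iw = 1.0 / (n * probs[idx])                # importance weights keep
        resid = X[idx] @ w - y[idx]                # the estimator unbiased
        grad = (X[idx] * (iw * resid)[:, None]).mean(axis=0)
        w = soft_threshold(w - lr * grad, lr * lam)
    return w
```

The importance weights `1/(n*p_i)` make the mini-batch gradient unbiased for any strictly positive `probs`, which is what lets a single analysis cover uniform, independent, and other sampling schemes.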
Related papers
- Learning to sample fibers for goodness-of-fit testing [0.0]
We consider the problem of constructing exact goodness-of-fit tests for discrete exponential family models.
We translate the problem into a Markov decision process and demonstrate a reinforcement learning approach for learning 'good moves' for sampling.
Our algorithm is based on an actor-critic sampling scheme, with provable convergence.
arXiv Detail & Related papers (2024-05-22T19:33:58Z) - An adjoint-free algorithm for conditional nonlinear optimal perturbations (CNOPs) via sampling [5.758073912084367]
We propose a sampling algorithm based on state-of-the-art statistical machine learning techniques to obtain conditional nonlinear optimal perturbations (CNOPs)
The sampling approach reduces gradient computation to evaluations of the objective function (zeroth-order information).
We demonstrate the CNOPs obtained, reporting their spatial patterns, objective values, computation times, and nonlinear error growth.
arXiv Detail & Related papers (2022-08-01T16:07:22Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
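The one-sample idea can be illustrated (under assumptions; this is not the paper's exact method) by pairing a SAG estimator with a classical Frank-Wolfe step over the probability simplex; all names are illustrative:

```python
import numpy as np

def sag_frank_wolfe(X, y, iters=500, seed=0):
    """Frank-Wolfe on the probability simplex for least squares,
    using a one-sample SAG gradient estimator (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(d, 1.0 / d)        # start inside the simplex
    table = np.zeros((n, d))       # per-sample gradient memory
    g_avg = np.zeros(d)
    for t in range(1, iters + 1):
        i = rng.integers(n)        # one sample per iteration
        g_i = X[i] * (X[i] @ w - y[i])
        g_avg += (g_i - table[i]) / n   # refresh the running average
        table[i] = g_i
        s = np.zeros(d)
        s[np.argmin(g_avg)] = 1.0       # linear minimization oracle
        gamma = 2.0 / (t + 2)
        w = (1 - gamma) * w + gamma * s
    return w
```

Because every update is a convex combination of simplex vertices, the iterate stays feasible with no projection, which is the practical appeal of conditional gradient methods.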
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Tackling System and Statistical Heterogeneity for Federated Learning
with Adaptive Client Sampling [34.187387951367526]
Federated learning (FL) algorithms usually sample a fraction of clients in each round (partial participation) when the number of participants is large.
Recent works have focused on the convergence analysis of FL.
We obtain a new convergence bound for FL algorithms with arbitrary client sampling probabilities.
arXiv Detail & Related papers (2021-12-21T14:28:40Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Contrastive learning of strong-mixing continuous-time stochastic
processes [53.82893653745542]
Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data.
We show that a properly constructed contrastive learning task can be used to estimate the transition kernel for small-to-mid-range intervals in the diffusion case.
arXiv Detail & Related papers (2021-03-03T23:06:47Z) - Minimal Variance Sampling with Provable Guarantees for Fast Training of
Graph Neural Networks [22.618779809748435]
Existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization.
We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance.
We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization.
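The variance-reduction principle behind such adaptive sampling can be illustrated with a small self-contained experiment (not the paper's GNN sampler): drawing samples with probability proportional to per-sample gradient magnitude gives an unbiased estimator with lower variance than uniform sampling. All names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Scalar per-example "gradients": most are small, a few are large.
g = np.concatenate([rng.normal(0.0, 0.1, 90), rng.normal(0.0, 10.0, 10)])
n = g.size

def is_mean(probs, m=50, trials=2000):
    """Importance-sampled estimates of g.mean(), repeated `trials` times."""
    ests = np.empty(trials)
    for t in range(trials):
        idx = rng.choice(n, size=m, p=probs)
        ests[t] = np.mean(g[idx] / (n * probs[idx]))  # unbiased estimator
    return ests

uniform = np.full(n, 1.0 / n)
minvar = np.abs(g) / np.abs(g).sum()  # p_i proportional to |g_i|
var_uniform = is_mean(uniform).var()
var_minvar = is_mean(minvar).var()
```

By Jensen's inequality, sampling proportional to `|g_i|` attains the smallest single-draw variance `(mean|g|)^2`, versus `mean(g^2)` for uniform sampling, so `var_minvar` comes out markedly lower here.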
arXiv Detail & Related papers (2020-06-24T16:49:29Z) - Robust Sampling in Deep Learning [62.997667081978825]
Deep learning requires regularization mechanisms to reduce overfitting and improve generalization.
We address this problem by a new regularization method based on distributional robust optimization.
During training, samples are selected according to their accuracy so that the worst-performing samples are the ones that contribute the most to the optimization.
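A minimal CVaR-style sketch of this worst-case weighting (one common distributionally robust surrogate, not necessarily the paper's exact scheme; names are illustrative):

```python
import numpy as np

def dro_weights(losses, alpha=0.2):
    """CVaR-style weighting: put all weight on the worst alpha
    fraction of samples, a standard DRO surrogate."""
    losses = np.asarray(losses, dtype=float)
    k = max(1, int(np.ceil(alpha * losses.size)))
    worst = np.argsort(losses)[-k:]   # indices of the k largest losses
    w = np.zeros_like(losses)
    w[worst] = 1.0 / k                # uniform weight on the worst set
    return w
```

Training then minimizes the weighted loss, so gradient updates are driven by the hardest examples in each batch.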
arXiv Detail & Related papers (2020-06-04T09:46:52Z) - Efficiently Sampling Functions from Gaussian Process Posteriors [76.94808614373609]
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
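The decoupled approach in the cited paper splits a posterior sample into a random-feature prior path plus a data-dependent update. The sketch below illustrates the underlying Matheron update using an exact joint prior instead of random features; all names are illustrative:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    # Squared-exponential kernel between row sets A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def matheron_sample(Xtr, ytr, Xte, noise=1e-6, seed=0):
    """One GP posterior sample via Matheron's rule:
    f_post = f_prior(Xte) + K(Xte,Xtr)(K(Xtr,Xtr)+noise*I)^{-1}(y - f_prior(Xtr) - eps)."""
    rng = np.random.default_rng(seed)
    Xall = np.vstack([Xtr, Xte])
    Kall = rbf(Xall, Xall) + 1e-8 * np.eye(len(Xall))  # jitter for stability
    f = np.linalg.cholesky(Kall) @ rng.standard_normal(len(Xall))
    f_tr, f_te = f[:len(Xtr)], f[len(Xtr):]
    eps = rng.normal(0.0, np.sqrt(noise), len(Xtr))
    Ktr = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    update = rbf(Xte, Xtr) @ np.linalg.solve(Ktr, ytr - f_tr - eps)
    return f_te + update
```

The "fraction of the usual cost" in the abstract comes from replacing the exact prior draw here with cheap random-feature paths, so only the data-dependent update involves the training-set kernel matrix.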
arXiv Detail & Related papers (2020-02-21T14:03:16Z) - Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning [20.116219345579154]
Decision problems in science, engineering and economics are affected by uncertain parameters whose distribution is only indirectly observable through samples.
The goal of data-driven decision-making is to learn a decision from finitely many training samples that will perform well on unseen test samples.
We will show that Wasserstein distributionally robust optimization has interesting ramifications for statistical learning.
arXiv Detail & Related papers (2019-08-23T09:28:21Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.