Robust supervised learning with coordinate gradient descent
- URL: http://arxiv.org/abs/2201.13372v1
- Date: Mon, 31 Jan 2022 17:33:04 GMT
- Title: Robust supervised learning with coordinate gradient descent
- Authors: Stéphane Gaïffas and Ibrahim Merad
- Abstract summary: We introduce a combination of coordinate gradient descent as a learning algorithm together with robust estimators of the partial derivatives.
This leads to robust statistical learning methods that have a numerical complexity nearly identical to non-robust ones.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper considers the problem of supervised learning with linear methods
when both features and labels can be corrupted, in the form of heavy-tailed
data and/or corrupted rows. We introduce a combination of coordinate
gradient descent as a learning algorithm together with robust estimators of the
partial derivatives. This leads to robust statistical learning methods that
have a numerical complexity nearly identical to non-robust ones based on
empirical risk minimization. The main idea is simple: while robust learning
with gradient descent requires the computational cost of robustly estimating
the whole gradient to update all parameters, a parameter can be updated
immediately using a robust estimator of a single partial derivative in
coordinate gradient descent. We prove upper bounds on the generalization error
of the algorithms derived from this idea, which control both the optimization
and statistical errors, with and without a strong convexity assumption on the
risk. Finally, we propose an efficient implementation of this approach in a new
Python library called linlearn, and demonstrate through extensive numerical
experiments that our approach introduces an interesting new compromise between
robustness, statistical performance and numerical efficiency for this problem.
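To make the main idea concrete, below is a minimal sketch of coordinate gradient descent for least-squares regression in which each partial derivative is estimated with a trimmed mean over samples. This is not the linlearn implementation: the trimmed-mean estimator, trimming level, step sizes, and squared loss are illustrative assumptions (the paper considers several choices of robust estimator).

```python
# Minimal sketch of robust coordinate gradient descent for least squares.
# The trimmed-mean estimator, trimming level and step sizes are illustrative
# choices, not the exact estimators or defaults used in linlearn.
import numpy as np
from scipy.stats import trim_mean


def robust_cgd(X, y, n_iter=100, trim=0.1):
    """Coordinate gradient descent where each partial derivative is a
    trimmed mean of per-sample partial derivatives, which resists
    heavy-tailed data and a small fraction of corrupted rows."""
    n, d = X.shape
    w = np.zeros(d)
    # Per-coordinate step sizes from robustly estimated coordinate-wise
    # Lipschitz constants of the squared loss.
    step = 1.0 / (trim_mean(X ** 2, trim, axis=0) + 1e-12)
    residual = X @ w - y                          # maintained incrementally
    for _ in range(n_iter):
        for j in range(d):
            grad_samples = residual * X[:, j]     # per-sample d/dw_j
            g_j = trim_mean(grad_samples, trim)   # robust partial derivative
            delta = -step[j] * g_j
            w[j] += delta
            residual += delta * X[:, j]           # keep residual in sync
    return w


# Usage on synthetic heavy-tailed data with a few corrupted rows.
rng = np.random.default_rng(0)
X = rng.standard_t(df=2.5, size=(1000, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + rng.standard_t(df=2.5, size=1000)
X[:20], y[:20] = 100.0, -100.0                    # corrupted rows
print(robust_cgd(X, y))
```

Because only one partial derivative is estimated per update, each coordinate step costs about as much as a non-robust coordinate descent step, which is the compromise the abstract describes.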
Related papers
- Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z)
- Taylor Learning [0.0]
Empirical risk minimization stands behind most optimization in supervised machine learning.
We introduce a learning algorithm to construct models for real analytic functions using neither gradient descent nor empirical risk minimization.
arXiv Detail & Related papers (2023-05-24T01:10:58Z)
- Efficient Graph Laplacian Estimation by Proximal Newton [12.05527862797306]
A graph learning problem can be formulated as a maximum likelihood estimation (MLE) of the precision matrix.
We develop a second-order approach to obtain an efficient solver utilizing several algorithmic features.
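For orientation, the penalized Gaussian MLE of a precision matrix $\Theta$ from a sample covariance $S$ commonly takes the graphical-lasso form below; this generic form is given only as context, since the Laplacian-constrained estimation studied in the paper restricts $\Theta$ to graph Laplacians and its exact formulation may differ.

```latex
\hat{\Theta} \in \arg\max_{\Theta \succ 0}\;
  \log\det(\Theta) - \operatorname{tr}(S\Theta)
  - \lambda\,\|\Theta\|_{1,\mathrm{off}}
```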
arXiv Detail & Related papers (2023-02-13T15:13:22Z)
- Robust empirical risk minimization via Newton's method [9.797319790710711]
A new variant of Newton's method for empirical risk minimization is studied.
The gradient and Hessian of the objective function are replaced by robust estimators.
An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed.
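As a rough illustration of that recipe, here is a sketch of a single robust Newton step for logistic regression, using coordinate-wise median-of-means estimates of the gradient and Hessian and a conjugate-gradient solve for the Newton direction. The choice of robust estimator, the block count, the damping term, and the loss are assumptions, not the paper's exact algorithm.

```python
# Sketch of a robust Newton step: robust gradient/Hessian estimates via
# median-of-means over sample blocks, Newton direction via conjugate gradient.
# Block count, damping and loss are illustrative assumptions.
import numpy as np
from scipy.sparse.linalg import cg


def median_of_means(values, n_blocks=10):
    """Median of block means along axis 0 (rows are samples)."""
    blocks = np.array_split(values, n_blocks)
    return np.median([b.mean(axis=0) for b in blocks], axis=0)


def robust_newton_step(X, y, w, n_blocks=10, damping=1e-3):
    """One robust Newton step for logistic regression with labels y in {-1,+1}."""
    margins = np.clip(y * (X @ w), -50.0, 50.0)
    p = 1.0 / (1.0 + np.exp(margins))           # sigmoid(-y * x.w)
    grad_samples = -(y * p)[:, None] * X        # per-sample gradients
    g = median_of_means(grad_samples, n_blocks)
    s = p * (1.0 - p)                           # per-sample Hessian weights
    # Robust Hessian: entrywise median over per-block weighted covariances.
    blocks = np.array_split(np.arange(len(y)), n_blocks)
    H = np.median(
        [(X[b] * s[b, None]).T @ X[b] / len(b) for b in blocks], axis=0
    ) + damping * np.eye(X.shape[1])
    direction, _ = cg(H, g)                     # approximately solve H d = g
    return w - direction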
arXiv Detail & Related papers (2023-01-30T18:54:54Z)
- Gaining Outlier Resistance with Progressive Quantiles: Fast Algorithms and Theoretical Studies [1.6457778420360534]
A framework of outlier-resistant estimation is introduced to robustify an arbitrary loss function.
A new technique is proposed to alleviate the requirement on the starting point, so that on regular datasets the number of data re-estimations can be substantially reduced.
The obtained estimators, though not necessarily globally or even locally optimal, enjoy minimax optimality in both low and high dimensions.
arXiv Detail & Related papers (2021-12-15T20:35:21Z)
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handling streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
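For concreteness, here is a minimal sketch of online gradient descent on a pairwise hinge loss over score differences (an AUC-style objective); the buffering scheme, loss, and step size are illustrative assumptions rather than the algorithms proposed in the paper.

```python
# Minimal sketch of online gradient descent (OGD) for pairwise learning with a
# hinge loss on score differences; buffer size, loss and step size are
# illustrative assumptions, not the paper's algorithms.
import numpy as np


def ogd_pairwise(stream, dim, step=0.1, buffer_size=100, rng=None):
    """Process a stream of (x, y) with y in {-1, +1}; each new example is
    paired with a buffered example of the opposite label."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.zeros(dim)
    buffer = {+1: [], -1: []}
    for t, (x, y) in enumerate(stream, start=1):
        opposite = buffer[-y]
        if opposite:
            x_other = opposite[rng.integers(len(opposite))]
            diff = x - x_other
            # Pairwise hinge loss: max(0, 1 - y * w.(x - x_other)).
            if 1.0 - y * (w @ diff) > 0.0:
                w += (step / np.sqrt(t)) * y * diff        # OGD update
        buffer[y].append(x)
        if len(buffer[y]) > buffer_size:
            # Evict a random buffered example to bound memory.
            buffer[y].pop(rng.integers(len(buffer[y])))
    return w
```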
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Learning from Similarity-Confidence Data [94.94650350944377]
We investigate a novel weakly supervised learning problem of learning from similarity-confidence (Sconf) data.
We propose an unbiased estimator of the classification risk that can be calculated from only Sconf data and show that the estimation error bound achieves the optimal convergence rate.
arXiv Detail & Related papers (2021-02-13T07:31:16Z)
- Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods [41.60125423028092]
We show that any linear estimator can be outperformed by deep learning in a sense of minimax optimal rate.
The excess risk bounds are so-called fast learning rates, which are faster than $O(1/\sqrt{n})$ bounds.
arXiv Detail & Related papers (2020-12-06T09:22:16Z)
- Carathéodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z)
- Progressive Identification of True Labels for Partial-Label Learning [112.94467491335611]
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
Most existing methods are elaborately designed as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data.
This paper proposes a novel classifier framework with flexibility in the model and optimization algorithm.
arXiv Detail & Related papers (2020-02-19T08:35:15Z)
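To illustrate the progressive-identification idea from the partial-label learning entry above, here is a minimal sketch with a linear softmax model: candidate-label weights start uniform over each candidate set and are re-estimated from the current model's predictions after every epoch. The model, weighting rule, and hyper-parameters are illustrative assumptions, not the paper's exact framework.

```python
# Sketch of progressive identification for partial-label learning (PLL):
# train a linear softmax model with weighted cross-entropy over candidate
# labels, then re-weight the candidates from the current predictions.
# Model, learning rate and schedule are illustrative assumptions.
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_pll(X, candidates, n_epochs=50, lr=0.1):
    """X: (n, d) features; candidates: (n, n_classes) binary mask of
    candidate labels (each row contains the unknown true label)."""
    n, d = X.shape
    W = np.zeros((d, candidates.shape[1]))
    # Start with uniform weights over each example's candidate set.
    weights = candidates / candidates.sum(axis=1, keepdims=True)
    for _ in range(n_epochs):
        probs = softmax(X @ W)
        # Gradient of the weighted cross-entropy sum_i sum_k w_ik * (-log p_ik).
        grad = X.T @ (probs - weights) / n
        W -= lr * grad
        # Progressive identification: re-weight candidates by the current
        # predicted probabilities, renormalized within each candidate set.
        scores = softmax(X @ W) * candidates
        weights = scores / scores.sum(axis=1, keepdims=True)
    return W
```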
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.