Choosing the Sample with Lowest Loss makes SGD Robust
- URL: http://arxiv.org/abs/2001.03316v1
- Date: Fri, 10 Jan 2020 05:39:17 GMT
- Title: Choosing the Sample with Lowest Loss makes SGD Robust
- Authors: Vatsal Shah, Xiaoxia Wu, Sujay Sanghavi
- Abstract summary: We propose a simple variant of the stochastic gradient descent (SGD) method: in each step, first choose a set of k samples, then do an SGD-like update with the one among them that has the smallest current loss.
Vanilla SGD corresponds to k = 1; k >= 2 is a new algorithm that is effectively minimizing a non-convex surrogate loss.
Our theoretical analysis of the robustness of this idea for ML problems that are sums of convex losses is backed up with linear regression and small-scale neural network experiments.
- Score: 19.08973384659313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The presence of outliers can potentially significantly skew the parameters of
machine learning models trained via stochastic gradient descent (SGD). In this
paper we propose a simple variant of the simple SGD method: in each step, first
choose a set of k samples, then from these choose the one with the smallest
current loss, and do an SGD-like update with this chosen sample. Vanilla SGD
corresponds to k = 1, i.e. no choice; k >= 2 represents a new algorithm that is
however effectively minimizing a non-convex surrogate loss. Our main
contribution is a theoretical analysis of the robustness properties of this
idea for ML problems which are sums of convex losses; these are backed up with
linear regression and small-scale neural network experiments.
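The selection rule described in the abstract can be sketched in a few lines. A linear model with squared loss is assumed here purely for illustration; the paper's analysis covers general sums of convex losses:

```python
import numpy as np

def min_k_loss_sgd_step(w, X, y, k, lr, rng):
    """One step of the proposed variant: draw k samples, keep the one
    with the smallest current loss, and take an SGD step on it alone.
    k = 1 recovers vanilla SGD."""
    idx = rng.choice(len(X), size=k, replace=False)
    residuals = X[idx] @ w - y[idx]
    losses = 0.5 * residuals ** 2       # current loss of each candidate
    j = int(np.argmin(losses))          # index of the min-loss candidate
    grad = residuals[j] * X[idx[j]]     # gradient of 0.5 * (x @ w - y)^2
    return w - lr * grad
```

On data containing outliers, the min-loss choice tends to skip the high-loss outlier samples, which is the robustness effect the paper analyzes.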
Related papers
- Escaping mediocrity: how two-layer networks learn hard generalized
linear models with SGD [29.162265194920522]
This study explores the sample complexity for two-layer neural networks to learn a generalized linear target function under stochastic gradient descent (SGD).
We show that overparameterization can only enhance convergence by a constant factor within this problem class.
Yet, we demonstrate that a deterministic approximation of this process adequately represents the escape time, implying that the role of stochasticity may be minimal in this scenario.
arXiv Detail & Related papers (2023-05-29T14:40:56Z) - Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
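The importance-sampling component can be sketched independently of the paper's sketch-based data structures. The least-squares objective and the norm-based sampling distribution below are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def importance_sampled_sgd_step(w, X, y, probs, lr, rng):
    """One SGD step with importance sampling for least squares:
    sample index i with probability probs[i], then reweight its
    gradient by 1 / (n * probs[i]) so the step is an unbiased
    estimate of the full-batch gradient step."""
    n = len(X)
    i = rng.choice(n, p=probs)
    grad_i = (X[i] @ w - y[i]) * X[i]   # per-sample gradient
    return w - lr * grad_i / (n * probs[i])
```

The 1/(n * probs[i]) correction keeps the update unbiased: summing probs[i] * grad_i / (n * probs[i]) over i recovers the average gradient.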
arXiv Detail & Related papers (2022-07-16T03:09:30Z) - A Conditional Randomization Test for Sparse Logistic Regression in
High-Dimension [36.00360315353985]
CRT-logit is an algorithm that combines a variable-distillation step and a decorrelation step.
We provide a theoretical analysis of this procedure, and demonstrate its effectiveness on simulations, along with experiments on large-scale brain-imaging and genomics datasets.
arXiv Detail & Related papers (2022-05-29T09:37:16Z) - KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal [70.15267479220691]
We consider and analyze the sample complexity of reinforcement learning with a generative model.
Our analysis shows that it is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small.
arXiv Detail & Related papers (2022-05-27T19:39:24Z) - Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
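One way to realize "an individual importance weight to each sample in the mini-batch" is a softmax of the per-sample losses. The temperature parameter `lam` and this exact softmax form are our reading of the abstract, not necessarily the paper's recipe:

```python
import numpy as np

def loss_softmax_weights(losses, lam):
    """Hypothetical per-sample mini-batch weights: a softmax of the
    losses with temperature lam. Large |lam| approaches uniform
    weights (plain mini-batch SGD); small positive lam concentrates
    weight on high-loss samples (e.g. rare classes), while negative
    lam down-weights high-loss samples (e.g. noisy labels)."""
    z = np.asarray(losses, dtype=float) / lam
    z -= z.max()              # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()
```

The weighted mini-batch gradient is then the weighted sum of per-sample gradients, which is why the scheme composes with other robust losses at no extra cost.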
arXiv Detail & Related papers (2020-12-13T03:41:52Z) - Direction Matters: On the Implicit Bias of Stochastic Gradient Descent
with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z) - Stochastic Gradient Descent Meets Distribution Regression [0.0]
Stochastic gradient descent (SGD) provides a simple and efficient way to solve a broad range of machine learning problems.
We focus on distribution regression (DR), involving two stages of sampling: Firstly, we regress from probability measures to real-valued responses.
arXiv Detail & Related papers (2020-10-24T09:03:00Z) - Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
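The replay idea (store past samples and update on a uniformly drawn stored sample, breaking the Markov-chain correlation of consecutive points) can be sketched as follows. This is an illustrative least-squares sketch, not the paper's exact algorithm:

```python
import numpy as np

def replay_sgd(stream, dim, buffer_size, lr, rng):
    """Least-squares SGD on a correlated (Markovian) data stream with a
    simple experience-replay buffer: store incoming (x, y) pairs and
    update on a uniformly sampled stored pair instead of the newest,
    correlated one."""
    buffer = []
    w = np.zeros(dim)
    for x, y in stream:
        buffer.append((x, y))
        if len(buffer) > buffer_size:
            buffer.pop(0)                      # evict the oldest pair
        xs, ys = buffer[rng.integers(len(buffer))]
        w -= lr * (xs @ w - ys) * xs           # SGD step on the replayed pair
    return w
```

Sampling from the buffer decorrelates consecutive updates, which is what lets replay-based SGD beat naive SGD-on-the-stream under slow mixing.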
arXiv Detail & Related papers (2020-06-16T04:26:50Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.