Active Labeling: Streaming Stochastic Gradients
- URL: http://arxiv.org/abs/2205.13255v1
- Date: Thu, 26 May 2022 09:49:16 GMT
- Title: Active Labeling: Streaming Stochastic Gradients
- Authors: Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi
- Abstract summary: We formalize the "active labeling" problem, which generalizes active learning based on partial supervision.
We provide a streaming technique that minimizes the ratio of generalization error over number of samples.
- Score: 91.76135191049232
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The workhorse of machine learning is stochastic gradient descent. To access
stochastic gradients, it is common to consider iteratively input/output pairs
of a training dataset. Interestingly, it appears that one does not need full
supervision to access stochastic gradients, which is the main motivation of
this paper. After formalizing the "active labeling" problem, which generalizes
active learning based on partial supervision, we provide a streaming technique
that provably minimizes the ratio of generalization error over number of
samples. We illustrate our technique in depth for robust regression.
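A minimal sketch of the motivation, assuming the robust-regression illustration with the absolute loss: each SGD step needs only a single weak query ("is the label above my current prediction?") rather than the full label, yet still yields an unbiased stochastic gradient. The variable names and the single-bit query below are illustrative, not the paper's exact procedure.

import numpy as np

# Hedged sketch: streaming SGD for linear median regression where each update
# needs only ONE weak query -- "is the true label y_t above my current
# prediction <w, x_t>?" -- instead of the full label y_t.  This illustrates
# how partial supervision can still give stochastic gradients for the
# absolute loss |y - <w, x>|; it is not the paper's exact algorithm.
rng = np.random.default_rng(0)
d, T = 5, 20_000
w_star = rng.normal(size=d)

w = np.zeros(d)
for t in range(1, T + 1):
    x = rng.normal(size=d)
    y = w_star @ x + rng.standard_t(df=2)    # heavy-tailed noise (robust setting)

    pred = w @ x
    answer = y > pred                         # single binary query to the labeler
    grad = (-1.0 if answer else 1.0) * x      # d/dw |y - <w,x>| = -sign(y - <w,x>) x

    w -= (1.0 / np.sqrt(t)) * grad            # standard 1/sqrt(t) step size

print("estimation error:", np.linalg.norm(w - w_star))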
Related papers
- Regression under demographic parity constraints via unlabeled post-processing [5.762345156477737]
We present a general-purpose post-processing algorithm that generates predictions satisfying the demographic parity constraint.
Unlike prior methods, our approach is fully theory-driven: it requires precise control over the gradient norm of the underlying convex function.
Our algorithm is backed by finite-sample analysis and post-processing bounds, with experimental results validating our theoretical findings.
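As a reference point (not necessarily the paper's exact constraint), demographic parity for a regressor asks that predictions be (approximately) independent of the sensitive attribute S:

% Demographic parity for a regressor \hat f with sensitive attribute S:
% exact independence, and a common epsilon-relaxation over thresholds.
\[
  \hat f(X) \;\perp\; S ,
  \qquad\text{or, relaxed,}\qquad
  \sup_{t \in \mathbb{R}}\,
  \bigl| \mathbb{P}\bigl(\hat f(X) \le t \mid S = s\bigr)
       - \mathbb{P}\bigl(\hat f(X) \le t\bigr) \bigr| \;\le\; \varepsilon
  \quad \text{for all } s .
\]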
arXiv Detail & Related papers (2024-07-22T08:11:58Z) - Almost sure convergence rates of stochastic gradient methods under gradient domination [2.96614015844317]
Global and local gradient domination properties have been shown to be a more realistic replacement for strong convexity.
We prove almost sure convergence rates $f(X_n)-f^*\in o\big(n^{-\frac{1}{4\beta-1}+\epsilon}\big)$ of the last iterate for stochastic gradient descent.
We show how to apply our results to the training task in both supervised and reinforcement learning.
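For orientation, the best-known gradient domination condition is the Polyak-Lojasiewicz (PL) inequality; the paper works with a beta-parametrized generalization of it (see the paper for the exact form). The PL inequality reads:

% Polyak-Lojasiewicz (PL) inequality: the gradient norm dominates the
% suboptimality gap.  The paper's beta-gradient domination condition is a
% parametrized generalization of this.
\[
  \tfrac{1}{2}\, \lVert \nabla f(x) \rVert^{2} \;\ge\; \mu \bigl( f(x) - f^{*} \bigr)
  \qquad \text{for some } \mu > 0 \text{ and all } x .
\]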
arXiv Detail & Related papers (2024-05-22T12:40:57Z) - One-step corrected projected stochastic gradient descent for statistical estimation [49.1574468325115]
It is based on the projected gradient descent on the log-likelihood function corrected by a single step of the Fisher scoring algorithm.
We show theoretically and by simulations that it is an interesting alternative to the usual gradient descent with averaging or to adaptive gradient descent.
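For context, the classical one-step correction applies a single Fisher-scoring (Newton-type) step to a preliminary estimator; the paper's projected/stochastic variant may differ in its details. With preliminary estimator $\tilde\theta_n$ and Fisher information $I(\theta)$:

% Classical one-step Fisher-scoring correction of a preliminary estimator
% \tilde\theta_n (here obtained by projected gradient descent on the
% log-likelihood); the paper's exact variant may differ.
\[
  \hat\theta_n
  \;=\;
  \tilde\theta_n
  \;+\;
  I(\tilde\theta_n)^{-1}\,
  \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta \log p_\theta(X_i)\Big|_{\theta=\tilde\theta_n} .
\]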
arXiv Detail & Related papers (2023-06-09T13:43:07Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the variance of the random initialization is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - From Weakly Supervised Learning to Active Learning [1.52292571922932]
This thesis is motivated by the question: can we derive a more generic framework than the one of supervised learning?
We model weak supervision as giving, rather than a unique target, a set of target candidates.
We argue that one should look for an "optimistic" function that matches most of the observations. This allows us to derive a principle to disambiguate partial labels.
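A sketch of this optimistic principle, assuming it coincides with the infimum-loss formulation of partial labelling (notation illustrative): the learner is only charged for the best-matching candidate in the observed set S of target candidates.

% Optimistic / infimum-loss risk for partial labels: (X, S) is an input
% together with a set S of target candidates, and \ell is the loss.
\[
  \mathcal{R}_{\mathrm{inf}}(f)
  \;=\;
  \mathbb{E}_{(X,S)}\Bigl[\, \inf_{y \in S} \ell\bigl(f(X), y\bigr) \Bigr],
  \qquad
  f^{\star} \;\in\; \arg\min_{f}\ \mathcal{R}_{\mathrm{inf}}(f) .
\]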
arXiv Detail & Related papers (2022-09-23T14:55:43Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the Importance-Guided Stochastic Gradient Descent (IGSGD) method to train models to perform inference directly from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - Adapting Stepsizes by Momentumized Gradients Improves Optimization and
Generalization [89.66571637204012]
AdaMomentum performs strongly on vision tasks, and achieves state-of-the-art results consistently on other tasks including language processing.
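A hedged reading of the title ("momentumized gradients" adapting the stepsize): compared with Adam, the second-moment estimate is built from the momentum term $m_t$ rather than the raw gradient $g_t$; the exact update (bias correction, additional terms) is given in the paper.

% Adam-style update with a "momentumized" second moment (sketch, not the
% paper's exact algorithm): v_t averages m_t^2 instead of g_t^2.
\begin{align*}
  m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t , \\
  v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, m_t^{2}
         \qquad \text{(Adam uses } g_t^{2} \text{ here)} , \\
  \theta_t &= \theta_{t-1} - \frac{\eta}{\sqrt{v_t} + \epsilon}\, m_t .
\end{align*}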
arXiv Detail & Related papers (2021-06-22T03:13:23Z) - Learning Quantized Neural Nets by Coarse Gradient Method for Non-linear
Classification [3.158346511479111]
We propose a class of STEs with certain monotonicity, and consider their applications to the training of a two-linear-layer network with quantized activation functions.
We establish performance guarantees for the proposed STEs by showing that the corresponding coarse gradient methods converge to the global minimum.
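A minimal numpy sketch of a straight-through estimator (STE) for a two-linear-layer network with a binary activation: the forward pass uses the non-differentiable quantizer sign(z), while the backward pass substitutes a monotone surrogate derivative. The data, network sizes, and the clipped-identity surrogate below are illustrative, not the paper's exact setup.

import numpy as np

# Straight-through estimator (STE) for a binary activation: forward pass uses
# sign(z); backward pass replaces its zero derivative with the coarse
# surrogate 1{|z| <= 1} (clipped-identity / "hard tanh" derivative).
rng = np.random.default_rng(0)
n, d, h = 256, 10, 16
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))               # toy binary targets in {-1, +1}

W1 = rng.normal(size=(d, h)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)

lr = 0.1
for step in range(500):
    z = X @ W1                                    # pre-activations
    a = np.sign(z)                                # quantized activation (forward)
    out = a @ w2
    residual = out - y                            # squared-loss residual

    grad_w2 = a.T @ residual / n
    grad_a = np.outer(residual, w2)
    surrogate = (np.abs(z) <= 1.0).astype(float)  # coarse derivative of sign(z)
    grad_W1 = X.T @ (grad_a * surrogate) / n      # STE backward pass

    W1 -= lr * grad_W1
    w2 -= lr * grad_w2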
arXiv Detail & Related papers (2020-11-23T07:50:09Z) - Dynamical mean-field theory for stochastic gradient descent in Gaussian
mixture classification [25.898873960635534]
We analyze in closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture.
We define a prototype process for which SGD can be extended to a continuous-time stochastic gradient flow.
In the full-batch limit, we recover the standard gradient flow.
arXiv Detail & Related papers (2020-06-10T22:49:41Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated, as the model has a piecewise-constant response.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)