A Multilevel Approach to Training
- URL: http://arxiv.org/abs/2006.15602v1
- Date: Sun, 28 Jun 2020 13:34:48 GMT
- Title: A Multilevel Approach to Training
- Authors: Vanessa Braglia and Alena Kopaničáková and Rolf Krause
- Abstract summary: We propose a novel training method based on nonlinear multilevel techniques, commonly used for solving discretized large scale partial differential equations.
Our multilevel training method constructs a multilevel hierarchy by reducing the number of samples.
The training of the original model is then enhanced by internally training surrogate models constructed with fewer samples.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel training method based on nonlinear multilevel minimization
techniques, commonly used for solving discretized large scale partial
differential equations. Our multilevel training method constructs a multilevel
hierarchy by reducing the number of samples. The training of the original model
is then enhanced by internally training surrogate models constructed with fewer
samples. We construct the surrogate models using a first-order consistency
approach. This gives rise to surrogate models whose gradients are stochastic
estimators of the full gradient, but with reduced variance compared to standard
stochastic gradient estimators. We illustrate the convergence behavior of the
proposed multilevel method on machine learning applications based on logistic
regression. A comparison with subsampled Newton's and variance reduction
methods demonstrates the efficiency of our multilevel method.
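The first-order consistency construction can be sketched concretely. The short NumPy example below, for logistic regression (the setting used in the paper's experiments), is a minimal illustration under assumed names and loop sizes; it is not the authors' implementation.

    import numpy as np

    # Minimal sketch of the first-order consistency idea for logistic
    # regression. All names below (build_surrogate_grad, w_ref, idx, the
    # two-level loop sizes) are illustrative assumptions, not the authors'
    # implementation.

    def logistic_grad(w, X, y):
        # Gradient of the mean logistic loss over the rows of X, labels in {0, 1}.
        p = 1.0 / (1.0 + np.exp(-X @ w))
        return X.T @ (p - y) / len(y)

    def build_surrogate_grad(X, y, idx, w_ref):
        # Coarse-level surrogate built from the subsample `idx`, made
        # first-order consistent with the full objective at `w_ref`:
        #   f_S^corr(w) = f_S(w) + <grad f(w_ref) - grad f_S(w_ref), w - w_ref>,
        # so its gradient, grad f_S(w) + grad f(w_ref) - grad f_S(w_ref),
        # matches the full gradient at w_ref and acts as a reduced-variance
        # (control-variate style) estimator of the full gradient nearby.
        correction = logistic_grad(w_ref, X, y) - logistic_grad(w_ref, X[idx], y[idx])
        return lambda w: logistic_grad(w, X[idx], y[idx]) + correction

    # Two-level illustration: a few coarse (subsampled) steps per fine-level pass.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 20))
    y = (X @ rng.standard_normal(20) > 0).astype(float)

    w = np.zeros(20)
    for outer in range(20):                                  # fine level: full sample
        idx = rng.choice(len(y), size=100, replace=False)    # coarse level: fewer samples
        surrogate_grad = build_surrogate_grad(X, y, idx, w_ref=w)
        for inner in range(5):                                # coarse-level minimization steps
            w -= 0.1 * surrogate_grad(w)

Because the corrected surrogate reproduces the full gradient at the reference point, the coarse-level steps use estimators of the full gradient with reduced variance, which is the property the abstract emphasizes and the reason for the comparison with variance reduction methods.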
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - Predicting Ordinary Differential Equations with Transformers [65.07437364102931]
We develop a transformer-based sequence-to-sequence model that recovers scalar ordinary differential equations (ODEs) in symbolic form from irregularly sampled and noisy observations of a single solution trajectory.
Our method is efficiently scalable: after one-time pretraining on a large set of ODEs, we can infer the governing law of a new observed solution in a few forward passes of the model.
arXiv Detail & Related papers (2023-07-24T08:46:12Z) - Aiming towards the minimizers: fast convergence of SGD for
overparametrized problems [25.077446336619378]
We propose a regularity condition which endows the stochastic gradient method with the same worst-case complexity as the deterministic gradient method.
All existing guarantees require the stochastic gradient method to take small steps, thereby resulting in a much slower linear rate of convergence.
We demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
arXiv Detail & Related papers (2023-06-05T05:21:01Z) - Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z) - GuideBP: Guiding Backpropagation Through Weaker Pathways of Parallel
Logits [6.764324841419295]
The proposed approach guides the gradients of backpropagation along the weakest concept representations.
A weakness score captures the class-specific performance of individual pathways and is then used to create a logit.
The proposed approach has been shown to perform better than traditional column merging techniques.
arXiv Detail & Related papers (2021-04-23T14:14:00Z) - Storchastic: A Framework for General Stochastic Automatic
Differentiation [9.34612743192798]
We introduce Storchastic, a new framework for automatic differentiation of stochastic computation graphs.
Storchastic allows the modeler to choose from a wide variety of gradient estimation methods at each sampling step.
Storchastic is provably unbiased for estimation of any-order gradients, and generalizes variance reduction techniques to higher-order gradient estimates.
arXiv Detail & Related papers (2021-04-01T12:19:54Z) - A Generalized Stacking for Implementing Ensembles of Gradient Boosting
Machines [5.482532589225552]
An approach for constructing ensembles of gradient boosting models is proposed.
It is shown that the proposed approach can be simply extended to arbitrary differentiable combination models.
arXiv Detail & Related papers (2020-10-12T21:05:45Z) - Learning Diverse Representations for Fast Adaptation to Distribution
Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs [71.26657499537366]
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models.
We compare it with the reverse dynamic method to train neural ODEs on classification, density estimation, and inference approximation tasks.
arXiv Detail & Related papers (2020-03-11T13:15:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.