Structured Sparsity Inducing Adaptive Optimizers for Deep Learning
- URL: http://arxiv.org/abs/2102.03869v1
- Date: Sun, 7 Feb 2021 18:06:23 GMT
- Title: Structured Sparsity Inducing Adaptive Optimizers for Deep Learning
- Authors: Tristan Deleu, Yoshua Bengio
- Abstract summary: In this paper, we derive the weighted proximal operator, which is a necessary component of proximal gradient methods.
We show that this adaptive method, together with the weighted proximal operators derived here, is indeed capable of finding solutions with structure in their sparsity patterns.
- Score: 94.23102887731417
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The parameters of a neural network are naturally organized in groups, some of
which might not contribute to its overall performance. To prune out unimportant
groups of parameters, we can include some non-differentiable penalty to the
objective function, and minimize it using proximal gradient methods. In this
paper, we derive the weighted proximal operator, which is a necessary component
of these proximal methods, of two structured sparsity inducing penalties.
Moreover, they can be approximated efficiently with a numerical solver, and
despite this approximation, we prove that existing convergence guarantees are
preserved when these operators are integrated as part of a generic adaptive
proximal method. Finally, we show that this adaptive method, together with the
weighted proximal operators derived here, is indeed capable of finding
solutions with structure in their sparsity patterns, on representative examples
from computer vision and natural language processing.
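The abstract's central object, the weighted proximal operator, can be made concrete for the simplest structured penalty: a group-lasso term lam * ||x||_2 on a single group, with strictly positive diagonal weights d_i such as an Adam-style preconditioner. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not name its two penalties, so the group-lasso choice is an assumption. The helper names `weighted_group_prox` and `adaptive_prox_step` are hypothetical. Bisection stands in for whatever numerical solver the paper actually uses. Setting the subgradient of lam * ||x||_2 + 0.5 * sum_i d_i (x_i - z_i)^2 to zero shows that the group is pruned exactly when ||D z||_2 <= lam. Otherwise x_i = d_i z_i a / (d_i a + lam), where a = ||x||_2 is the unique root of a decreasing scalar function on [0, ||z||_2]. This reduction to a one-dimensional root is why a cheap numerical solver suffices.

```python
import math
from typing import List


def weighted_group_prox(z: List[float], d: List[float], lam: float,
                        tol: float = 1e-12, max_iter: int = 200) -> List[float]:
    """Hypothetical helper: weighted prox of lam * ||x||_2 for one group.

    Solves argmin_x lam * ||x||_2 + 0.5 * sum_i d_i * (x_i - z_i)**2,
    assuming d_i > 0 (e.g. a diagonal Adam-style preconditioner) and lam >= 0.
    """
    dz_norm = math.sqrt(sum((di * zi) ** 2 for di, zi in zip(d, z)))
    if dz_norm <= lam:
        # Optimality: x = 0 is a minimizer iff ||D z||_2 <= lam,
        # so the whole group is pruned at once.
        return [0.0] * len(z)

    # For x != 0: x_i = d_i z_i a / (d_i a + lam) with a = ||x||_2 > 0,
    # where a is the unique root of the decreasing function
    # f(a) = sum_i (d_i z_i / (d_i a + lam))**2 - 1.
    def f(a: float) -> float:
        return sum((di * zi / (di * a + lam)) ** 2 for di, zi in zip(d, z)) - 1.0

    # Bracket: f(0) > 0 because ||D z||_2 > lam, and f(||z||_2) <= 0 because |x_i| <= |z_i|.
    lo, hi = 0.0, math.sqrt(sum(zi * zi for zi in z))
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    a = 0.5 * (lo + hi)
    return [di * zi * a / (di * a + lam) for di, zi in zip(d, z)]


def adaptive_prox_step(x: List[float], grad: List[float], d: List[float],
                       lr: float, lam: float) -> List[float]:
    """Hypothetical helper: one preconditioned proximal-gradient step.

    Minimizes <grad, y> + (1 / (2 lr)) (y - x)^T D (y - x) + lam * ||y||_2,
    which equals a weighted prox (weights d, penalty lr * lam) applied to
    the preconditioned gradient step x - lr * D^{-1} grad.
    """
    z = [xi - lr * gi / di for xi, gi, di in zip(x, grad, d)]
    return weighted_group_prox(z, d, lr * lam)


if __name__ == "__main__":
    # With unit weights this reduces to group soft-thresholding: z * (1 - lam / ||z||).
    print(weighted_group_prox([3.0, 4.0], [1.0, 1.0], 1.0))  # approximately [2.4, 3.2]
    # A group whose weighted norm is below the threshold is zeroed entirely.
    print(weighted_group_prox([0.3, 0.4], [1.0, 1.0], 1.0))
```

Applying `adaptive_prox_step` group by group sets an entire group to exactly zero whenever ||D z||_2 <= lr * lam. This all-or-nothing behavior produces the structured sparsity patterns the abstract describes. Bisection was chosen only because the root is already bracketed on [0, ||z||_2]; a Newton iteration on the same scalar function would typically converge in fewer steps.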
Related papers
- Efficient Fairness-Performance Pareto Front Computation [51.558848491038916]
We show that optimal fair representations possess several useful structural properties.
We then show that these approximating problems can be solved efficiently via concave programming methods.
(2024-09-26T08:46:48Z)
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
(2024-06-04T20:33:22Z)
- Unnatural Algorithms in Machine Learning [0.0]
We show that optimization algorithms with this property can be viewed as discrete approximations of natural gradient descent.
We present a simple way to introduce this naturality more generally and examine a number of popular machine learning training algorithms.
(2023-12-07T22:43:37Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
(2023-07-08T15:41:48Z)
- Contraction-Guided Adaptive Partitioning for Reachability Analysis of Neural Network Controlled Systems [5.359060261460183]
We present a contraction-guided adaptive partitioning algorithm for improving interval-valued reachable set estimates in a nonlinear feedback loop.
By leveraging a decoupling of the neural network verification step and reachability partitioning layers, the algorithm can provide accuracy improvements for little computational cost.
We report a sizable improvement in the accuracy of reachable set estimation in a fraction of the runtime as compared to state-of-the-art methods.
(2023-04-07T14:43:21Z)
- Lifted Bregman Training of Neural Networks [28.03724379169264]
We introduce a novel mathematical formulation for the training of feed-forward neural networks with (potentially non-smooth) proximal maps as activation functions.
This formulation is based on Bregman distances, and a key advantage is that its partial derivatives with respect to the network's parameters do not require the computation of derivatives of the network's activation functions.
We present several numerical results that demonstrate that these training approaches can be equally well or even better suited for the training of neural network-based classifiers and (denoising) autoencoders with sparse coding.
(2022-08-18T11:12:52Z)
- Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
(2022-07-02T10:00:35Z)
- A Stochastic Bundle Method for Interpolating Networks [18.313879914379008]
We propose a novel method for training deep neural networks that are capable of driving the empirical loss to zero.
At each iteration, our method constructs a maximum of linear approximations of the learning objective, known as the bundle.
(2022-01-29T23:02:30Z)
- The Advantage of Conditional Meta-Learning for Biased Regularization and Fine-Tuning [50.21341246243422]
Biased regularization and fine-tuning are two recent meta-learning approaches.
We propose conditional meta-learning, which infers a conditioning function mapping a task's side information to a meta-parameter vector.
We then propose a convex meta-algorithm that provides a comparable advantage in practice as well.
(2020-08-25T07:32:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.