Gradients are Not All You Need
- URL: http://arxiv.org/abs/2111.05803v1
- Date: Wed, 10 Nov 2021 16:51:04 GMT
- Title: Gradients are Not All You Need
- Authors: Luke Metz, C. Daniel Freeman, Samuel S. Schoenholz, Tal Kachman
- Abstract summary: We discuss a common chaos based failure mode which appears in a variety of differentiable circumstances.
We trace this failure to the spectrum of the Jacobian of the system under study, and provide criteria for when a practitioner might expect this failure to spoil their differentiation based optimization algorithms.
- Score: 28.29420710601308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentiable programming techniques are widely used in the community and
are responsible for the machine learning renaissance of the past several
decades. While these methods are powerful, they have limits. In this short
report, we discuss a common chaos based failure mode which appears in a variety
of differentiable circumstances, ranging from recurrent neural networks and
numerical physics simulation to training learned optimizers. We trace this
failure to the spectrum of the Jacobian of the system under study, and provide
criteria for when a practitioner might expect this failure to spoil their
differentiation based optimization algorithms.
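A minimal JAX sketch of this failure mode (illustrative, not taken from the paper; the logistic-map system, the terminal loss, and all names are assumptions): differentiating through a long unroll of a chaotic system yields gradients whose magnitude grows roughly exponentially with the number of steps, because the product of per-step Jacobians has magnitude greater than one on average.

```python
# Illustrative sketch (not from the paper): gradients through an unrolled
# chaotic system explode because the product of per-step Jacobians has
# magnitude > 1 on average. The system is the logistic map
# x_{t+1} = r * x_t * (1 - x_t) with r = 3.9 (chaotic regime).
import jax
import jax.numpy as jnp

def unrolled_loss(r, x0=0.5, steps=100):
    """Iterate the logistic map `steps` times and return a terminal loss."""
    def step(x, _):
        x_next = r * x * (1.0 - x)   # per-step Jacobian: d x_next / d x = r * (1 - 2x)
        return x_next, None
    x_final, _ = jax.lax.scan(step, jnp.asarray(x0), None, length=steps)
    return (x_final - 0.5) ** 2       # arbitrary scalar loss on the final state

grad_r = jax.grad(unrolled_loss)      # gradient of the loss w.r.t. the map parameter r
for steps in (10, 50, 100, 200):
    # The gradient magnitude grows roughly exponentially with unroll length and
    # eventually overflows to inf/nan in float32, spoiling gradient-based
    # optimization of r even though the loss itself stays bounded.
    print(steps, float(grad_r(3.9, steps=steps)))
```

The per-step Jacobian here is r * (1 - 2x); in the chaotic regime its log-magnitude averages to a positive Lyapunov exponent along trajectories, which is the Jacobian-spectrum criterion the report uses to predict when unrolled differentiation breaks down.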
Related papers
- Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms [80.37846867546517]
We show how to train neural networks with custom objectives defined by differentiable algorithms.
We exploit their second-order information via their empirical Fisher and Hessian matrices.
We apply Newton Losses to eight differentiable algorithms and achieve significant improvements for hard-to-optimize differentiable algorithms.
arXiv Detail & Related papers (2024-10-24T18:02:11Z) - Machine Learning for predicting chaotic systems [0.0]
We show that well-tuned simple methods, as well as untuned baseline methods, often outperform state-of-the-art deep learning models.
These findings underscore the importance of matching prediction methods to data characteristics and available computational resources.
arXiv Detail & Related papers (2024-07-29T16:34:47Z) - ODE Discovery for Longitudinal Heterogeneous Treatment Effects Inference [69.24516189971929]
In this paper, we introduce a new type of solution in the longitudinal setting: a closed-form ordinary differential equation (ODE).
While we still rely on continuous optimization to learn an ODE, the resulting inference machine is no longer a neural network.
arXiv Detail & Related papers (2024-03-16T02:07:45Z) - Hierarchical deep learning-based adaptive time-stepping scheme for multiscale simulations [0.0]
This study proposes a new method for simulating multiscale problems using deep neural networks.
By leveraging the hierarchical learning of neural network time steppers, the method adapts time steps to approximate dynamical system flow maps across timescales.
This approach achieves state-of-the-art performance in less computational time compared to fixed-step neural network solvers.
arXiv Detail & Related papers (2023-11-10T09:47:58Z) - Mechanic: A Learning Rate Tuner [52.4242550204696]
We introduce a technique for tuning the learning rate scale factor of any base optimization algorithm and schedule automatically, which we call Mechanic.
We rigorously evaluate Mechanic on a range of large scale deep learning tasks with varying batch sizes, schedules, and base optimization algorithms.
arXiv Detail & Related papers (2023-05-31T19:32:43Z) - Recent Developments in Machine Learning Methods for Stochastic Control and Games [3.3993877661368757]
Recently, computational methods based on machine learning have been developed for solving control problems and games.
We focus on deep learning methods that have unlocked the possibility of solving such problems, even in high dimensions or when the structure is very complex.
This paper provides an introduction to these methods and summarizes the state-of-the-art works at the crossroad of machine learning and control and games.
arXiv Detail & Related papers (2023-03-17T21:53:07Z) - On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting and are therefore incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradient estimators.
arXiv Detail & Related papers (2022-06-30T05:29:52Z) - Model-Based Deep Learning: On the Intersection of Deep Learning and Optimization [101.32332941117271]
Decision making algorithms are used in a multitude of different applications.
Deep learning approaches that use highly parametric architectures tuned from data without relying on mathematical models are becoming increasingly popular.
Model-based optimization and data-centric deep learning are often considered to be distinct disciplines.
arXiv Detail & Related papers (2022-05-05T13:40:08Z) - Leveraging Reward Gradients For Reinforcement Learning in Differentiable Physics Simulations [11.4219428942199]
In the context of reinforcement learning for control, rigid body physics simulators theoretically allow algorithms to be applied directly to analytic gradients of the reward function.
We present a novel algorithm that is able to leverage these gradients to outperform state-of-the-art deep reinforcement learning on a set of challenging nonlinear control problems.
arXiv Detail & Related papers (2022-03-06T02:28:46Z) - Physical Gradients for Deep Learning [101.36788327318669]
We find that state-of-the-art training techniques are not well-suited to many problems that involve physical processes.
We propose a novel hybrid training approach that combines higher-order optimization methods with machine learning techniques.
arXiv Detail & Related papers (2021-09-30T12:14:31Z) - Comparison of Update and Genetic Training Algorithms in a Memristor Crossbar Perceptron [4.649999862713524]
We investigate whether certain training algorithms may be more resilient to particular hardware failure modes.
We implement two training algorithms -- a local update scheme and a genetic algorithm -- in a simulated memristor crossbar.
We demonstrate that there is a clear distinction between the two algorithms in several measures of the rate of failure to train.
arXiv Detail & Related papers (2020-12-10T23:48:58Z)