Stabilizing Backpropagation Through Time to Learn Complex Physics
- URL: http://arxiv.org/abs/2405.02041v1
- Date: Fri, 3 May 2024 12:20:08 GMT
- Title: Stabilizing Backpropagation Through Time to Learn Complex Physics
- Authors: Patrick Schnell, Nils Thuerey
- Abstract summary: In physics simulations, backpropagating feedback through many unrolled time steps is considered crucial to acquiring temporally coherent behavior.
The alternative vector field we propose follows from two principles: physics simulators have a balanced gradient flow, and certain modifications to the backpropagation pass leave the positions of the original minima unchanged.
Our final procedure is easily implementable via a sequence of gradient stopping and component-wise comparison operations.
- Score: 21.850601375335074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Of all the vector fields surrounding the minima of recurrent learning setups, the gradient field with its exploding and vanishing updates appears a poor choice for optimization, offering little beyond efficient computability. We seek to improve this suboptimal practice in the context of physics simulations, where backpropagating feedback through many unrolled time steps is considered crucial to acquiring temporally coherent behavior. The alternative vector field we propose follows from two principles: physics simulators, unlike neural networks, have a balanced gradient flow, and certain modifications to the backpropagation pass leave the positions of the original minima unchanged. As any modification of backpropagation decouples forward and backward pass, the rotation-free character of the gradient field is lost. Therefore, we discuss the negative implications of using such a rotational vector field for optimization and how to counteract them. Our final procedure is easily implementable via a sequence of gradient stopping and component-wise comparison operations, which do not negatively affect scalability. Our experiments on three control problems show that especially as we increase the complexity of each task, the unbalanced updates from the gradient can no longer provide the precise control signals necessary while our method still solves the tasks. Our code can be found at https://github.com/tum-pbs/StableBPTT.
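The abstract names two building blocks for the modified backward pass: gradient stopping and component-wise comparison operations. The sketch below (assuming PyTorch) shows one way such a modification can be wired into an unrolled simulation with a custom autograd function; the stand-in simulator step and the element-wise clamping rule in the backward pass are illustrative assumptions, not the authors' exact procedure, which is available in the linked repository.
```python
# Hypothetical sketch: gradient stopping (detach) plus a component-wise
# comparison (torch.where) inside a modified backward pass of one simulator
# step. The physics update and the clamping rule are placeholders; see
# https://github.com/tum-pbs/StableBPTT for the actual method.
import torch


class StabilizedStep(torch.autograd.Function):
    """Wraps one simulator step; forward is unchanged, backward is modified."""

    @staticmethod
    def forward(ctx, state, control):
        # Stand-in physics step: any differentiable update would do here.
        new_state = state + 0.1 * torch.sin(state) + control
        ctx.save_for_backward(state, control)
        return new_state

    @staticmethod
    def backward(ctx, grad_out):
        state, control = ctx.saved_tensors
        # Recompute the step under grad to obtain the usual vector-Jacobian product.
        with torch.enable_grad():
            s = state.detach().requires_grad_(True)   # gradient stopping on the stored state
            c = control.detach().requires_grad_(True)
            out = s + 0.1 * torch.sin(s) + c
            grad_s, grad_c = torch.autograd.grad(out, (s, c), grad_out)
        # Component-wise comparison (assumed rule): keep the backpropagated state
        # gradient only where it has not grown beyond the incoming one, otherwise
        # limit it element-wise. This decouples forward and backward pass, as the
        # abstract discusses.
        limit = grad_out.abs()
        grad_s = torch.where(grad_s.abs() <= limit, grad_s, torch.sign(grad_s) * limit)
        return grad_s, grad_c


# Usage: unroll a short trajectory and backpropagate through the modified steps.
if __name__ == "__main__":
    controls = torch.zeros(8, 4, requires_grad=True)
    state = torch.randn(4)
    for t in range(8):
        state = StabilizedStep.apply(state, controls[t])
    loss = (state - 1.0).pow(2).sum()
    loss.backward()
    print(controls.grad.shape)  # gradients reach the controls via the modified backward pass
```
The element-wise rule above only marks where the comparison operations plug in; the repository linked in the abstract contains the actual update rule and the three control tasks used in the experiments.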
Related papers
- Accelerated Training through Iterative Gradient Propagation Along the Residual Path [46.577761606415805]
Highway backpropagation is a parallelizable iterative algorithm that approximates backpropagation.
It is adaptable to a diverse set of common architectures, ranging from ResNets and Transformers to recurrent neural networks.
arXiv Detail & Related papers (2025-01-28T17:14:42Z) - ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think [22.333776599402754]
We introduce ZeroFlow, the first benchmark designed to evaluate gradient-free optimization algorithms for overcoming forgetting.
Our results show that forward passes alone can be sufficient to mitigate forgetting.
We propose new enhancements that further improve forgetting resistance using only forward passes.
arXiv Detail & Related papers (2025-01-02T04:10:17Z) - ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks [25.333488397742432]
We propose the ConFIG method to improve the training of physics-informed neural networks (PINNs).
It provides conflict-free updates by ensuring a positive dot product between the final update and each loss-specific gradient (a minimal sketch of this goal appears after this list).
It also maintains consistent optimization rates for all loss terms and dynamically adjusts gradient magnitudes based on conflict levels.
arXiv Detail & Related papers (2024-08-20T18:00:20Z) - Neural Gradient Learning and Optimization for Oriented Point Normal
Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability in local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z) - ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential-rate), ab initio (hyperparameter-free) gradient-based adaptation method.
The main idea is to adapt the learning rate $\alpha$ through situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly.
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Adaptive Self-supervision Algorithms for Physics-informed Neural
Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function.
We study the impact of the location of the collocation points on the trainability of these models.
We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
arXiv Detail & Related papers (2022-07-08T18:17:06Z) - Smooth over-parameterized solvers for non-smooth structured optimization [3.756550107432323]
Non-smoothness encodes structural constraints on the solutions, such as sparsity, group sparsity, low rank, and sharp edges.
We operate a non-weighted but smooth overparametrization of the underlying nonsmooth optimization problems.
Our main contribution is to apply the Variable Projection (VarPro) method, which defines a new formulation by explicitly minimizing over part of the variables.
arXiv Detail & Related papers (2022-05-03T09:23:07Z) - Tom: Leveraging trend of the observed gradients for faster convergence [0.0]
Tom is a novel variant of Adam that takes into account the trend observed in the gradients over the loss landscape traversed by the neural network.
Tom outperforms Adagrad, Adadelta, RMSProp, and Adam in terms of accuracy and converges faster.
arXiv Detail & Related papers (2021-09-07T20:19:40Z) - Adapting Stepsizes by Momentumized Gradients Improves Optimization and
Generalization [89.66571637204012]
AdaMomentum is effective on vision tasks and achieves state-of-the-art results consistently on other tasks including language processing.
arXiv Detail & Related papers (2021-06-22T03:13:23Z) - Deep Feedback Inverse Problem Solver [141.26041463617963]
We present an efficient, effective, and generic approach towards solving inverse problems.
We leverage the feedback signal provided by the forward process and learn an iterative update model.
Our approach does not have any restrictions on the forward process; it does not require any prior knowledge either.
arXiv Detail & Related papers (2021-01-19T16:49:06Z) - Cost Function Unrolling in Unsupervised Optical Flow [6.656273171776146]
This work focuses on the derivation of the Total Variation semi-norm commonly used in unsupervised cost functions.
We derive a differentiable proxy to the hard L1 smoothness constraint in a novel iterative scheme which we refer to as Cost Unrolling.
arXiv Detail & Related papers (2020-11-30T14:10:03Z) - Activation Relaxation: A Local Dynamical Approximation to
Backpropagation in the Brain [62.997667081978825]
Activation Relaxation (AR) is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system.
Our algorithm converges rapidly and robustly to the correct backpropagation gradients, requires only a single type of computational unit, and can operate on arbitrary computation graphs.
arXiv Detail & Related papers (2020-09-11T11:56:34Z) - Neural Non-Rigid Tracking [26.41847163649205]
We introduce a novel, end-to-end learnable, differentiable non-rigid tracker.
We employ a convolutional neural network to predict dense correspondences and their confidences.
Compared to state-of-the-art approaches, our algorithm shows improved reconstruction performance.
arXiv Detail & Related papers (2020-06-23T18:00:39Z)
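Several of the entries above describe gradient-combination rules; the ConFIG entry in particular frames the goal as producing an update whose dot product with every loss-specific gradient is positive. The snippet below is a minimal, hypothetical sketch of that goal using a pseudoinverse to obtain a direction with equal positive projections onto the per-loss gradients; it is an illustrative reconstruction, not the formula from the ConFIG paper.
```python
# Minimal, hypothetical sketch of a conflict-free update: a direction whose
# dot product with every per-loss gradient is positive (the goal stated in
# the ConFIG entry above). The pseudoinverse construction is an illustrative
# reconstruction, not the ConFIG paper's formula.
import torch


def conflict_free_update(grads):
    """Combine per-loss gradients (list of 1-D tensors) into one update."""
    G = torch.stack(grads)                                   # (k, n): one row per loss
    G_unit = G / G.norm(dim=1, keepdim=True).clamp_min(1e-12)
    # Direction with (least-squares) equal unit projections onto every gradient.
    direction = torch.linalg.pinv(G_unit) @ torch.ones(G.shape[0])
    direction = direction / direction.norm().clamp_min(1e-12)
    # Scale by the summed projections so the step size reflects agreement.
    return (G @ direction).sum() * direction


# Usage with two loss terms sharing one parameter vector:
theta = torch.randn(10, requires_grad=True)
losses = [(theta ** 2).sum(), (theta - 1.0).abs().sum()]
grads = [torch.autograd.grad(loss, theta, retain_graph=True)[0].flatten()
         for loss in losses]
update = conflict_free_update(grads)
assert all(torch.dot(g, update) > 0 for g in grads)          # conflict-free check
with torch.no_grad():
    theta -= 1e-2 * update.view_as(theta)
```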
This list is automatically generated from the titles and abstracts of the papers in this site.