Differentiability in Unrolled Training of Neural Physics Simulators on Transient Dynamics
- URL: http://arxiv.org/abs/2402.12971v2
- Date: Thu, 10 Oct 2024 16:27:52 GMT
- Title: Differentiability in Unrolled Training of Neural Physics Simulators on Transient Dynamics
- Authors: Bjoern List, Li-Wei Chen, Kartik Bali, Nils Thuerey
- Abstract summary: Unrolling training trajectories over time influences the inference accuracy of neural network-augmented physics simulators.
We present a study across physical systems, network sizes and architectures, training setups, and test scenarios.
- Score: 22.40149186064481
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Unrolling training trajectories over time strongly influences the inference accuracy of neural network-augmented physics simulators. We analyze this in three variants of training neural time-steppers. In addition to one-step setups and fully differentiable unrolling, we include a third, less widely used variant: unrolling without temporal gradients. Comparing networks trained with these three modalities disentangles the two dominant effects of unrolling: training distribution shift and long-term gradients. We present a detailed study across physical systems, network sizes and architectures, training setups, and test scenarios. It also encompasses two simulation modes: in prediction setups, we rely solely on neural networks to compute a trajectory; in contrast, correction setups include a numerical solver that is supported by a neural network. Spanning these variations, our study provides the empirical basis for our main findings: Non-differentiable but unrolled training with a numerical solver in a correction setup can yield substantial improvements over a fully differentiable prediction setup that does not utilize this solver. The accuracy of models trained in a fully differentiable setup differs from that of their non-differentiable counterparts: differentiable models perform best both among correction networks and among prediction setups, and in both cases the accuracy of non-differentiable unrolling comes close. Furthermore, we show that these behaviors are invariant to the physical system, the network architecture and size, and the numerical scheme. These results motivate integrating non-differentiable numerical simulators into training setups even when full differentiability is unavailable. Finally, we show that the convergence rate of common architectures is low compared to numerical algorithms, which motivates correction setups that combine neural and numerical components to utilize the benefits of both.
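For illustration, here is a minimal PyTorch sketch of the three training modalities compared above, assuming hypothetical placeholders for the network `net`, the numerical step `solver_step`, and the reference trajectory `targets`; it is our reading of the setups, not the authors' code:

```python
import torch
import torch.nn.functional as F

def unrolled_loss(net, solver_step, x0, targets, differentiable=True, correction=True):
    """Unroll a neural time-stepper over len(targets) steps.

    len(targets) == 1       -> classic one-step training
    differentiable == True  -> fully differentiable unrolling (backprop through time)
    differentiable == False -> unrolling without temporal gradients: the training
                               distribution shift is kept, long-term gradients are cut
    correction == False     -> prediction setup (the network replaces the solver)
    """
    x, loss = x0, 0.0
    for target in targets:
        x = solver_step(x) + net(x) if correction else net(x)
        loss = loss + F.mse_loss(x, target)
        if not differentiable:
            x = x.detach()  # block gradient flow into earlier steps
    return loss / len(targets)
```

Note that with `differentiable=False` the solver never needs to provide gradients, so it can be an arbitrary black-box simulator (e.g. called under `torch.no_grad()`); gradients still reach the network through each step's loss, which is exactly the configuration the abstract argues can remain competitive.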
Related papers
- PRDP: Progressively Refined Differentiable Physics [18.076285588021868]
We show that the full accuracy of the network is achievable with physics significantly coarser than fully converged solvers.
We propose Progressively Refined Differentiable Physics (PRDP), an approach that identifies the level of physics refinement sufficient for full training accuracy.
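A hedged sketch of how such progressive refinement could look; `make_solver` and `train_until_plateau` are hypothetical helpers, and the authors' refinement criterion may differ:

```python
def train_prdp(net, make_solver, data, levels=(1, 2, 4, 8), tol=1e-3):
    """Increase solver refinement only while doing so still improves training."""
    prev_loss = float("inf")
    for level in levels:                  # e.g. inner iterations of the solver
        solver = make_solver(level)       # coarser levels are cheaper but less exact
        loss = train_until_plateau(net, solver, data)  # hypothetical training loop
        if prev_loss - loss < tol:        # further refinement no longer pays off
            break
        prev_loss = loss
    return net
```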
arXiv Detail & Related papers (2025-02-26T22:56:56Z)
- ConsistentFeature: A Plug-and-Play Component for Neural Network Regularization [0.32885740436059047]
Over-parameterized neural network models often lead to significant performance discrepancies between training and test sets.
We introduce a simple perspective on overfitting: models learn different representations in different i.i.d. datasets.
We propose an adaptive method, ConsistentFeature, that regularizes the model by constraining feature differences across random subsets of the same training set.
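A hedged sketch of that regularizer as we read the abstract; `feature_extractor` (mapping a batch to feature vectors) and the subset scheme are illustrative assumptions:

```python
import torch

def consistent_feature_penalty(feature_extractor, batch):
    """Penalize feature differences between two random subsets of one batch."""
    perm = torch.randperm(batch.size(0))
    half = batch.size(0) // 2
    a, b = batch[perm[:half]], batch[perm[half:2 * half]]
    fa = feature_extractor(a).mean(dim=0)  # mean features of subset A
    fb = feature_extractor(b).mean(dim=0)  # mean features of subset B
    return (fa - fb).pow(2).mean()         # added to the task loss with a weight
```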
arXiv Detail & Related papers (2024-12-02T13:21:31Z)
- From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers [10.784222655465264]
Various pose estimation and tracking problems in robotics can be decomposed into a correspondence estimation problem and a weighted least squares optimization problem.
Recent work has shown that coupling the two problems by iteratively refining one conditioned on the other's output yields SOTA results across domains.
Training these models has proved challenging, however, requiring a litany of tricks to stabilize and speed up training.
arXiv Detail & Related papers (2024-06-12T00:41:25Z)
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards the practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z)
- Boosted Dynamic Neural Networks [53.559833501288146]
A typical EDNN has multiple prediction heads at different layers of the network backbone.
To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data.
Treating training and testing inputs differently in the two phases causes a mismatch between the training and testing data distributions.
We formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively.
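A minimal sketch of an early-exit network read as an additive model in the spirit of gradient boosting; the module layout is illustrative, not the paper's architecture:

```python
import torch.nn as nn

class BoostedEDNN(nn.Module):
    """Each exit predicts the running sum of all head outputs so far."""
    def __init__(self, blocks, heads):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)  # backbone stages
        self.heads = nn.ModuleList(heads)    # one lightweight head per stage

    def forward(self, x):
        logits, exits = 0.0, []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            logits = logits + head(x)        # additive, boosting-style ensemble
            exits.append(logits)
        return exits                         # one prediction per early exit
```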
arXiv Detail & Related papers (2022-11-30T04:23:12Z)
- On the (Non-)Robustness of Two-Layer Neural Networks in Different Learning Regimes [27.156666384752548]
Neural networks are highly sensitive to adversarial examples.
We study robustness and generalization in different scenarios.
We show how linearized lazy training regimes can worsen robustness.
arXiv Detail & Related papers (2022-03-22T16:40:52Z)
- What training reveals about neural network complexity [80.87515604428346]
This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced from how fast its weights change during training.
Our results support the hypothesis that good training behavior can be a useful bias towards good generalization.
arXiv Detail & Related papers (2021-06-08T08:58:00Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
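A hedged sketch of learning a one-dimensional subspace (a line) of networks, using `torch.func.functional_call`; the model, endpoint initialization, and loss are illustrative assumptions, not the paper's setup:

```python
import torch
import torch.nn as nn
from torch.func import functional_call

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
# Two trainable endpoints of a line segment in weight space.
theta1 = {k: v.detach().clone().requires_grad_() for k, v in model.named_parameters()}
theta2 = {k: (v + 0.01 * torch.randn_like(v)).detach().clone().requires_grad_()
          for k, v in model.named_parameters()}
opt = torch.optim.Adam(list(theta1.values()) + list(theta2.values()), lr=1e-3)

def line_loss(x, y):
    t = torch.rand(())                   # random point on the segment
    theta_t = {k: (1 - t) * theta1[k] + t * theta2[k] for k in theta1}
    logits = functional_call(model, theta_t, (x,))
    return nn.functional.cross_entropy(logits, y)
```

Each optimizer step backpropagates through the interpolation into both endpoints, so the entire segment, rather than a single weight vector, is driven to low loss.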
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Finite Difference Neural Networks: Fast Prediction of Partial Differential Equations [5.575293536755126]
We propose a novel neural network framework, finite difference neural networks (FDNet), to learn partial differential equations from data.
Specifically, our proposed finite-difference-inspired network is designed to learn the underlying governing partial differential equations from trajectory data.
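A hedged sketch of a finite-difference-inspired layer for a 1D PDE, our reading of the abstract rather than the authors' architecture: fixed central-difference stencils supply spatial derivatives, and a small network maps them to a learned time derivative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FDLayer(nn.Module):
    """Map u(x) to a learned du/dt via fixed finite-difference stencils."""
    def __init__(self, dx, hidden=64):
        super().__init__()
        self.dx = dx
        # Central-difference stencils for u_x and u_xx, kept fixed (not trained).
        self.register_buffer("k1", torch.tensor([[[-0.5, 0.0, 0.5]]]))
        self.register_buffer("k2", torch.tensor([[[1.0, -2.0, 1.0]]]))
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, u):                          # u: (batch, 1, nx)
        ux = F.conv1d(u, self.k1, padding=1) / self.dx
        uxx = F.conv1d(u, self.k2, padding=1) / self.dx ** 2
        feats = torch.stack([u, ux, uxx], dim=-1)  # pointwise derivative features
        return self.net(feats).squeeze(-1)         # predicted du/dt, (batch, 1, nx)
```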
arXiv Detail & Related papers (2020-06-02T19:17:58Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
- Mean-Field and Kinetic Descriptions of Neural Differential Equations [0.0]
In this work we focus on a particular class of neural networks, namely residual neural networks.
We analyze steady states and sensitivity with respect to the parameters of the network, namely the weights and the bias.
A modification of the microscopic dynamics, inspired by residual neural networks, leads to a Fokker-Planck formulation of the network.
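For intuition, a sketch of the standard passage from a residual update to a continuous-in-depth dynamic, together with the generic one-dimensional Fokker-Planck form for the density of states when weights and biases are random; the paper's precise drift and diffusion operators may differ:

```latex
% Residual update -> continuous-in-depth limit -> generic Fokker--Planck form
% for the density \rho(x,t) of neuron states; \mu and D are a generic drift
% and diffusion induced by the random weights w and bias b (an assumption here).
x^{n+1} = x^{n} + \Delta t\,\sigma\!\left(w^{n} x^{n} + b^{n}\right)
\;\xrightarrow{\;\Delta t \to 0\;}\;
\dot{x}(t) = \sigma\big(w(t)\,x(t) + b(t)\big),
\qquad
\partial_t \rho + \partial_x\big(\mu(x)\,\rho\big) = \partial_x^2\big(D(x)\,\rho\big).
```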
arXiv Detail & Related papers (2020-01-07T13:41:27Z)