Differentiability in Unrolled Training of Neural Physics Simulators on Transient Dynamics
- URL: http://arxiv.org/abs/2402.12971v2
- Date: Thu, 10 Oct 2024 16:27:52 GMT
- Title: Differentiability in Unrolled Training of Neural Physics Simulators on Transient Dynamics
- Authors: Bjoern List, Li-Wei Chen, Kartik Bali, Nils Thuerey
- Abstract summary: Unrolling training trajectories over time influences the inference accuracy of neural network-augmented physics simulators.
We present a study across physical systems, network sizes and architectures, training setups, and test scenarios.
- Score: 22.40149186064481
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Unrolling training trajectories over time strongly influences the inference accuracy of neural network-augmented physics simulators. We analyze this in three variants of training neural time-steppers. In addition to one-step setups and fully differentiable unrolling, we include a third, less widely used variant: unrolling without temporal gradients. Comparing networks trained with these three modalities disentangles the two dominant effects of unrolling: training distribution shift and long-term gradients. We present a detailed study across physical systems, network sizes and architectures, training setups, and test scenarios. It also encompasses two simulation modes: in prediction setups, we rely solely on neural networks to compute a trajectory; in contrast, correction setups include a numerical solver that is supported by a neural network. Spanning these variations, our study provides the empirical basis for our main findings: Non-differentiable but unrolled training with a numerical solver in a correction setup can yield substantial improvements over a fully differentiable prediction setup that does not utilize this solver. The accuracy of models trained in a fully differentiable setup differs from that of their non-differentiable counterparts: differentiable models perform best both among correction networks and among prediction setups, and in both cases the accuracy of non-differentiable unrolling comes close. Furthermore, we show that these behaviors are invariant to the physical system, the network architecture and size, and the numerical scheme. These results motivate integrating non-differentiable numerical simulators into training setups even when full differentiability is unavailable. Finally, we show that the convergence rate of common architectures is low compared to numerical algorithms, which motivates correction setups that combine neural and numerical components to utilize the benefits of both.
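For illustration, here is a minimal PyTorch sketch of the three training modalities compared above, assuming hypothetical placeholders for the network `net`, the numerical step `solver_step`, and the reference trajectory `targets`; it is our reading of the setups, not the authors' code:

```python
import torch
import torch.nn.functional as F

def unrolled_loss(net, solver_step, x0, targets, differentiable=True, correction=True):
    """Unroll a neural time-stepper over len(targets) steps.

    len(targets) == 1       -> classic one-step training
    differentiable == True  -> fully differentiable unrolling (backprop through time)
    differentiable == False -> unrolling without temporal gradients: the training
                               distribution shift is kept, long-term gradients are cut
    correction == False     -> prediction setup (the network replaces the solver)
    """
    x, loss = x0, 0.0
    for target in targets:
        x = solver_step(x) + net(x) if correction else net(x)
        loss = loss + F.mse_loss(x, target)
        if not differentiable:
            x = x.detach()  # block gradient flow into earlier steps
    return loss / len(targets)
```

Note that with `differentiable=False` the solver never needs to provide gradients, so it can be an arbitrary black-box simulator (e.g. called under `torch.no_grad()`); gradients still reach the network through each step's loss, which is exactly the configuration the abstract argues can remain competitive.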
Related papers
- PRDP: Progressively Refined Differentiable Physics [18.076285588021868]
We show that the full accuracy of the network is achievable with physics significantly coarser than fully converged solvers.
We propose Progressively Refined Differentiable Physics (PRDP), an approach that identifies the level of physics refinement sufficient for full training accuracy.
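A hedged sketch of how such progressive refinement could look; `make_solver` and `train_until_plateau` are hypothetical helpers, and the authors' refinement criterion may differ:

```python
def train_prdp(net, make_solver, data, levels=(1, 2, 4, 8), tol=1e-3):
    """Increase solver refinement only while doing so still improves training."""
    prev_loss = float("inf")
    for level in levels:                  # e.g. inner iterations of the solver
        solver = make_solver(level)       # coarser levels are cheaper but less exact
        loss = train_until_plateau(net, solver, data)  # hypothetical training loop
        if prev_loss - loss < tol:        # further refinement no longer pays off
            break
        prev_loss = loss
    return net
```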
arXiv Detail & Related papers (2025-02-26T22:56:56Z)
- ConsistentFeature: A Plug-and-Play Component for Neural Network Regularization [0.32885740436059047]
Over-parameterized neural network models often lead to significant performance discrepancies between training and test sets.
We introduce a simple perspective on overfitting: models learn different representations in different i.i.d. datasets.
We propose an adaptive method, ConsistentFeature, that regularizes the model by constraining feature differences across random subsets of the same training set.
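A hedged sketch of that regularizer as we read the abstract; `feature_extractor` (mapping a batch to feature vectors) and the subset scheme are illustrative assumptions:

```python
import torch

def consistent_feature_penalty(feature_extractor, batch):
    """Penalize feature differences between two random subsets of one batch."""
    perm = torch.randperm(batch.size(0))
    half = batch.size(0) // 2
    a, b = batch[perm[:half]], batch[perm[half:2 * half]]
    fa = feature_extractor(a).mean(dim=0)  # mean features of subset A
    fb = feature_extractor(b).mean(dim=0)  # mean features of subset B
    return (fa - fb).pow(2).mean()         # added to the task loss with a weight
```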
arXiv Detail & Related papers (2024-12-02T13:21:31Z)
- From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers [10.784222655465264]
Various pose estimation and tracking problems in robotics can be decomposed into a correspondence estimation problem and a weighted least squares optimization problem.
Recent work has shown that coupling the two problems by iteratively refining one conditioned on the other's output yields SOTA results across domains.
Training these models has proved challenging, however, requiring a litany of tricks to stabilize and speed up training.
arXiv Detail & Related papers (2024-06-12T00:41:25Z)
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards the practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z)
- Boosted Dynamic Neural Networks [53.559833501288146]
A typical EDNN has multiple prediction heads at different layers of the network backbone.
To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data.
Treating training and testing inputs differently in the two phases causes a mismatch between the training and testing data distributions.
We formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively.
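A minimal sketch of an early-exit network read as an additive model in the spirit of gradient boosting; the module layout is illustrative, not the paper's architecture:

```python
import torch.nn as nn

class BoostedEDNN(nn.Module):
    """Each exit predicts the running sum of all head outputs so far."""
    def __init__(self, blocks, heads):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)  # backbone stages
        self.heads = nn.ModuleList(heads)    # one lightweight head per stage

    def forward(self, x):
        logits, exits = 0.0, []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            logits = logits + head(x)        # additive, boosting-style ensemble
            exits.append(logits)
        return exits                         # one prediction per early exit
```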
arXiv Detail & Related papers (2022-11-30T04:23:12Z)
- On the (Non-)Robustness of Two-Layer Neural Networks in Different Learning Regimes [27.156666384752548]
Neural networks are highly sensitive to adversarial examples.
We study robustness and generalization in different scenarios.
We show how linearized lazy training regimes can worsen robustness.
arXiv Detail & Related papers (2022-03-22T16:40:52Z)
- What training reveals about neural network complexity [80.87515604428346]
This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced from how fast its weights change during training.
Our results support the hypothesis that good training behavior can be a useful bias towards good generalization.
arXiv Detail & Related papers (2021-06-08T08:58:00Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
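A hedged sketch of learning a one-dimensional subspace (a line) of networks, using `torch.func.functional_call`; the model, endpoint initialization, and loss are illustrative assumptions, not the paper's setup:

```python
import torch
import torch.nn as nn
from torch.func import functional_call

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
# Two trainable endpoints of a line segment in weight space.
theta1 = {k: v.detach().clone().requires_grad_() for k, v in model.named_parameters()}
theta2 = {k: (v + 0.01 * torch.randn_like(v)).detach().clone().requires_grad_()
          for k, v in model.named_parameters()}
opt = torch.optim.Adam(list(theta1.values()) + list(theta2.values()), lr=1e-3)

def line_loss(x, y):
    t = torch.rand(())                   # random point on the segment
    theta_t = {k: (1 - t) * theta1[k] + t * theta2[k] for k in theta1}
    logits = functional_call(model, theta_t, (x,))
    return nn.functional.cross_entropy(logits, y)
```

Each optimizer step backpropagates through the interpolation into both endpoints, so the entire segment, rather than a single weight vector, is driven to low loss.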
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Finite Difference Neural Networks: Fast Prediction of Partial Differential Equations [5.575293536755126]
We propose a novel neural network framework, finite difference neural networks (FDNet), to learn partial differential equations from data.
Specifically, our proposed finite-difference-inspired network is designed to learn the underlying governing partial differential equations from trajectory data.
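A hedged sketch of a finite-difference-inspired layer for a 1D PDE, our reading of the abstract rather than the authors' architecture: fixed central-difference stencils supply spatial derivatives, and a small network maps them to a learned time derivative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FDLayer(nn.Module):
    """Map u(x) to a learned du/dt via fixed finite-difference stencils."""
    def __init__(self, dx, hidden=64):
        super().__init__()
        self.dx = dx
        # Central-difference stencils for u_x and u_xx, kept fixed (not trained).
        self.register_buffer("k1", torch.tensor([[[-0.5, 0.0, 0.5]]]))
        self.register_buffer("k2", torch.tensor([[[1.0, -2.0, 1.0]]]))
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, u):                          # u: (batch, 1, nx)
        ux = F.conv1d(u, self.k1, padding=1) / self.dx
        uxx = F.conv1d(u, self.k2, padding=1) / self.dx ** 2
        feats = torch.stack([u, ux, uxx], dim=-1)  # pointwise derivative features
        return self.net(feats).squeeze(-1)         # predicted du/dt, (batch, 1, nx)
```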
arXiv Detail & Related papers (2020-06-02T19:17:58Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
- Mean-Field and Kinetic Descriptions of Neural Differential Equations [0.0]
In this work we focus on a particular class of neural networks, namely residual neural networks.
We analyze steady states and sensitivity with respect to the parameters of the network, namely the weights and the bias.
A modification of the microscopic dynamics, inspired by residual neural networks, leads to a Fokker-Planck formulation of the network.
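For intuition, a sketch of the standard passage from a residual update to a continuous-in-depth dynamic, together with the generic one-dimensional Fokker-Planck form for the density of states when weights and biases are random; the paper's precise drift and diffusion operators may differ:

```latex
% Residual update -> continuous-in-depth limit -> generic Fokker--Planck form
% for the density \rho(x,t) of neuron states; \mu and D are a generic drift
% and diffusion induced by the random weights w and bias b (an assumption here).
x^{n+1} = x^{n} + \Delta t\,\sigma\!\left(w^{n} x^{n} + b^{n}\right)
\;\xrightarrow{\;\Delta t \to 0\;}\;
\dot{x}(t) = \sigma\big(w(t)\,x(t) + b(t)\big),
\qquad
\partial_t \rho + \partial_x\big(\mu(x)\,\rho\big) = \partial_x^2\big(D(x)\,\rho\big).
```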
arXiv Detail & Related papers (2020-01-07T13:41:27Z)