Scaling Laws and Symmetry, Evidence from Neural Force Fields
- URL: http://arxiv.org/abs/2510.09768v1
- Date: Fri, 10 Oct 2025 18:22:00 GMT
- Title: Scaling Laws and Symmetry, Evidence from Neural Force Fields
- Authors: Khang Ngo, Siamak Ravanbakhsh
- Abstract summary: We show a clear power-law scaling behaviour with respect to data, parameters and compute, with "architecture-dependent exponents". In particular, we observe that equivariant architectures, which leverage task symmetry, scale better than non-equivariant models. Our analysis also suggests that for compute-optimal training, the data and model sizes should scale in tandem regardless of the architecture.
- Score: 14.109815254143205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an empirical study in the geometric task of learning interatomic potentials, which shows equivariance matters even more at larger scales; we show a clear power-law scaling behaviour with respect to data, parameters and compute with "architecture-dependent exponents". In particular, we observe that equivariant architectures, which leverage task symmetry, scale better than non-equivariant models. Moreover, among equivariant architectures, higher-order representations translate to better scaling exponents. Our analysis also suggests that for compute-optimal training, the data and model sizes should scale in tandem regardless of the architecture. At a high level, these results suggest that, contrary to common belief, we should not leave it to the model to discover fundamental inductive biases such as symmetry, especially as we scale, because they change the inherent difficulty of the task and its scaling laws.
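The core measurement behind these claims is a power-law fit of the form L(N) = a * N^(-alpha), where a larger fitted exponent alpha means the loss falls faster with scale. Below is a minimal sketch of how architecture-dependent exponents can be read off loss-versus-size curves; the sizes, constants, exponents, and architecture labels are synthetic illustrations for this listing, not values from the paper.

```python
import numpy as np

# Synthetic loss-vs-parameter-count curves for two hypothetical
# architectures; the constants and exponents are illustrative only.
sizes = np.logspace(5, 8, num=12)        # model sizes N from 1e5 to 1e8
rng = np.random.default_rng(0)
noise = lambda: rng.lognormal(0.0, 0.02, sizes.size)
curves = {
    "equivariant":     3.0 * sizes ** -0.45 * noise(),
    "non-equivariant": 3.0 * sizes ** -0.30 * noise(),
}

for name, losses in curves.items():
    # A power law L(N) = a * N**(-alpha) is a straight line in log-log
    # space, so a linear fit recovers each architecture's exponent alpha.
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    alpha, a = -slope, np.exp(intercept)
    print(f"{name:>15}: L(N) ~ {a:.2f} * N^(-{alpha:.3f})")
```

On this reading, the compute-optimal claim says that as the compute budget grows, the loss-minimizing data size and model size grow together, whichever architecture is used.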
Related papers
- Learning and extrapolating scale-invariant processes [3.331543293568139]
We tackle the question of how, and to what extent, one can regress scale-free processes, i.e. processes displaying power-law behaviour, such as earthquakes or avalanches. We are interested in predicting the large ones, i.e. events that are rare in the training set and therefore require extrapolation capabilities from the model.
arXiv Detail & Related papers (2026-01-21T09:35:44Z)
- SEAL - A Symmetry EncourAging Loss for High Energy Physics [0.005211875900848231]
Building machine learning models that explicitly respect symmetries can be difficult due to the dedicated components required. We introduce soft constraints that allow the model to decide the importance of added symmetries during the learning process instead of enforcing exact symmetries (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2025-11-03T19:00:13Z)
- Complexity Scaling Laws for Neural Models using Combinatorial Optimization [3.4585775092874163]
We develop scaling laws based on problem complexity. We analyze two fundamental complexity measures: solution space size and representation space size. We show that optimization promotes smooth cost trends, and therefore meaningful scaling laws can be obtained even in the absence of an interpretable loss.
arXiv Detail & Related papers (2025-06-15T18:20:35Z)
- Probing the effects of broken symmetries in machine learning [0.0]
We show that non-symmetric models can learn symmetries from data, and that doing so can even be beneficial for the accuracy of the model.
We focus specifically on physical observables that are likely to be affected, directly or indirectly, by symmetry breaking, finding negligible consequences when the model is used in an interpolative, bulk regime.
arXiv Detail & Related papers (2024-06-25T17:34:09Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
We present a unifying perspective on recent results on ridge regression. We use the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning. Our results extend and unify earlier models of scaling laws.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling-law relationship depends on the metrics used to measure performance, which may not correspond to how different groups of people perceive the quality of model output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
- A Solvable Model of Neural Scaling Laws [72.8349503901712]
Large language models with a huge number of parameters, when trained on a near internet-sized number of tokens, have been empirically shown to obey neural scaling laws.
We propose a statistical model -- a joint generative data model and random feature model -- that captures this neural scaling phenomenology.
A key finding is the manner in which the power laws occurring in the statistics of natural datasets are extended by nonlinear random feature maps.
arXiv Detail & Related papers (2022-10-30T15:13:18Z)
- Learning Physical Dynamics with Subequivariant Graph Neural Networks [99.41677381754678]
Graph Neural Networks (GNNs) have become a prevailing tool for learning physical dynamics.
Physical laws abide by symmetry, which is a vital inductive bias accounting for model generalization.
Our model achieves, on average, over 3% higher contact prediction accuracy across 8 scenarios on Physion and 2x lower rollout MSE on RigidFall.
arXiv Detail & Related papers (2022-10-13T10:00:30Z)
- The Lie Derivative for Measuring Learned Equivariance [84.29366874540217]
We study the equivariance properties of hundreds of pretrained models, spanning CNNs, transformers, and Mixer architectures.
We find that many violations of equivariance can be linked to spatial aliasing in ubiquitous network layers, such as pointwise non-linearities.
For example, transformers can be more equivariant than convolutional neural networks after training.
arXiv Detail & Related papers (2022-10-06T15:20:55Z)
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- Explaining Neural Scaling Laws [17.115592382420626]
The population loss of trained deep neural networks often follows precise power-law scaling relations.
We propose a theory that explains the origins of and connects these scaling laws.
We identify variance-limited and resolution-limited scaling behavior for both dataset and model size.
arXiv Detail & Related papers (2021-02-12T18:57:46Z)
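As noted in the SEAL entry above, the soft symmetry-constraint idea can be sketched in a few lines: add a penalty that grows when the model's output changes under a symmetry transformation, weighted by a coefficient that controls how strongly the symmetry is encouraged. The sketch below is an illustrative assumption of how such a loss might look; the rotation group, the network, the toy target, and the weight `lam` are choices made for this listing, not the paper's construction.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

def random_rotation_2d(batch_size):
    """Sample one random 2D rotation matrix per example."""
    theta = torch.rand(batch_size) * 2 * torch.pi
    c, s = torch.cos(theta), torch.sin(theta)
    return torch.stack([torch.stack([c, -s], -1),
                        torch.stack([s,  c], -1)], -2)   # (B, 2, 2)

def soft_symmetry_loss(model, x, y, lam=0.1):
    """Task loss plus a soft penalty on symmetry violations."""
    task = nn.functional.mse_loss(model(x), y)
    R = random_rotation_2d(x.shape[0])
    x_rot = torch.einsum("bij,bj->bi", R, x)
    # For a rotation-invariant target, predictions should not move when
    # the input is rotated; the penalty measures how much they do.
    sym = nn.functional.mse_loss(model(x_rot), model(x).detach())
    return task + lam * sym

# Toy usage: learn the rotation-invariant function y = ||x||.
x = torch.randn(256, 2)
y = x.norm(dim=1, keepdim=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = soft_symmetry_loss(model, x, y)
    loss.backward()
    opt.step()
```

Because the constraint is soft, the optimizer can trade symmetry against task fit: `lam = 0` recovers an unconstrained model, while a large `lam` pushes the model toward equivariant behaviour without hard-wiring it into the architecture.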