Why Cannot Neural Networks Master Extrapolation? Insights from Physical Laws
- URL: http://arxiv.org/abs/2510.04102v1
- Date: Sun, 05 Oct 2025 09:07:25 GMT
- Title: Why Cannot Neural Networks Master Extrapolation? Insights from Physical Laws
- Authors: Ramzi Dakhmouche, Hossein Gorji
- Abstract summary: Motivated by the remarkable success of Foundation Models (FMs) in language modeling, there has been growing interest in developing FMs for time series prediction. This work identifies and formalizes a fundamental property characterizing the ability of statistical learning models to predict more accurately outside of their training domain. In addition to a theoretical analysis, we present empirical results showcasing the implications of this property on current deep learning architectures.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by the remarkable success of Foundation Models (FMs) in language modeling, there has been growing interest in developing FMs for time series prediction, given the transformative power such models hold for science and engineering. This culminated in significant success of FMs in short-range forecasting settings. However, extrapolation or long-range forecasting remains elusive for FMs, which struggle to outperform even simple baselines. This contrasts with physical laws which have strong extrapolation properties, and raises the question of the fundamental difference between the structure of neural networks and physical laws. In this work, we identify and formalize a fundamental property characterizing the ability of statistical learning models to predict more accurately outside of their training domain, hence explaining performance deterioration for deep learning models in extrapolation settings. In addition to a theoretical analysis, we present empirical results showcasing the implications of this property on current deep learning architectures. Our results not only clarify the root causes of the extrapolation gap but also suggest directions for designing next-generation forecasting models capable of mastering extrapolation.
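The extrapolation gap the abstract describes can be sketched empirically. The snippet below is an illustrative toy, not the paper's experiments: a random-feature model (fixed tanh features with a least-squares readout, standing in for a trained neural network) is fit to samples of a simple "physical law", y = sin(x), on a bounded interval, then evaluated far outside it.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Physical law": simple harmonic signal, y = sin(x).
law = np.sin

# Training data restricted to [-pi, pi].
x_train = rng.uniform(-np.pi, np.pi, size=(512, 1))
y_train = law(x_train).ravel()

# Random-feature network: fixed tanh features, linear readout fit by
# least squares. A cheap, deterministic proxy for a trained MLP.
W = rng.normal(size=(1, 256))
b = rng.normal(size=256)
feats = lambda x: np.tanh(x @ W + b)
coef, *_ = np.linalg.lstsq(feats(x_train), y_train, rcond=None)
predict = lambda x: feats(x) @ coef

x_in = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)          # interpolation
x_out = np.linspace(2 * np.pi, 3 * np.pi, 200).reshape(-1, 1)  # extrapolation

err_in = np.abs(predict(x_in) - law(x_in).ravel()).mean()
err_out = np.abs(predict(x_out) - law(x_out).ravel()).mean()
print(err_in, err_out)  # out-of-domain error is typically far larger
```

Inside the training interval the fit is near-exact, but outside it the saturated tanh features cannot reproduce the law's oscillation, mirroring the performance deterioration the paper formalizes.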
Related papers
- How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns [51.02752099869218]
Large Language Models (LLMs) display strikingly different generalization behaviors. We introduce a novel benchmark that decomposes reasoning into atomic core skills. We show that RL-tuned models maintain more stable behavioral profiles and resist collapse in reasoning skills, whereas SFT models exhibit sharper drift and overfit to surface patterns.
arXiv Detail & Related papers (2025-12-30T08:16:20Z)
- From Black-box to Causal-box: Towards Building More Interpretable Models [57.23201263629627]
We introduce the notion of causal interpretability, which formalizes when counterfactual queries can be evaluated from a specific class of models. We derive a complete graphical criterion that determines whether a given model architecture supports a given counterfactual query.
arXiv Detail & Related papers (2025-10-24T20:03:18Z)
- Understanding the Implicit Biases of Design Choices for Time Series Foundation Models [90.894232610821]
Time series foundation models (TSFMs) are a class of potentially powerful, general-purpose tools for time series forecasting and related temporal tasks. Their behavior is strongly shaped by subtle inductive biases in their design. We show how these biases can be intuitive or very counterintuitive, depending on properties of the model and data.
arXiv Detail & Related papers (2025-10-22T04:42:35Z) - Variational Graph Convolutional Neural Networks [72.67088029389764]
Uncertainty can help improve the explainability of Graph Convolutional Networks. Uncertainty can also be used in critical applications to verify the results of the model.
arXiv Detail & Related papers (2025-07-02T13:28:37Z) - Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods [11.695512384798299]
Supervised fine-tuning is the dominant approach for adapting foundation models to specialized tasks. In vision models, ensembling a pretrained model with its fine-tuned counterpart has been shown to mitigate this issue. We observe an overadaptation phenomenon: the ensemble model not only retains general knowledge from the foundation model but also outperforms the fine-tuned model even on the fine-tuning domain itself.
arXiv Detail & Related papers (2025-06-02T17:23:16Z) - Investigating Compositional Reasoning in Time Series Foundation Models [16.0792886386044]
We define compositional reasoning in forecasting and distinguish it from in-distribution generalization. We find that patch-based Transformers have the best reasoning performance. In some zero-shot out-of-distribution scenarios, these models can outperform moving average and exponential smoothing statistical baselines trained on in-distribution data.
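The statistical baselines mentioned in the summary above can be sketched in a few lines for a univariate series. The window size and smoothing factor below are illustrative defaults, not values from the paper.

```python
import numpy as np

def moving_average_forecast(y, window=3):
    # One-step-ahead forecast: mean of the last `window` observations.
    return np.mean(y[-window:])

def exp_smoothing_forecast(y, alpha=0.5):
    # Simple exponential smoothing: s_t = alpha*y_t + (1-alpha)*s_{t-1};
    # the final level is the one-step-ahead forecast.
    level = y[0]
    for v in y[1:]:
        level = alpha * v + (1 - alpha) * level
    return level

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(moving_average_forecast(y))   # 4.0
print(exp_smoothing_forecast(y))    # 4.0625
```

Both baselines lag a trending series, which is precisely why beating them out of distribution is a meaningful test for foundation models.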
arXiv Detail & Related papers (2025-02-09T21:21:55Z)
- AirPhyNet: Harnessing Physics-Guided Neural Networks for Air Quality Prediction [40.58819011476455]
This paper presents a novel approach named Physics guided Neural Network for Air Quality Prediction (AirPhyNet)
We leverage two well-established physics principles of air particle movement (diffusion and advection) by representing them as differential equation networks.
Experiments on two real-world benchmark datasets demonstrate that AirPhyNet outperforms state-of-the-art models for different testing scenarios.
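The physics prior this entry describes can be sketched as a 1D advection-diffusion step, c_t = D·c_xx − v·c_x, discretized with finite differences. This is a minimal illustration of the governing equation, not AirPhyNet's actual network or parameters; the grid size, D, and v are arbitrary choices.

```python
import numpy as np

def advect_diffuse_step(c, dx=1.0, dt=0.1, D=0.5, v=0.3):
    # Diffusion: central second difference on a periodic domain.
    c_xx = (np.roll(c, -1) - 2 * c + np.roll(c, 1)) / dx**2
    # Advection: first-order upwind difference (valid for v > 0).
    c_x = (c - np.roll(c, 1)) / dx
    return c + dt * (D * c_xx - v * c_x)

# Pollutant pulse on a periodic 1D domain.
c = np.zeros(64)
c[32] = 1.0
for _ in range(50):
    c = advect_diffuse_step(c)

# Total mass is conserved; the pulse spreads out and drifts downstream.
print(c.sum(), c.max(), c.argmax())
```

Representing such operators as differentiable layers is what lets a network like AirPhyNet bake conservation and transport structure into its predictions.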
arXiv Detail & Related papers (2024-02-06T07:55:54Z)
- On some limitations of data-driven weather forecasting models [0.0]
We examine some aspects of the forecasts produced by an exemplar of the current generation of ML models, Pangu-Weather.
The main conclusion is that Pangu-Weather forecasts, and possibly those of similar ML models, do not have the fidelity and physical consistency of physics-based models.
arXiv Detail & Related papers (2023-09-15T15:21:57Z)
- Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models.
We provide a language for describing how training data influences predictions, through a causal framework.
Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery has proposed to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Extended Unconstrained Features Model for Exploring Deep Neural Collapse [59.59039125375527]
Recently, a phenomenon termed "neural collapse" (NC) has been empirically observed in deep neural networks.
Recent papers have shown that minimizers with this structure emerge when optimizing a simplified "unconstrained features model" (UFM).
In this paper, we study the UFM for the regularized MSE loss, and show that the minimizers' features can be more structured than in the cross-entropy case.
arXiv Detail & Related papers (2022-02-16T14:17:37Z)
- Hessian-based toolbox for reliable and interpretable machine learning in physics [58.720142291102135]
We present a toolbox for interpretability, reliability, and extrapolation of the model, agnostic of the architecture.
It provides a notion of the influence of the input data on the prediction at a given test point, an estimation of the uncertainty of the model predictions, and an agnostic score for the model predictions.
Our work opens the road to the systematic use of interpretability and reliability methods in ML applied to physics and, more generally, science.
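The "influence of the input data on the prediction" mentioned above is, in Hessian-based methods, estimated as x_test·H⁻¹·∇l_i: a first-order prediction of how deleting training point i would move the fit. The sketch below works it out for ridge regression, an illustrative stand-in for the toolbox's network setting, and checks it against exact leave-one-out retraining.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic ridge-regression problem (illustrative data, not the paper's).
n, d, lam = 50, 3, 1e-2
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

H = X.T @ X + lam * np.eye(d)            # Hessian of the (sum) ridge loss
w = np.linalg.solve(H, X.T @ y)          # fitted weights
x_test = rng.normal(size=d)

def influence(i):
    # First-order estimate of the change in f(x_test) if point i is removed.
    grad_i = X[i] * (X[i] @ w - y[i])    # gradient of point i's loss at w
    return x_test @ np.linalg.solve(H, grad_i)

def loo_change(i):
    # Exact leave-one-out retraining, for comparison.
    mask = np.arange(n) != i
    Xi, yi = X[mask], y[mask]
    w_i = np.linalg.solve(Xi.T @ Xi + lam * np.eye(d), Xi.T @ yi)
    return x_test @ (w_i - w)

est = np.array([influence(i) for i in range(n)])
exact = np.array([loo_change(i) for i in range(n)])
corr = np.corrcoef(est, exact)[0, 1]
print(corr)  # the Hessian estimate closely tracks true retraining
```

For deep networks the same formula applies with H the loss Hessian at the trained weights, which is what makes a Hessian toolbox architecture-agnostic.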
arXiv Detail & Related papers (2021-08-04T16:32:59Z)
- Graph Neural Networks for Improved El Niño Forecasting [0.009620910657090186]
We propose an application of Graph Neural Networks (GNN) to forecast El Niño-Southern Oscillation (ENSO) at long lead times.
Preliminary results are promising and outperform state-of-the-art systems for projections 1 and 3 months ahead.
arXiv Detail & Related papers (2020-12-02T23:40:53Z)