From calibration to parameter learning: Harnessing the scaling effects of big data in geoscientific modeling
- URL: http://arxiv.org/abs/2007.15751v6
- Date: Wed, 23 Mar 2022 00:54:33 GMT
- Title: From calibration to parameter learning: Harnessing the scaling effects of big data in geoscientific modeling
- Authors: Wen-Ping Tsai, Dapeng Feng, Ming Pan, Hylke Beck, Kathryn Lawson, Yuan Yang, Jiangtao Liu, and Chaopeng Shen
- Abstract summary: We propose a differentiable parameter learning (dPL) framework that efficiently learns a global mapping between inputs and parameters.
As training data increases, dPL achieves better performance, more physical coherence, and better generalizability.
We demonstrate examples that learned from soil moisture and streamflow, where dPL drastically outperformed existing evolutionary and regionalization methods.
- Score: 2.9897531698031403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The behaviors and skills of models in many geosciences (e.g., hydrology and
ecosystem sciences) strongly depend on spatially-varying parameters that need
calibration. A well-calibrated model can reasonably propagate information from
observations to unobserved variables via model physics, but traditional
calibration is highly inefficient and results in non-unique solutions. Here we
propose a novel differentiable parameter learning (dPL) framework that
efficiently learns a global mapping between inputs (and optionally responses)
and parameters. Crucially, dPL exhibits beneficial scaling curves not
previously demonstrated to geoscientists: as training data increases, dPL
achieves better performance, more physical coherence, and better
generalizability (across space and uncalibrated variables), all with
orders-of-magnitude lower computational cost. We demonstrate examples that
learned from soil moisture and streamflow, where dPL drastically outperformed
existing evolutionary and regionalization methods, or required only ~12.5% of
the training data to achieve similar performance. The generic scheme promotes
the integration of deep learning and process-based models, without mandating
reimplementation.
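The abstract describes dPL only at a high level: a learnable mapping produces the parameters of a differentiable process-based model, and the whole chain is trained against observations. The sketch below illustrates that idea under heavily simplified, hypothetical assumptions that are not from the paper: a single-bucket (linear reservoir) model stands in for the process-based model, a linear layer with a sigmoid stands in for the neural network, gradients are propagated by hand with forward sensitivities instead of an autodiff framework, and the data are synthetic. The names `simulate`, `loss_and_grad`, and `train` are illustrative only.

```python
import numpy as np


def simulate(k, precip):
    """Toy linear-reservoir model, run for all sites in one vectorized pass.

    k:      (n_sites,) recession coefficient in (0, 1), one per site
    precip: (n_sites, n_steps) forcing
    Returns simulated flow and its sensitivity dflow/dk, both (n_sites, n_steps).
    """
    n_sites, n_steps = precip.shape
    storage = np.zeros(n_sites)
    d_storage = np.zeros(n_sites)  # dS/dk, carried forward in time
    flow = np.zeros((n_sites, n_steps))
    d_flow = np.zeros((n_sites, n_steps))
    for t in range(n_steps):
        storage = storage + precip[:, t]             # add rain (independent of k)
        flow[:, t] = k * storage                     # outflow Q = k * S
        d_flow[:, t] = storage + k * d_storage       # product rule for k * S
        d_storage = (1.0 - k) * d_storage - storage  # sensitivity of (1 - k) * S
        storage = (1.0 - k) * storage
    return flow, d_flow


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_and_grad(theta, X, precip, q_obs):
    """MSE over all sites and steps, and its gradient w.r.t. the mapping weights."""
    k = sigmoid(X @ theta)                    # parameters come from site attributes
    q_sim, dq_dk = simulate(k, precip)
    resid = q_sim - q_obs
    loss = np.mean(resid ** 2)
    dloss_dk = 2.0 * np.sum(resid * dq_dk, axis=1) / resid.size
    grad = X.T @ (dloss_dk * k * (1.0 - k))   # chain rule through the sigmoid
    return loss, grad


def train(X, precip, q_obs, steps=300, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Fit ONE weight vector shared by all training sites, using plain Adam."""
    theta = np.zeros(X.shape[1])
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    history = []
    for i in range(1, steps + 1):
        loss, g = loss_and_grad(theta, X, precip, q_obs)
        history.append(loss)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** i)
        v_hat = v / (1 - b2 ** i)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, history


# Synthetic example: a hidden attribute-to-parameter mapping generates "observations".
rng = np.random.default_rng(0)
n_sites, n_steps = 60, 100
attrs = rng.normal(size=(n_sites, 3))            # hypothetical static attributes
X = np.hstack([attrs, np.ones((n_sites, 1))])    # append a bias column
true_theta = np.array([1.0, -0.5, 0.3, -0.5])    # hypothetical "true" mapping
precip = rng.uniform(0.0, 1.0, size=(n_sites, n_steps))
q_obs, _ = simulate(sigmoid(X @ true_theta), precip)

train_idx, test_idx = np.arange(40), np.arange(40, 60)
theta, history = train(X[train_idx], precip[train_idx], q_obs[train_idx])

# Held-out sites get parameters from the learned mapping, with no per-site calibration.
k_true_test = sigmoid(X[test_idx] @ true_theta)
k_pred_test = sigmoid(X[test_idx] @ theta)
```

The design choice worth noting is that the weights are shared across sites: every additional training site constrains the same small set of weights instead of adding its own independent calibration problem, which loosely mirrors why a learned mapping can benefit from more data and transfer to unseen sites. A realistic setup would replace the linear layer with a deep network and the hand-derived sensitivities with an autodiff framework.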
Related papers
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
- CGLearn: Consistent Gradient-Based Learning for Out-of-Distribution Generalization [0.7366405857677226]
In this work, we introduce a simple yet powerful approach, CGLearn, which relies on the agreement of gradients across various environments.
Our proposed method demonstrates superior performance compared to state-of-the-art methods in both linear and nonlinear settings.
Comprehensive experiments on both synthetic and real-world datasets highlight its effectiveness in diverse scenarios.
arXiv Detail & Related papers (2024-11-09T02:36:39Z)
- Beware of Calibration Data for Pruning Large Language Models [41.1689082093302]
Post-training pruning is a promising method that does not require resource-intensive iterative training.
We show that the choice of calibration data can matter even more than the design of advanced pruning strategies.
Our preliminary exploration also reveals that using calibration data similar to the training data can yield better performance.
arXiv Detail & Related papers (2024-10-23T09:36:21Z)
- AutoScale: Automatic Prediction of Compute-optimal Data Composition for Training LLMs [61.13296177652599]
This paper demonstrates that the optimal composition of training data from different domains is scale-dependent.
We introduce *AutoScale*, a novel, practical approach for optimizing data compositions at potentially large training data scales.
Our evaluation on GPT-2 Large and BERT pre-training demonstrates *AutoScale*'s effectiveness in improving training convergence and downstream performance.
arXiv Detail & Related papers (2024-07-29T17:06:30Z)
- Simulation-Enhanced Data Augmentation for Machine Learning Pathloss Prediction [9.664420734674088]
This paper introduces a novel simulation-enhanced data augmentation method for machine learning pathloss prediction.
Our method integrates synthetic data generated from a cellular coverage simulator and independently collected real-world datasets.
The integration of synthetic data significantly improves the generalizability of the model in different environments.
arXiv Detail & Related papers (2024-02-03T00:38:08Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Scaling Laws for Sparsely-Connected Foundation Models [70.41266138010657]
We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets.
We identify the first scaling law describing the relationship between weight sparsity, number of non-zero parameters, and amount of training data.
arXiv Detail & Related papers (2023-09-15T16:29:27Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We also examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- Differentiable modeling to unify machine learning and physical models and advance Geosciences [38.92849886903847]
We outline the concepts, applicability, and significance of differentiable geoscientific modeling (DG).
"Differentiable" refers to accurately and efficiently calculating gradients with respect to model variables.
Preliminary evidence suggests DG offers better interpretability and causality than machine learning.
arXiv Detail & Related papers (2023-01-10T15:24:14Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Data-driven super-parameterization using deep learning: Experimentation with multi-scale Lorenz 96 systems and transfer-learning [0.0]
We propose data-driven super-parameterization (DD-SP) to make weather/climate modeling computationally affordable.
With the same computational cost, DD-SP substantially outperforms the low-resolution model (LR) and is better than data-driven parameterization (DD-P), particularly when scale separation is lacking.
DD-SP is much cheaper than conventional super-parameterization (SP), yet it is as accurate in reproducing long-term statistics and often comparable in short-term forecasting.
arXiv Detail & Related papers (2020-02-25T20:43:42Z)