Towards Foundation Models for Scientific Machine Learning:
Characterizing Scaling and Transfer Behavior
- URL: http://arxiv.org/abs/2306.00258v1
- Date: Thu, 1 Jun 2023 00:32:59 GMT
- Title: Towards Foundation Models for Scientific Machine Learning:
Characterizing Scaling and Transfer Behavior
- Authors: Shashank Subramanian, Peter Harrington, Kurt Keutzer, Wahid Bhimji,
Dmitriy Morozov, Michael Mahoney, Amir Gholami
- Abstract summary: We study how pre-training could be used for scientific machine learning (SciML) applications.
We find that fine-tuning these models yields more performance gains as model size increases.
- Score: 32.74388989649232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained machine learning (ML) models have shown great performance for a
wide range of applications, in particular in natural language processing (NLP)
and computer vision (CV). Here, we study how pre-training could be used for
scientific machine learning (SciML) applications, specifically in the context
of transfer learning. We study the transfer behavior of these models as (i) the
pre-trained model size is scaled, (ii) the downstream training dataset size is
scaled, (iii) the physics parameters are systematically pushed out of
distribution, and (iv) how a single model pre-trained on a mixture of different
physics problems can be adapted to various downstream applications. We find
that, when fine-tuned appropriately, transfer learning can help reach desired
accuracy levels with orders of magnitude fewer downstream examples (across
different tasks that can even be out-of-distribution) than training from
scratch, with consistent behavior across a wide range of downstream examples.
We also find that fine-tuning these models yields more performance gains as
model size increases, compared to training from scratch on new downstream
tasks. These results hold for a broad range of PDE learning tasks. All in all,
our results demonstrate the potential of the "pre-train and fine-tune" paradigm
for SciML problems, demonstrating a path towards building SciML foundation
models. We open-source our code for reproducibility.
Related papers
- Physics Informed Machine Learning (PIML) methods for estimating the remaining useful lifetime (RUL) of aircraft engines [0.0]
This paper is aimed at using the newly developing field of physics informed machine learning (PIML) to develop models for predicting the remaining useful lifetime (RUL) of aircraft engines.
We consider the well-known benchmark NASA Commercial Modular Aero-Propulsion System Simulation System (C-MAPSS) data as the main data for this paper.
C-MAPSS is a well-studied dataset with much existing work in the literature that addresses RUL prediction with classical and deep learning methods.
arXiv Detail & Related papers (2024-06-21T19:55:34Z)
- Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 80 publicly available models.
We show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models.
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
arXiv Detail & Related papers (2024-05-17T17:49:44Z)
- Exploring Learngene via Stage-wise Weight Sharing for Initializing Variable-sized Models [40.21274215353816]
We introduce the Learngene framework, which learns one compact part termed as learngene from a large well-trained model.
We then expand these learngene layers containing stage information at their corresponding stage to initialize models of variable depths.
Experiments on ImageNet-1K demonstrate that SWS achieves consistently better performance compared to many models trained from scratch.
arXiv Detail & Related papers (2024-04-25T06:04:34Z)
- Pretraining Billion-scale Geospatial Foundational Models on Frontier [0.16492989697868893]
Foundation Models (FMs) are trained with internet-scale unlabeled data via self-supervised learning.
We investigate billion scale FMs and HPC training profiles for geospatial applications by pretraining on publicly available data.
Our larger 3B parameter size model achieves up to 30% improvement in top1 scene classification accuracy.
arXiv Detail & Related papers (2024-04-17T19:16:32Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, the effect of a mixture distribution and multi-epoch training of programming and natural languages on model performance is explored.
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- Equivariant vector field network for many-body system modeling [65.22203086172019]
Equivariant Vector Field Network (EVFN) is built on a novel equivariant basis and the associated scalarization and vectorization layers.
We evaluate our method on predicting trajectories of simulated Newton mechanics systems with both full and partially observed data.
arXiv Detail & Related papers (2021-10-26T14:26:25Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Supervised Learning in the Presence of Concept Drift: A modelling framework [5.22609266390809]
We present a modelling framework for the investigation of supervised learning in non-stationary environments.
We model two example types of learning systems: prototype-based Learning Vector Quantization (LVQ) for classification and shallow, layered neural networks for regression tasks.
arXiv Detail & Related papers (2020-05-21T09:13:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.