Towards Foundation Models for Scientific Machine Learning:
  Characterizing Scaling and Transfer Behavior
- URL: http://arxiv.org/abs/2306.00258v1
- Date: Thu, 1 Jun 2023 00:32:59 GMT
- Title: Towards Foundation Models for Scientific Machine Learning:
  Characterizing Scaling and Transfer Behavior
- Authors: Shashank Subramanian, Peter Harrington, Kurt Keutzer, Wahid Bhimji,
  Dmitriy Morozov, Michael Mahoney, Amir Gholami
- Abstract summary: We study how pre-training could be used for scientific machine learning (SciML) applications.
We find that fine-tuning these models yields more performance gains as model size increases.
- Score: 32.74388989649232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Pre-trained machine learning (ML) models have shown great performance for a
wide range of applications, in particular in natural language processing (NLP)
and computer vision (CV). Here, we study how pre-training could be used for
scientific machine learning (SciML) applications, specifically in the context
of transfer learning. We study the transfer behavior of these models as (i) the
pre-trained model size is scaled, (ii) the downstream training dataset size is
scaled, (iii) the physics parameters are systematically pushed out of
distribution, and (iv) a single model pre-trained on a mixture of different
physics problems is adapted to various downstream applications. We find
that, when fine-tuned appropriately, transfer learning can help reach desired
accuracy levels with orders of magnitude fewer downstream examples (across
different tasks that can even be out-of-distribution) than training from
scratch, with consistent behavior across a wide range of downstream examples.
We also find that fine-tuning these models yields more performance gains as
model size increases, compared to training from scratch on new downstream
tasks. These results hold for a broad range of PDE learning tasks. All in all,
our results demonstrate the potential of the "pre-train and fine-tune" paradigm
for SciML problems, demonstrating a path towards building SciML foundation
models. We open-source our code for reproducibility.

Related papers
- Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional [2.010573982216398]
 Current machine learning approaches use expensive, task-specific, and data-free training procedures.
We demonstrate our approach on varied molecular systems, obtaining diverse, physically realistic transition pathways.
Our method can be easily incorporated into new generative models, making it practically relevant as models continue to scale.
 arXiv (2025-04-25T17:17:17Z)
- Transfer learning in Scalable Graph Neural Network for Improved Physical Simulation [37.1565271299621]
 We introduce a pre-training and transfer learning paradigm for graph network simulators.
We show that our proposed transfer learning methods allow the model to perform even better when fine-tuned with small amounts of training data.
 arXiv (2025-02-07T08:18:23Z)
- DoMINO: A Decomposable Multi-scale Iterative Neural Operator for Modeling Large Scale Engineering Simulations [2.300471499347615]
 DoMINO is a point cloud-based machine learning model that uses local geometric information to predict flow fields on discrete points.
DoMINO is validated for the automotive aerodynamics use case using the DrivAerML dataset.
 arXiv (2025-01-23T03:28:10Z)
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
 We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
 arXiv (2024-10-28T13:48:43Z)
- Physics Informed Machine Learning (PIML) methods for estimating the remaining useful lifetime (RUL) of aircraft engines [0.0]
 This paper is aimed at using the newly developing field of physics informed machine learning (PIML) to develop models for predicting the remaining useful lifetime (RUL) of aircraft engines.
We consider the well-known benchmark NASA Commercial Modular Aero-Propulsion System Simulation System (C-MAPSS) data as the main data for this paper.
C-MAPSS is a well-studied dataset with much existing work in the literature that addresses RUL prediction with classical and deep learning methods.
 arXiv (2024-06-21T19:55:34Z)
- Pretraining Billion-scale Geospatial Foundational Models on Frontier [0.16492989697868893]
 Foundation Models (FMs) are trained with internet-scale unlabeled data via self-supervised learning.
We investigate billion scale FMs and HPC training profiles for geospatial applications by pretraining on publicly available data.
Our larger 3B parameter size model achieves up to 30% improvement in top1 scene classification accuracy.
 arXiv (2024-04-17T19:16:32Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
 D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
 arXiv (2024-02-28T08:34:23Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
 We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
 arXiv (2023-10-19T17:57:16Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
 We unify encoder and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, the effect of a mixture distribution and multi-epoch training of programming and natural languages on model performance is explored.
 arXiv (2023-05-03T17:55:25Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
 We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
 arXiv (2023-03-20T19:20:34Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
 There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
 arXiv (2021-06-17T17:26:31Z)
- Supervised Learning in the Presence of Concept Drift: A modelling framework [5.22609266390809]
 We present a modelling framework for the investigation of supervised learning in non-stationary environments.
We model two example types of learning systems: prototype-based Learning Vector Quantization (LVQ) for classification and shallow, layered neural networks for regression tasks.
 arXiv (2020-05-21T09:13:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.