Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation
Models
- URL: http://arxiv.org/abs/2403.07066v1
- Date: Mon, 11 Mar 2024 18:00:47 GMT
- Title: Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation
Models
- Authors: Philip Harris, Michael Kagan, Jeffrey Krupa, Benedikt Maier, Nathaniel
Woodward
- Abstract summary: Self-Supervised Learning (SSL) is at the core of training modern large machine learning models.
We propose RS3L, a novel simulation-based SSL strategy that employs a method of re-simulation to drive data augmentation.
In addition to our results, we make the RS3L dataset publicly available for further studies on how to improve SSL strategies.
- Score: 1.230412738960606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-Supervised Learning (SSL) is at the core of training modern large
machine learning models, providing a scheme for learning powerful
representations that can be used in a variety of downstream tasks. However, SSL
strategies must be adapted to the type of training data and downstream tasks
required. We propose RS3L, a novel simulation-based SSL strategy that employs a
method of re-simulation to drive data augmentation for contrastive learning. By
intervening in the middle of the simulation process and re-running simulation
components downstream of the intervention, we generate multiple realizations of
an event, thus producing a set of augmentations covering all physics-driven
variations available in the simulator. Using experiments from high-energy
physics, we explore how this strategy may enable the development of a
foundation model; we show how RS3L pre-training enables powerful performance in
downstream tasks such as discrimination of a variety of objects and uncertainty
mitigation. In addition to our results, we make the RS3L dataset publicly
available for further studies on how to improve SSL strategies.
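The abstract describes contrastive learning driven by re-simulated views of the same event but does not give the objective explicitly. As a rough illustration only, a SimCLR-style NT-Xent loss over pairs of re-simulated events might look like the sketch below; the function name, temperature, and the choice of NT-Xent are assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z_a, z_b, temperature=0.1):
    """SimCLR-style contrastive loss over two views of a batch of N events.

    z_a, z_b: (N, D) embeddings of the same N events under two independent
    re-runs of the downstream simulation chain (the RS3L-style augmentations).
    """
    z = F.normalize(torch.cat([z_a, z_b], dim=0), dim=1)  # (2N, D)
    sim = z @ z.t() / temperature                         # cosine similarities
    n = z_a.shape[0]
    sim.fill_diagonal_(float("-inf"))                     # exclude self-pairs
    # The positive for row i is the other re-simulated view of the same event.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```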
Related papers
- Feasibility Study on Active Learning of Smart Surrogates for Scientific Simulations [4.368891765870579]
We investigate the potential of incorporating active learning into the training of deep neural network (DNN) surrogates.
This allows intelligent and objective selection of training simulations, reducing the need to generate extensive simulation data.
The results set the groundwork for developing the high-performance computing infrastructure for Smart Surrogates.
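The summary does not specify the acquisition rule. One common choice for surrogate active learning is ensemble disagreement, sketched below under that assumption; the candidate schema, `k`, and the `.predict()` interface are hypothetical.

```python
import numpy as np

def select_next_simulations(candidates, ensemble, k=8):
    """Pick the k candidate parameter points where an ensemble of DNN
    surrogates disagrees most; the expensive simulator is run only there.

    candidates: (M, P) array of untried simulation parameters.
    ensemble: list of trained surrogate models exposing .predict().
    """
    preds = np.stack([m.predict(candidates) for m in ensemble])    # (E, M, ...)
    uncertainty = preds.std(axis=0).reshape(len(candidates), -1).mean(axis=1)
    return candidates[np.argsort(uncertainty)[-k:]]  # most uncertain points
```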
arXiv Detail & Related papers (2024-07-10T14:00:20Z)
- Efficient Continual Pre-training by Mitigating the Stability Gap [68.49269649759005]
We study the behavior of Large Language Models (LLMs) during continual pre-training.
We propose three effective strategies to enhance LLM performance within a fixed compute budget.
Our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget.
arXiv Detail & Related papers (2024-06-21T02:28:37Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
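A common closed-form instance of instance-reweighted DRO up-weights hard samples with a softmax over their per-sample losses. The sketch below illustrates that generic pattern; the temperature `tau` and detaching the weights are assumptions, not necessarily the paper's formulation.

```python
import torch

def irdro_batch_loss(per_sample_losses, tau=1.0):
    """Instance-reweighted loss: up-weight hard samples via a softmax over
    their individual losses (the closed-form solution of a KL-constrained
    DRO inner problem).

    per_sample_losses: (N,) tensor of unreduced losses for the batch.
    """
    # Weights are detached so gradients flow only through the losses themselves.
    weights = torch.softmax(per_sample_losses.detach() / tau, dim=0)
    return (weights * per_sample_losses).sum()
```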
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
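The summary says world models double as tools to measure task relevance but gives no formula. A minimal proxy, assumed here, is to score a pre-trained world model by its one-step prediction error on the new task's transitions; the paper's actual measure may differ.

```python
import torch

def task_relevance(world_model, transitions):
    """Score how relevant a pre-trained world model is to a new task via its
    one-step prediction error on that task's transitions (lower = more relevant).

    transitions: iterable of (state, action, next_state) tensors;
    world_model(s, a) is assumed to predict the next state.
    """
    errors = []
    with torch.no_grad():
        for s, a, s_next in transitions:
            errors.append(torch.mean((world_model(s, a) - s_next) ** 2))
    return torch.stack(errors).mean()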
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior [32.74388989649232]
We study how pre-training could be used for scientific machine learning (SciML) applications.
We find that fine-tuning these models yields more performance gains as model size increases.
arXiv Detail & Related papers (2023-06-01T00:32:59Z)
- Learning Controllable Adaptive Simulation for Multi-resolution Physics [86.8993558124143]
We introduce Learning controllable Adaptive simulation for Multi-resolution Physics (LAMP) as the first full deep learning-based surrogate model.
LAMP consists of a Graph Neural Network (GNN) for learning the forward evolution, and a GNN-based actor-critic for learning the policy of spatial refinement and coarsening.
We demonstrate that LAMP outperforms state-of-the-art deep learning surrogate models and can adaptively trade off computation to improve long-term prediction error.
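To make the refine/coarsen policy concrete, here is a heavily simplified sketch of one adaptation step in the spirit of LAMP: a GNN policy scores mesh nodes, the highest-scoring nodes are refined and the lowest coarsened. The `graph` API (`.refine()`, `.coarsen()`, `.node_features`) and the fraction parameters are invented for illustration.

```python
import torch

def adapt_mesh(policy_gnn, graph, refine_frac=0.1, coarsen_frac=0.1):
    """One mesh-adaptation step: a GNN policy scores each node; the
    highest-scoring nodes are refined and the lowest-scoring coarsened.

    graph: object with .node_features and .refine()/.coarsen() (assumed API).
    """
    scores = policy_gnn(graph.node_features).squeeze(-1)     # (num_nodes,)
    n = scores.numel()
    refine_ids = torch.topk(scores, max(1, int(refine_frac * n))).indices
    coarsen_ids = torch.topk(-scores, max(1, int(coarsen_frac * n))).indices
    return graph.refine(refine_ids).coarsen(coarsen_ids)
```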
arXiv Detail & Related papers (2023-05-01T23:20:27Z)
- Continual learning autoencoder training for a particle-in-cell simulation via streaming [52.77024349608834]
The upcoming exascale era will provide a new generation of high-resolution physics simulations.
The resulting data volumes will impact the training of machine learning models, since storing such large amounts of simulation data on disk is nearly impossible.
This work presents an approach that trains a neural network concurrently to a running simulation without data on a disk.
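The core idea, training directly from a live data stream, can be sketched as a loop that consumes batches from a running simulation instead of a dataset on disk. This is a minimal sketch assuming a Python generator interface; the actual work streams data from a concurrently running particle-in-cell code.

```python
import torch

def train_streaming(autoencoder, simulation_batches, lr=1e-4):
    """Continually fit an autoencoder on batches produced by a running
    simulation, so no snapshot ever needs to be written to disk.

    simulation_batches: generator yielding tensors (e.g. field slices).
    """
    opt = torch.optim.Adam(autoencoder.parameters(), lr=lr)
    for batch in simulation_batches:       # data exists only in memory
        recon = autoencoder(batch)
        loss = torch.mean((recon - batch) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
```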
arXiv Detail & Related papers (2022-11-09T09:55:14Z)
- RLFlow: Optimising Neural Network Subgraph Transformation with World Models [0.0]
We propose a model-based agent which learns to optimise the architecture of neural networks by performing a sequence of subgraph transformations to reduce model runtime.
We show our approach can match the performance of the state of the art on common convolutional networks and outperform it by up to 5% on transformer-style architectures.
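RLFlow itself trains an RL agent with a world model; as a simpler stand-in for the same search problem, the greedy sketch below applies, at each step, the candidate subgraph transformation that a learned cost model predicts will reduce runtime the most. The `transforms` and `runtime_model` interfaces are assumptions for illustration.

```python
def optimise_graph(graph, transforms, runtime_model, steps=20):
    """Greedy model-guided subgraph optimisation: repeatedly apply the
    transformation predicted to reduce runtime the most.

    transforms: list of functions graph -> new graph (or None if inapplicable).
    runtime_model: callable estimating a graph's runtime without executing it.
    """
    for _ in range(steps):
        candidates = [t(graph) for t in transforms]
        candidates = [g for g in candidates if g is not None]
        best = min(candidates, default=None, key=runtime_model)
        if best is None or runtime_model(best) >= runtime_model(graph):
            break                  # no transformation helps any more
        graph = best
    return graph
```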
arXiv Detail & Related papers (2022-05-03T11:52:54Z)
- Interpretable AI-based Large-scale 3D Pathloss Prediction Model for enabling Emerging Self-Driving Networks [3.710841042000923]
We propose a Machine Learning-based model that leverages novel key predictors for estimating pathloss.
By quantitatively evaluating various ML algorithms in terms of predictive, generalization, and computational performance, our results show that the Light Gradient Boosting Machine (LightGBM) algorithm outperforms the others overall.
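Since the paper names LightGBM explicitly, the basic usage pattern is easy to illustrate. The feature schema and random placeholder data below are assumptions; the real model would be fit on measured or ray-traced pathloss samples with the paper's predictors.

```python
import lightgbm as lgb
import numpy as np

# Hypothetical feature matrix: e.g. 3D distance, frequency, antenna height,
# and clutter/terrain indicators per transmitter-receiver pair (assumed schema).
X_train = np.random.rand(1000, 6)
y_train = np.random.uniform(60, 160, size=1000)   # pathloss in dB (placeholder)

model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X_train, y_train)
pathloss_db = model.predict(np.random.rand(10, 6))
```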
arXiv Detail & Related papers (2022-01-30T19:50:16Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
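The connection between information theoretic MPC and entropy regularized RL runs through the soft (log-sum-exp) value function. A minimal sketch of the corresponding soft Bellman target is below; the constants are illustrative and the paper's algorithm additionally handles biased models.

```python
import numpy as np

def soft_q_backup(q_next, reward, gamma=0.99, alpha=0.1):
    """Entropy-regularised (soft) Bellman target:
    V(s') = alpha * log sum_a exp(Q(s', a) / alpha).

    q_next: (num_actions,) Q-values at the next state.
    """
    v_next = alpha * np.log(np.sum(np.exp(q_next / alpha)))
    return reward + gamma * v_next
```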
arXiv Detail & Related papers (2019-12-31T00:29:22Z)