Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation
Models
- URL: http://arxiv.org/abs/2403.07066v1
- Date: Mon, 11 Mar 2024 18:00:47 GMT
- Title: Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation
Models
- Authors: Philip Harris, Michael Kagan, Jeffrey Krupa, Benedikt Maier, Nathaniel
Woodward
- Abstract summary: Self-Supervised Learning (SSL) is at the core of training modern large machine learning models.
We propose RS3L, a novel simulation-based SSL strategy that employs a method of re-simulation to drive data augmentation.
In addition to our results, we make the RS3L dataset publicly available for further studies on how to improve SSL strategies.
- Score: 1.230412738960606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-Supervised Learning (SSL) is at the core of training modern large
machine learning models, providing a scheme for learning powerful
representations that can be used in a variety of downstream tasks. However, SSL
strategies must be adapted to the type of training data and downstream tasks
required. We propose RS3L, a novel simulation-based SSL strategy that employs a
method of re-simulation to drive data augmentation for contrastive learning. By
intervening in the middle of the simulation process and re-running simulation
components downstream of the intervention, we generate multiple realizations of
an event, thus producing a set of augmentations covering all physics-driven
variations available in the simulator. Using experiments from high-energy
physics, we explore how this strategy may enable the development of a
foundation model; we show how RS3L pre-training enables powerful performance in
downstream tasks such as discrimination of a variety of objects and uncertainty
mitigation. In addition to our results, we make the RS3L dataset publicly
available for further studies on how to improve SSL strategies.
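The abstract describes contrastive learning driven by re-simulated views of the same event but does not give the objective explicitly. As a rough illustration only, a SimCLR-style NT-Xent loss over pairs of re-simulated events might look like the sketch below; the function name, temperature, and the choice of NT-Xent are assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z_a, z_b, temperature=0.1):
    """SimCLR-style contrastive loss over two views of a batch of N events.

    z_a, z_b: (N, D) embeddings of the same N events under two independent
    re-runs of the downstream simulation chain (the RS3L-style augmentations).
    """
    z = F.normalize(torch.cat([z_a, z_b], dim=0), dim=1)  # (2N, D)
    sim = z @ z.t() / temperature                         # cosine similarities
    n = z_a.shape[0]
    sim.fill_diagonal_(float("-inf"))                     # exclude self-pairs
    # The positive for row i is the other re-simulated view of the same event.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```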
Related papers
- Feasibility Study on Active Learning of Smart Surrogates for Scientific Simulations [4.368891765870579]
We investigate the potential of incorporating active learning into the training of deep neural network (DNN) surrogates.
This allows intelligent and objective selection of training simulations, reducing the need to generate extensive simulation data.
The results set the groundwork for developing the high-performance computing infrastructure for Smart Surrogates.
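The summary does not specify the acquisition rule. One common choice for surrogate active learning is ensemble disagreement, sketched below under that assumption; the candidate schema, `k`, and the `.predict()` interface are hypothetical.

```python
import numpy as np

def select_next_simulations(candidates, ensemble, k=8):
    """Pick the k candidate parameter points where an ensemble of DNN
    surrogates disagrees most; the expensive simulator is run only there.

    candidates: (M, P) array of untried simulation parameters.
    ensemble: list of trained surrogate models exposing .predict().
    """
    preds = np.stack([m.predict(candidates) for m in ensemble])    # (E, M, ...)
    uncertainty = preds.std(axis=0).reshape(len(candidates), -1).mean(axis=1)
    return candidates[np.argsort(uncertainty)[-k:]]  # most uncertain points
```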
arXiv Detail & Related papers (2024-07-10T14:00:20Z)
- Efficient Continual Pre-training by Mitigating the Stability Gap [68.49269649759005]
We study the behavior of Large Language Models (LLMs) during continual pre-training.
We propose three effective strategies to enhance LLM performance within a fixed compute budget.
Our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget.
arXiv Detail & Related papers (2024-06-21T02:28:37Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
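A common closed-form instance of instance-reweighted DRO up-weights hard samples with a softmax over their per-sample losses. The sketch below illustrates that generic pattern; the temperature `tau` and detaching the weights are assumptions, not necessarily the paper's formulation.

```python
import torch

def irdro_batch_loss(per_sample_losses, tau=1.0):
    """Instance-reweighted loss: up-weight hard samples via a softmax over
    their individual losses (the closed-form solution of a KL-constrained
    DRO inner problem).

    per_sample_losses: (N,) tensor of unreduced losses for the batch.
    """
    # Weights are detached so gradients flow only through the losses themselves.
    weights = torch.softmax(per_sample_losses.detach() / tau, dim=0)
    return (weights * per_sample_losses).sum()
```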
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
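The summary says world models double as tools to measure task relevance but gives no formula. A minimal proxy, assumed here, is to score a pre-trained world model by its one-step prediction error on the new task's transitions; the paper's actual measure may differ.

```python
import torch

def task_relevance(world_model, transitions):
    """Score how relevant a pre-trained world model is to a new task via its
    one-step prediction error on that task's transitions (lower = more relevant).

    transitions: iterable of (state, action, next_state) tensors;
    world_model(s, a) is assumed to predict the next state.
    """
    errors = []
    with torch.no_grad():
        for s, a, s_next in transitions:
            errors.append(torch.mean((world_model(s, a) - s_next) ** 2))
    return torch.stack(errors).mean()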
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior [32.74388989649232]
We study how pre-training could be used for scientific machine learning (SciML) applications.
We find that fine-tuning these models yields more performance gains as model size increases.
arXiv Detail & Related papers (2023-06-01T00:32:59Z)
- Learning Controllable Adaptive Simulation for Multi-resolution Physics [86.8993558124143]
We introduce Learning controllable Adaptive simulation for Multi-resolution Physics (LAMP) as the first full deep learning-based surrogate model.
LAMP consists of a Graph Neural Network (GNN) for learning the forward evolution, and a GNN-based actor-critic for learning the policy of spatial refinement and coarsening.
We demonstrate that LAMP outperforms state-of-the-art deep learning surrogate models and can adaptively trade off computation to improve long-term prediction error.
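To make the refine/coarsen policy concrete, here is a heavily simplified sketch of one adaptation step in the spirit of LAMP: a GNN policy scores mesh nodes, the highest-scoring nodes are refined and the lowest coarsened. The `graph` API (`.refine()`, `.coarsen()`, `.node_features`) and the fraction parameters are invented for illustration.

```python
import torch

def adapt_mesh(policy_gnn, graph, refine_frac=0.1, coarsen_frac=0.1):
    """One mesh-adaptation step: a GNN policy scores each node; the
    highest-scoring nodes are refined and the lowest-scoring coarsened.

    graph: object with .node_features and .refine()/.coarsen() (assumed API).
    """
    scores = policy_gnn(graph.node_features).squeeze(-1)     # (num_nodes,)
    n = scores.numel()
    refine_ids = torch.topk(scores, max(1, int(refine_frac * n))).indices
    coarsen_ids = torch.topk(-scores, max(1, int(coarsen_frac * n))).indices
    return graph.refine(refine_ids).coarsen(coarsen_ids)
```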
arXiv Detail & Related papers (2023-05-01T23:20:27Z)
- Continual learning autoencoder training for a particle-in-cell simulation via streaming [52.77024349608834]
The upcoming exascale era will provide a new generation of high-resolution physics simulations.
The resulting data volumes will impact the training of machine learning models, since storing such large amounts of simulation data on disk is nearly impossible.
This work presents an approach that trains a neural network concurrently to a running simulation without data on a disk.
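The core idea, training directly from a live data stream, can be sketched as a loop that consumes batches from a running simulation instead of a dataset on disk. This is a minimal sketch assuming a Python generator interface; the actual work streams data from a concurrently running particle-in-cell code.

```python
import torch

def train_streaming(autoencoder, simulation_batches, lr=1e-4):
    """Continually fit an autoencoder on batches produced by a running
    simulation, so no snapshot ever needs to be written to disk.

    simulation_batches: generator yielding tensors (e.g. field slices).
    """
    opt = torch.optim.Adam(autoencoder.parameters(), lr=lr)
    for batch in simulation_batches:       # data exists only in memory
        recon = autoencoder(batch)
        loss = torch.mean((recon - batch) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
```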
arXiv Detail & Related papers (2022-11-09T09:55:14Z)
- RLFlow: Optimising Neural Network Subgraph Transformation with World Models [0.0]
We propose a model-based agent which learns to optimise the architecture of neural networks by performing a sequence of subgraph transformations to reduce model runtime.
We show our approach can match the performance of the state of the art on common convolutional networks and outperform it by up to 5% on transformer-style architectures.
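RLFlow itself trains an RL agent with a world model; as a simpler stand-in for the same search problem, the greedy sketch below applies, at each step, the candidate subgraph transformation that a learned cost model predicts will reduce runtime the most. The `transforms` and `runtime_model` interfaces are assumptions for illustration.

```python
def optimise_graph(graph, transforms, runtime_model, steps=20):
    """Greedy model-guided subgraph optimisation: repeatedly apply the
    transformation predicted to reduce runtime the most.

    transforms: list of functions graph -> new graph (or None if inapplicable).
    runtime_model: callable estimating a graph's runtime without executing it.
    """
    for _ in range(steps):
        candidates = [t(graph) for t in transforms]
        candidates = [g for g in candidates if g is not None]
        best = min(candidates, default=None, key=runtime_model)
        if best is None or runtime_model(best) >= runtime_model(graph):
            break                  # no transformation helps any more
        graph = best
    return graph
```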
arXiv Detail & Related papers (2022-05-03T11:52:54Z)
- Interpretable AI-based Large-scale 3D Pathloss Prediction Model for enabling Emerging Self-Driving Networks [3.710841042000923]
We propose a Machine Learning-based model that leverages novel key predictors for estimating pathloss.
By quantitatively evaluating various ML algorithms in terms of predictive, generalization, and computational performance, our results show that the Light Gradient Boosting Machine (LightGBM) algorithm outperforms the others overall.
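Since the paper names LightGBM explicitly, the basic usage pattern is easy to illustrate. The feature schema and random placeholder data below are assumptions; the real model would be fit on measured or ray-traced pathloss samples with the paper's predictors.

```python
import lightgbm as lgb
import numpy as np

# Hypothetical feature matrix: e.g. 3D distance, frequency, antenna height,
# and clutter/terrain indicators per transmitter-receiver pair (assumed schema).
X_train = np.random.rand(1000, 6)
y_train = np.random.uniform(60, 160, size=1000)   # pathloss in dB (placeholder)

model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X_train, y_train)
pathloss_db = model.predict(np.random.rand(10, 6))
```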
arXiv Detail & Related papers (2022-01-30T19:50:16Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
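The connection between information theoretic MPC and entropy regularized RL runs through the soft (log-sum-exp) value function. A minimal sketch of the corresponding soft Bellman target is below; the constants are illustrative and the paper's algorithm additionally handles biased models.

```python
import numpy as np

def soft_q_backup(q_next, reward, gamma=0.99, alpha=0.1):
    """Entropy-regularised (soft) Bellman target:
    V(s') = alpha * log sum_a exp(Q(s', a) / alpha).

    q_next: (num_actions,) Q-values at the next state.
    """
    v_next = alpha * np.log(np.sum(np.exp(q_next / alpha)))
    return reward + gamma * v_next
```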
arXiv Detail & Related papers (2019-12-31T00:29:22Z)