LaMM: Semi-Supervised Pre-Training of Large-Scale Materials Models
- URL: http://arxiv.org/abs/2505.22208v1
- Date: Wed, 28 May 2025 10:36:49 GMT
- Title: LaMM: Semi-Supervised Pre-Training of Large-Scale Materials Models
- Authors: Yosuke Oyama, Yusuke Majima, Eiji Ohta, Yasufumi Sakai,
- Abstract summary: We propose LaMM, a semi-supervised pre-training method incorporating improved denoising self-supervised learning and a load-balancing algorithm for efficient multi-node training. We demonstrate that our approach effectively leverages a large-scale dataset of $\sim$300 million semi-labeled samples to train a single NNP model, resulting in improved fine-tuning performance in terms of both speed and accuracy.
- Score: 0.1999925939110439
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network potentials (NNPs) are crucial for accelerating computational materials science by surrogating density functional theory (DFT) calculations. Improving their accuracy is possible through pre-training and fine-tuning, where an NNP model is first pre-trained on a large-scale dataset and then fine-tuned on a smaller target dataset. However, this approach is computationally expensive, mainly due to the cost of DFT-based dataset labeling and load imbalances during large-scale pre-training. To address this, we propose LaMM, a semi-supervised pre-training method incorporating improved denoising self-supervised learning and a load-balancing algorithm for efficient multi-node training. We demonstrate that our approach effectively leverages a large-scale dataset of $\sim$300 million semi-labeled samples to train a single NNP model, resulting in improved fine-tuning performance in terms of both speed and accuracy.
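The abstract does not include an implementation, but its two core ingredients, denoising self-supervised pre-training on semi-labeled structures and load balancing of variable-size systems across workers, can be sketched in code. The following is a minimal PyTorch illustration under assumed interfaces: the `model` signature, the `num_atoms` field, and the noise scale `sigma` are hypothetical stand-ins, not the authors' actual LaMM code.

```python
import torch

def denoising_pretrain_step(model, species, positions, optimizer, sigma=0.05):
    """One denoising self-supervised step (sketch): perturb atomic coordinates
    with Gaussian noise and train the model to predict the displacement back to
    the original geometry, so no DFT labels are needed for this loss term."""
    noise = sigma * torch.randn_like(positions)        # per-atom random displacement
    pred = model(species, positions + noise)           # assumed to return per-atom 3-vectors
    loss = torch.nn.functional.mse_loss(pred, -noise)  # recover the denoising direction
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def balance_structures(structures, num_workers):
    """Greedy load balancing (sketch): assign variable-size structures to workers
    so that per-worker atom counts (a proxy for compute cost) stay roughly even."""
    buckets = [[] for _ in range(num_workers)]
    loads = [0] * num_workers
    # Place the largest structures first, each on the currently lightest worker.
    for s in sorted(structures, key=lambda s: s["num_atoms"], reverse=True):
        i = loads.index(min(loads))
        buckets[i].append(s)
        loads[i] += s["num_atoms"]
    return buckets
```

In the semi-supervised setting described above, labeled samples would additionally contribute an energy/force regression loss alongside this denoising term, and the greedy partition is only a stand-in for whatever load-balancing algorithm LaMM actually uses for multi-node training.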
Related papers
- Prior-Fitted Networks Scale to Larger Datasets When Treated as Weak Learners [82.72552644267724]
BoostPFN can outperform standard PFNs with the same size of training samples in large datasets. High performance is maintained for up to 50x of the pre-training size of PFNs.
arXiv Detail & Related papers (2025-03-03T07:31:40Z)
- Pre-training Graph Neural Networks with Structural Fingerprints for Materials Discovery [1.187456026346823]
We propose a novel pre-training objective which uses cheaply-computed structural fingerprints as targets. Our experiments show this approach can act as a general strategy for pre-training GNNs with application towards large scale foundational models for atomistic data.
arXiv Detail & Related papers (2025-03-03T06:50:23Z)
- Towards Data-Efficient Pretraining for Atomic Property Prediction [51.660835328611626]
We show that pretraining on a task-relevant dataset can match or surpass large-scale pretraining. We introduce the Chemical Similarity Index (CSI), a novel metric inspired by computer vision's Fréchet Inception Distance.
arXiv Detail & Related papers (2025-02-16T11:46:23Z)
- The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws [51.608402959163925]
We present the first systematic exploration of optimal sparse pre-training configurations for large language models. We find that initiating pruning at 25% of total training compute and concluding at 75% achieves near-optimal final evaluation loss. We propose a new scaling law that modifies the Chinchilla scaling law to use the average parameter count over pre-training.
arXiv Detail & Related papers (2025-01-21T20:23:22Z)
- Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference [55.150117654242706]
We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU. As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
arXiv Detail & Related papers (2024-11-01T21:11:48Z)
- AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs [61.13296177652599]
We show that data mixtures that perform well at smaller scales may not retain their advantage at larger scales. We propose AutoScale, a two-stage, scale-aware data composition framework.
arXiv Detail & Related papers (2024-07-29T17:06:30Z)
- Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design [15.47240906902083]
This paper presents a computation-efficient training scheme for N:M sparse DNNs using algorithm, architecture, and dataflow co-design.
At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights.
At the architecture level, a sparse accelerator for DNN training, namely SAT, is developed to support both the regular dense operations and the computation-efficient N:M sparse operations.
arXiv Detail & Related papers (2023-09-22T17:26:19Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models [4.114555639014612]
We show the benefits of using unstructured weight sparsity to train only a subset of weights during pre-training.
We demonstrate that we can induce up to 75% sparsity into a 1.3B parameter GPT-3 XL model resulting in a 2.5x reduction in pre-training FLOPs.
arXiv Detail & Related papers (2023-03-18T17:56:01Z) - Knowledge Distillation as Efficient Pre-training: Faster Convergence,
Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets while requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- Parameter estimation for WMTI-Watson model of white matter using encoder-decoder recurrent neural network [0.0]
In this study, we evaluate the performance of NLLS, the RNN-based method, and a multilayer perceptron (MLP) on rat and human brain datasets.
We show that the proposed RNN-based fitting approach greatly reduces computation time compared to NLLS.
arXiv Detail & Related papers (2022-03-01T16:33:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.