Automated Machine Learning Pipeline for Training and Analysis Using Large Language Models
- URL: http://arxiv.org/abs/2509.21647v1
- Date: Thu, 25 Sep 2025 22:05:20 GMT
- Title: Automated Machine Learning Pipeline for Training and Analysis Using Large Language Models
- Authors: Adam Lahouari, Jutta Rogal, Mark E. Tuckerman
- Abstract summary: We introduce an Automated Machine Learning Pipeline (AMLP) that unifies the entire workflow from dataset creation to model validation. AMLP employs large-language-model agents to assist with electronic-structure code selection, input preparation, and output conversion. It is validated on acridine polymorphs, where, with a straightforward fine-tuning of a foundation model, mean absolute errors of 1.7 meV/atom in energies and 7.0 meV/Å in forces are achieved.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning interatomic potentials (MLIPs) have become powerful tools to extend molecular simulations beyond the limits of quantum methods, offering near-quantum accuracy at much lower computational cost. Yet, developing reliable MLIPs remains difficult because it requires generating high-quality datasets, preprocessing atomic structures, and carefully training and validating models. In this work, we introduce an Automated Machine Learning Pipeline (AMLP) that unifies the entire workflow from dataset creation to model validation. AMLP employs large-language-model agents to assist with electronic-structure code selection, input preparation, and output conversion, while its analysis suite (AMLP-Analysis), based on ASE, supports a range of molecular simulations. The pipeline is built on the MACE architecture and validated on acridine polymorphs, where, with a straightforward fine-tuning of a foundation model, mean absolute errors of ~1.7 meV/atom in energies and ~7.0 meV/Å in forces are achieved. The fitted MLIP reproduces DFT geometries with sub-Å accuracy and demonstrates stability during molecular dynamics simulations in the microcanonical and canonical ensembles.
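The two accuracy figures quoted above are mean absolute errors (MAEs) over a test set. The minimal sketch below shows one common way to compute them. It assumes energies in eV per structure and forces in eV/Å, an energy MAE taken on per-atom energies E/N, and a force MAE averaged over every Cartesian component; these conventions and the function names are illustrative assumptions, not the AMLP implementation.

```python
# Illustrative sketch, not AMLP code. Assumed units: energies in eV per
# structure, forces in eV/Å; results are reported in meV/atom and meV/Å.
from typing import Sequence

Vector3 = Sequence[float]  # one (Fx, Fy, Fz) force vector


def energy_mae_mev_per_atom(e_ref: Sequence[float],
                            e_pred: Sequence[float],
                            n_atoms: Sequence[int]) -> float:
    """MAE of per-atom energies |E_pred - E_ref| / N, converted to meV/atom."""
    if not e_ref or not (len(e_ref) == len(e_pred) == len(n_atoms)):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - r) / n for r, p, n in zip(e_ref, e_pred, n_atoms)]
    return 1000.0 * sum(errors) / len(errors)


def force_mae_mev_per_angstrom(f_ref: Sequence[Sequence[Vector3]],
                               f_pred: Sequence[Sequence[Vector3]]) -> float:
    """MAE over all Cartesian force components of all atoms, in meV/Å."""
    if len(f_ref) != len(f_pred):
        raise ValueError("reference and prediction cover different structures")
    errors = []
    for struct_ref, struct_pred in zip(f_ref, f_pred):
        if len(struct_ref) != len(struct_pred):
            raise ValueError("atom counts differ within a structure")
        for atom_ref, atom_pred in zip(struct_ref, struct_pred):
            errors.extend(abs(p - r) for r, p in zip(atom_ref, atom_pred))
    if not errors:
        raise ValueError("no force components supplied")
    return 1000.0 * sum(errors) / len(errors)


# Toy numbers, made up for illustration only.
print(energy_mae_mev_per_atom([-10.0, -20.0], [-10.002, -19.996], [2, 2]))
print(force_mae_mev_per_angstrom([[(0.1, 0.0, -0.1)]], [[(0.107, 0.0, -0.093)]]))
```

Whether the force MAE is averaged per component or over per-atom vector norms changes the reported number, so the convention should be checked against the paper before comparing values across works.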
Related papers
- Equivariant Evidential Deep Learning for Interatomic Potentials
Uncertainty quantification (UQ) is critical for assessing the reliability of machine learning interatomic potentials (MLIPs) in molecular dynamics simulations. Existing UQ approaches for MLIPs are often limited by high computational cost or suboptimal performance. We propose Equivariant Evidential Deep Learning for Interatomic Potentials (e²IP), a backbone-agnostic framework that models atomic forces and their uncertainty jointly.
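Evidential deep learning predicts the parameters of a higher-order distribution instead of a single value, so a prediction and two kinds of uncertainty come out of one forward pass. The sketch below shows the standard scalar Normal-Inverse-Gamma (NIG) read-out of deep evidential regression, where a network head outputs (γ, ν, α, β). It does not reproduce the equivariant force formulation of e²IP; the softplus parameterization and the function names are assumptions for illustration.

```python
# Scalar deep evidential regression read-out (standard NIG form), shown for
# illustration only; it is not the e2IP method.
import math
from dataclasses import dataclass


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


@dataclass(frozen=True)
class NIGParams:
    gamma: float  # location: the point prediction
    nu: float     # virtual observation count for the mean, > 0
    alpha: float  # shape, > 1 so the moments below are finite
    beta: float   # scale, > 0


def nig_from_raw(raw: tuple[float, float, float, float]) -> NIGParams:
    """Map four unconstrained network outputs onto valid NIG parameters."""
    g, v, a, b = raw
    return NIGParams(gamma=g, nu=softplus(v), alpha=softplus(a) + 1.0, beta=softplus(b))


def evidential_moments(p: NIGParams) -> tuple[float, float, float]:
    """Return (prediction, aleatoric variance, epistemic variance)."""
    if p.nu <= 0.0 or p.alpha <= 1.0 or p.beta <= 0.0:
        raise ValueError("require nu > 0, alpha > 1, beta > 0")
    aleatoric = p.beta / (p.alpha - 1.0)           # E[sigma^2]
    epistemic = p.beta / (p.nu * (p.alpha - 1.0))  # Var[mu]
    return p.gamma, aleatoric, epistemic


mean, alea, epi = evidential_moments(NIGParams(gamma=0.5, nu=2.0, alpha=3.0, beta=4.0))
print(mean, alea, epi)  # 0.5 2.0 1.0
```

Because the epistemic term divides by ν, it shrinks as the model accumulates evidence for a configuration, while the aleatoric term does not; this separation is what makes the approach attractive for flagging unreliable MD frames.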
arXiv Detail & Related papers (2026-02-11T02:00:25Z)
- Machine learning surrogate models of many-body dispersion interactions in polymer melts
We introduce a machine learning surrogate model specifically designed to predict many-body dispersion (MBD) forces in polymer melts. Our model is based on a trimmed SchNet architecture that selectively retains the most relevant atomic connections. Characterized by high computational efficiency, our surrogate model enables practical incorporation of MBD effects into large-scale molecular simulations.
arXiv Detail & Related papers (2025-03-19T12:15:35Z)
- Ensemble Knowledge Distillation for Machine Learning Interatomic Potentials
We present an ensemble knowledge distillation (EKD) method to improve machine learning interatomic potentials (MLIPs). First, multiple teacher models are trained on quantum-chemical (QC) energies and then used to generate atomic forces for all configurations in the dataset. Next, the student MLIP is trained on both the QC energies and the ensemble-averaged forces generated by the teacher models. The resulting student MLIPs achieve new state-of-the-art accuracy on the COMP6 benchmark and show improved stability in molecular dynamics simulations.
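The two-stage recipe above can be sketched in a few lines: average the teachers' force predictions atom by atom, then score the student against the QC energy and against those averaged forces. The mean-squared-error form, the per-atom energy normalization, and the weights `w_energy` and `w_force` are assumptions chosen for illustration, not values from the EKD paper.

```python
# Sketch of an ensemble-distillation training target; loss form and weights
# are illustrative assumptions.
from typing import Sequence

Forces = Sequence[Sequence[float]]  # shape [n_atoms][3], eV/Å


def ensemble_mean_forces(teacher_forces: Sequence[Forces]) -> list[list[float]]:
    """Average each Cartesian force component over all teacher models."""
    if not teacher_forces:
        raise ValueError("need at least one teacher")
    n_teachers = len(teacher_forces)
    n_atoms = len(teacher_forces[0])
    mean = [[0.0, 0.0, 0.0] for _ in range(n_atoms)]
    for forces in teacher_forces:
        if len(forces) != n_atoms:
            raise ValueError("teachers disagree on the number of atoms")
        for i, atom_force in enumerate(forces):
            for k in range(3):
                mean[i][k] += atom_force[k] / n_teachers
    return mean


def student_loss(e_pred: float, e_qc: float, f_pred: Forces, f_target: Forces,
                 w_energy: float = 1.0, w_force: float = 10.0) -> float:
    """Weighted MSE: energy vs. the QC label, forces vs. the teacher mean."""
    n_atoms = len(f_target)
    if n_atoms == 0 or len(f_pred) != n_atoms:
        raise ValueError("force arrays must be non-empty and the same length")
    energy_term = ((e_pred - e_qc) / n_atoms) ** 2  # per-atom energy error
    force_term = sum((fp - ft) ** 2
                     for atom_p, atom_t in zip(f_pred, f_target)
                     for fp, ft in zip(atom_p, atom_t)) / (3 * n_atoms)
    return w_energy * energy_term + w_force * force_term


# Two hypothetical teachers predicting forces on a single atom.
target = ensemble_mean_forces([[[1.0, 0.0, 0.0]], [[3.0, 0.0, -2.0]]])
print(target)  # [[2.0, 0.0, -1.0]]
print(student_loss(-4.0, -5.0, [[3.0, 0.0, -1.0]], target))
```

Note that the energy label stays the QC value; only the force labels are replaced by the teacher average, matching the split described in the summary.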
arXiv Detail & Related papers (2025-03-18T14:32:51Z)
- MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science
Current multi-modal large language models (MLLMs) have shown strong capabilities in general visual reasoning tasks. We develop a new framework, Multi-Modal Scientific Reasoning with Physics Perception and Simulation (MAPS), based on an MLLM. MAPS decomposes an expert-level multi-modal reasoning task into physical diagram understanding via a Physical Perception Model (PPM) and reasoning with physical knowledge via a simulator.
arXiv Detail & Related papers (2025-01-18T13:54:00Z)
- Materials Learning Algorithms (MALA): Scalable Machine Learning for Electronic Structure Calculations in Large-Scale Atomistic Simulations
We present the Materials Learning Algorithms (MALA) package, a scalable machine learning framework suitable for large-scale atomistic simulations. MALA models efficiently predict key electronic observables, including local density of states, electronic density, density of states, and total energy. We demonstrate MALA's capabilities with examples including boron clusters, aluminum across its solid-liquid phase boundary, and predicting the electronic structure of a stacking fault in a large beryllium slab.
arXiv Detail & Related papers (2024-11-29T11:10:29Z)
- Multi-task learning for molecular electronic structure approaching coupled-cluster accuracy
We develop a unified machine learning method for the electronic structure of organic molecules, using gold-standard CCSD(T) calculations as training data.
Tested on hydrocarbon molecules, our model outperforms DFT with widely used hybrid and double-hybrid functionals in both computational cost and prediction accuracy for various quantum chemical properties.
arXiv Detail & Related papers (2024-05-09T19:51:27Z)
- Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
Fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable. We show that our strongest model can generate materials predicted to be metastable at about twice the rate of CDVAE. Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable materials.
arXiv Detail & Related papers (2024-02-06T20:35:28Z)
- Closing the loop: Autonomous experiments enabled by machine-learning-based online data analysis in synchrotron beamline environments
Machine learning can be used to enhance research involving large or rapidly generated datasets.
In this study, we describe the incorporation of ML into a closed-loop workflow for X-ray reflectometry (XRR).
We present solutions that provide elementary data analysis in real time during the experiment without introducing additional software dependencies into the beamline control software environment.
arXiv Detail & Related papers (2023-06-20T21:21:19Z)
- Evaluating the Transferability of Machine-Learned Force Fields for Material Property Modeling
We present a more comprehensive set of benchmarking tests for evaluating the transferability of machine-learned force fields.
We use a graph neural network (GNN)-based force field coupled with the OpenMM package to carry out MD simulations for argon.
Our results show that the model can accurately capture the behavior of the solid phase only when the configurations from the solid phase are included in the training dataset.
arXiv Detail & Related papers (2023-01-10T00:25:48Z)
- Multi-fidelity Hierarchical Neural Processes
Multi-fidelity surrogate modeling reduces the computational cost by fusing different simulation outputs.
We propose Multi-fidelity Hierarchical Neural Processes (MF-HNP), a unified neural latent variable model for multi-fidelity surrogate modeling.
We evaluate MF-HNP on epidemiology and climate modeling tasks, achieving competitive performance in terms of accuracy and uncertainty estimation.
arXiv Detail & Related papers (2022-06-10T04:54:13Z)
- Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) have emerged as an alternative means of executing MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z)
- Automated discovery of a robust interatomic potential for aluminum
Machine learning (ML) based potentials aim for faithful emulation of quantum mechanics (QM) calculations at drastically reduced computational cost.
We present a highly automated approach to dataset construction using the principles of active learning (AL).
We demonstrate this approach by building an ML potential for aluminum (ANI-Al).
To demonstrate transferability, we perform a 1.3M atom shock simulation, and show that ANI-Al predictions agree very well with DFT calculations on local atomic environments sampled from the nonequilibrium dynamics.
arXiv Detail & Related papers (2020-03-10T19:06:32Z)