Federated Learning from Molecules to Processes: A Perspective
- URL: http://arxiv.org/abs/2506.18525v1
- Date: Mon, 23 Jun 2025 11:27:34 GMT
- Title: Federated Learning from Molecules to Processes: A Perspective
- Authors: Jan G. Rittig, Clemens Kortmann, et al.
- Abstract summary: We envision collaborative efforts in machine learning (ML) developments within the chemical industry. We discuss potential applications of federated learning in several fields of chemical engineering. Our results indicate that ML models jointly trained with federated learning yield significantly higher accuracy than models trained by each chemical company individually.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a perspective on federated learning in chemical engineering that envisions collaborative efforts in machine learning (ML) developments within the chemical industry. Large amounts of chemical and process data are proprietary to chemical companies and are therefore locked in data silos, hindering the training of ML models on large data sets in chemical engineering. Recently, the concept of federated learning has gained increasing attention in ML research, enabling organizations to jointly train machine learning models without disclosure of their individual data. We discuss potential applications of federated learning in several fields of chemical engineering, from the molecular to the process scale. In addition, we apply federated learning in two exemplary case studies that simulate practical scenarios of multiple chemical companies holding proprietary data sets: (i) prediction of binary mixture activity coefficients with graph neural networks and (ii) system identification of a distillation column with autoencoders. Our results indicate that ML models jointly trained with federated learning yield significantly higher accuracy than models trained by each chemical company individually and can perform similarly to models trained on combined datasets from all companies. Federated learning has therefore great potential to advance ML models in chemical engineering while respecting corporate data privacy, making it promising for future industrial applications.
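The collaborative training scheme the abstract describes can be illustrated with a minimal federated-averaging (FedAvg) sketch: each "company" trains locally on its private data, and only model weights are shared and averaged by a server. The linear model, client data, and hyperparameters below are illustrative placeholders, not the paper's GNN or autoencoder case studies.

```python
# Minimal FedAvg sketch: clients train locally; the server averages weights.
# Raw training data never leaves a client -- only model parameters do.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: linear least squares via gradient descent."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(clients, rounds=20, dim=3):
    """Server loop: broadcast global weights, collect and average local updates."""
    w_global = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        local = [local_update(w_global, X, y) for X, y in clients]
        # Weighted average by local dataset size (the standard FedAvg rule).
        w_global = np.average(local, axis=0, weights=sizes)
    return w_global
```

In this toy setting the averaged model recovers the shared underlying relationship even though no client ever sees another client's data, mirroring the paper's finding that federated models approach the accuracy of training on the pooled dataset.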
Related papers
- ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences. This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input. Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z)
- Ensemble Knowledge Distillation for Machine Learning Interatomic Potentials [34.82692226532414]
We present an ensemble knowledge distillation (EKD) method to improve machine learning interatomic potentials (MLIPs). First, multiple teacher models are trained to QC energies and then generate atomic forces for all configurations in the dataset. Next, the student MLIP is trained to both QC energies and to ensemble-averaged forces generated by the teacher models. The resulting student MLIPs achieve new state-of-the-art accuracy on the COMP6 benchmark and show improved stability for molecular dynamics simulations.
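The two-target training objective described in this abstract can be sketched as a combined loss: the student is fit to reference QC energies and to forces averaged over the teacher ensemble. Array shapes and the force-weighting factor below are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of an ensemble-knowledge-distillation loss: energy error
# against QC references plus force error against ensemble-averaged teachers.
import numpy as np

def ekd_loss(e_student, f_student, e_qc, f_teachers, w_force=1.0):
    """Combined energy + distilled-force mean-squared loss.

    e_student:  (n,) predicted energies
    f_student:  (n, atoms, 3) predicted forces
    e_qc:       (n,) reference quantum-chemical energies
    f_teachers: (k, n, atoms, 3) forces from k teacher models
    """
    f_target = f_teachers.mean(axis=0)  # ensemble-averaged teacher forces
    energy_term = np.mean((e_student - e_qc) ** 2)
    force_term = np.mean((f_student - f_target) ** 2)
    return energy_term + w_force * force_term
```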
arXiv Detail & Related papers (2025-03-18T14:32:51Z)
- Chemical knowledge-informed framework for privacy-aware retrosynthesis learning [60.93245342663455]
Current machine learning-based retrosynthesis gathers reaction data from multiple sources into one single edge to train prediction models. This paradigm poses considerable privacy risks, as it necessitates broad data availability across organizational boundaries. In the present study, we introduce the chemical knowledge-informed framework (CKIF), a privacy-preserving approach for learning retrosynthesis models.
arXiv Detail & Related papers (2025-02-26T13:13:24Z)
- ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area [50.15254966969718]
We introduce ChemVLM, an open-source chemical multimodal large language model for chemical applications. ChemVLM is trained on a carefully curated bilingual dataset that enhances its ability to understand both textual and visual chemical information. We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks.
arXiv Detail & Related papers (2024-08-14T01:16:40Z)
- A Large Encoder-Decoder Family of Foundation Models For Chemical Language [1.1073864511426255]
This paper introduces a large encoder-decoder family of chemical foundation models pre-trained on a curated dataset of 91 million SMILES samples sourced from PubChem.
Our experiments across multiple benchmark datasets validate the capacity of the proposed model in providing state-of-the-art results for different tasks.
arXiv Detail & Related papers (2024-07-24T20:30:39Z)
- Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties [0.0]
We introduce the elEmBERT model for chemical classification tasks.
It is based on deep learning techniques, such as a multilayer encoder architecture.
We demonstrate the opportunities offered by our approach on sets of organic, inorganic and crystalline compounds.
arXiv Detail & Related papers (2023-09-17T19:41:32Z)
- Synthetic pre-training for neural-network interatomic potentials [0.0]
We show that synthetic atomistic data, themselves obtained at scale with an existing machine learning potential, constitute a useful pre-training task for neural-network interatomic potential models.
Once pre-trained with a large synthetic dataset, these models can be fine-tuned on a much smaller, quantum-mechanical one, improving numerical accuracy and stability in computational practice.
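The two-stage workflow in this abstract, pre-training on abundant synthetic labels produced by an existing model and then fine-tuning on a small quantum-mechanical set, can be illustrated with a toy regression. The linear model, the stand-in "teacher potential", and all hyperparameters are placeholders, not the paper's actual architecture or data.

```python
# Illustrative pre-train-then-fine-tune workflow: fit to cheap synthetic
# labels from an approximate teacher first, then refine on few exact labels.
import numpy as np

def gd_fit(w, X, y, lr=0.05, steps=200):
    """Gradient descent on a linear least-squares model."""
    for _ in range(steps):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

rng = np.random.default_rng(42)
w_true = np.array([0.8, -1.2, 0.3])          # "ground-truth" QC relationship
teacher = w_true + rng.normal(0, 0.05, 3)    # existing ML potential (approximate)

# Stage 1: large synthetic dataset labeled by the approximate teacher.
X_syn = rng.normal(size=(2000, 3))
w = gd_fit(np.zeros(3), X_syn, X_syn @ teacher)

# Stage 2: fine-tune on a much smaller "quantum-mechanical" dataset.
X_qc = rng.normal(size=(30, 3))
w = gd_fit(w, X_qc, X_qc @ w_true, lr=0.02)
```

The design point is that stage 1 only needs to land near the true solution; the small exact dataset in stage 2 then closes the remaining gap, which is far cheaper than labeling a large dataset quantum-mechanically from the start.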
arXiv Detail & Related papers (2023-07-24T17:16:24Z)
- ChemVise: Maximizing Out-of-Distribution Chemical Detection with the Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate that this approach, applied to synthetic sensor responses, surprisingly improves the detection of out-of-distribution obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z)
- Federated Learning of Molecular Properties in a Heterogeneous Setting [79.00211946597845]
We introduce federated heterogeneous molecular learning to address these challenges.
Federated learning allows end-users to build a global model collaboratively while preserving the training data distributed over isolated clients.
FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.
arXiv Detail & Related papers (2021-09-15T12:49:13Z)
- Machine Learning Force Fields [54.48599172620472]
Machine Learning (ML) has enabled numerous advances in computational chemistry.
One of the most promising applications is the construction of ML-based force fields (FFs).
This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them.
arXiv Detail & Related papers (2020-10-14T13:14:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.