Related papers: Federated Learning of Molecular Properties in a Heterogeneous Setting

URL: http://arxiv.org/abs/2109.07258v1
Date: Wed, 15 Sep 2021 12:49:13 GMT
Title: Federated Learning of Molecular Properties in a Heterogeneous Setting
Authors: Wei Zhu, Andrew White, Jiebo Luo
Abstract summary: We introduce federated heterogeneous molecular learning to address these challenges. Federated learning allows end-users to build a global model collaboratively while preserving the training data distributed over isolated clients. FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.
Score: 79.00211946597845
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Chemistry research incurs high material and computational costs to conduct experiments. Institutions thus consider chemical data to be valuable, and there have been few efforts to construct large public datasets for machine learning. Another challenge is that different institutions are interested in different classes of molecules, creating heterogeneous data that cannot be easily joined by conventional distributed training. In this work, we introduce federated heterogeneous molecular learning to address these challenges. Federated learning allows end-users to build a global model collaboratively while preserving the training data distributed over isolated clients. Due to the lack of related research, we first simulate a federated heterogeneous benchmark called FedChem. FedChem is constructed by jointly performing scaffold splitting and Latent Dirichlet Allocation on existing datasets. Our results on FedChem show that significant learning challenges arise when working with heterogeneous molecules. We then propose a method to alleviate the problem, namely Federated Learning by Instance reweighTing (FLIT). FLIT can align the local training across heterogeneous clients by improving the performance for uncertain samples. Comprehensive experiments conducted on our new benchmark FedChem validate the advantages of this method over other federated learning schemes. FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.
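
The abstract says FedChem simulates heterogeneous clients by combining scaffold splitting with a Dirichlet-based allocation, but gives no further detail here. The sketch below illustrates the general idea only and is not the FedChem procedure: it assumes molecules have already been grouped (for example by scaffold), draws a separate set of client proportions for each group from a symmetric Dirichlet distribution, and assigns each molecule to a client accordingly. The function names, the `alpha` parameter, and the input layout are all hypothetical.

```python
import random
from collections import defaultdict


def sample_dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k categories."""
    # Normalizing independent Gamma(alpha, 1) draws yields a Dirichlet sample.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def partition_by_group(groups, num_clients, alpha, seed=0):
    """Assign every item to one client, group by group.

    groups: dict mapping a group key (e.g. a scaffold ID; hypothetical)
            to a list of item identifiers.
    alpha:  Dirichlet concentration; smaller values tend to concentrate
            each group on fewer clients, i.e. more heterogeneous clients.
    """
    rng = random.Random(seed)
    clients = defaultdict(list)
    for key in sorted(groups):  # sorted for reproducibility
        # One set of client proportions per group.
        weights = sample_dirichlet(alpha, num_clients, rng)
        for item in groups[key]:
            client = rng.choices(range(num_clients), weights=weights, k=1)[0]
            clients[client].append(item)
    # Return a plain dict that includes clients that received no items.
    return {c: clients[c] for c in range(num_clients)}
```

Calling `partition_by_group({"scaffold_a": [0, 1, 2], "scaffold_b": [3, 4]}, num_clients=2, alpha=0.5)` returns a dict from client index to the item identifiers held by that client. Every item lands on exactly one client, so no data is shared between clients.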

Related papers

Federated Learning from Molecules to Processes: A Perspective [0.0]
We envision collaborative efforts in machine learning (ML) developments within the chemical industry. We discuss potential applications of federated learning in several fields of chemical engineering. Our results indicate that ML models jointly trained with federated learning yield significantly higher accuracy than models trained by each chemical company individually.
arXiv Detail & Related papers (2025-06-23T11:27:34Z)
Ensemble Knowledge Distillation for Machine Learning Interatomic Potentials [34.82692226532414]
Machine learning interatomic potentials (MLIPs) are a promising tool to accelerate atomistic simulations and molecular property prediction. The quality of MLIPs depends on the quantity of available training data as well as the quantum chemistry (QC) level of theory used to generate that data. We present an ensemble knowledge distillation (EKD) method to improve MLIP accuracy when trained to energy-only datasets.
arXiv Detail & Related papers (2025-03-18T14:32:51Z)
Federated Learning in Chemical Engineering: A Tutorial on a Framework for Privacy-Preserving Collaboration Across Distributed Data Sources [0.0]
This work aims to provide the chemical engineering community with an accessible introduction to the discipline. It explores the application of Federated Learning in tasks such as manufacturing optimization, multimodal data integration, and drug discovery. The tutorial was built using key frameworks such as Flower and TensorFlow Federated and was designed to provide chemical engineers with the right tools to adopt FL.
arXiv Detail & Related papers (2024-11-23T13:16:06Z)
Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning [79.75718786477638]
We exploit the specialty of molecular tasks that there are physical laws connecting them, and design consistency training approaches. We demonstrate that the more accurate energy data can improve the accuracy of structure prediction. We also find that consistency training can directly leverage force and off-equilibrium structure data to improve structure prediction.
arXiv Detail & Related papers (2024-10-14T03:11:33Z)
MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis [18.940529282539842]
We construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules. Our dataset offers significant physicochemical interpretability to guide model development and design. We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning.
arXiv Detail & Related papers (2024-06-13T02:50:23Z)
Federated Learning on Transcriptomic Data: Model Quality and Performance Trade-Offs [0.0]
Machine learning on large-scale genomic or transcriptomic data is important for many novel health applications. Due to privacy and regulatory reasons, it is also problematic to aggregate all data at a trusted third party. Federated learning is a promising solution because it enables decentralized, collaborative machine learning without exchanging raw data.
arXiv Detail & Related papers (2024-02-22T13:21:26Z)
Enhanced sampling of robust molecular datasets with uncertainty-based collective variables [0.0]
We propose a method that leverages uncertainty as the collective variable (CV) to guide the acquisition of chemically-relevant data points. This approach employs a Gaussian Mixture Model-based uncertainty metric from a single model as the CV for biased molecular dynamics simulations.
arXiv Detail & Related papers (2024-02-06T06:42:51Z)
Factor-Assisted Federated Learning for Personalized Optimization with Heterogeneous Data [6.024145412139383]
Federated learning is an emerging distributed machine learning framework aiming at protecting data privacy. Data in different clients contain both common knowledge and personalized knowledge. We develop a novel personalized federated learning framework for heterogeneous data, which we refer to as FedSplit.
arXiv Detail & Related papers (2023-12-07T13:05:47Z)
Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneous data. Proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z)
ChemVise: Maximizing Out-of-Distribution Chemical Detection with the Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones. We demonstrate that applying this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z)
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction. Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations. On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules. In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution. Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout the framework's learning.
arXiv Detail & Related papers (2020-07-07T04:22:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.