Federated Learning of Molecular Properties in a Heterogeneous Setting
- URL: http://arxiv.org/abs/2109.07258v1
- Date: Wed, 15 Sep 2021 12:49:13 GMT
- Title: Federated Learning of Molecular Properties in a Heterogeneous Setting
- Authors: Wei Zhu, Andrew White, Jiebo Luo
- Abstract summary: We introduce federated heterogeneous molecular learning to address the scarcity and heterogeneity of shared chemical data.
Federated learning allows end-users to build a global model collaboratively while preserving the training data distributed over isolated clients.
FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.
- Score: 79.00211946597845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Experiments in chemistry research carry high material and computational
costs. Institutions thus consider chemical data to be valuable, and there
have been few efforts to construct large public datasets for machine learning.
Another challenge is that different institutions are interested in different
classes of molecules, creating heterogeneous data that cannot be easily joined
by conventional distributed training. In this work, we introduce federated
heterogeneous molecular learning to address these challenges. Federated
learning allows end-users to build a global model collaboratively while
preserving the training data distributed over isolated clients. Due to the lack
of related research, we first simulate a federated heterogeneous benchmark
called FedChem. FedChem is constructed by jointly performing scaffold splitting
and Latent Dirichlet Allocation on existing datasets. Our results on FedChem
show that significant learning challenges arise when working with heterogeneous
molecules. We then propose a method to alleviate the problem, namely Federated
Learning by Instance reweighTing (FLIT). FLIT can align the local training
across heterogeneous clients by improving the performance for uncertain
samples. Comprehensive experiments conducted on our new benchmark FedChem
validate the advantages of this method over other federated learning schemes.
FedChem should enable a new type of collaboration for improving AI in chemistry
that mitigates concerns about valuable chemical data.
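The abstract says only that FedChem combines scaffold splitting with Latent Dirichlet Allocation. It does not give the procedure. The following is a minimal sketch of one way a heterogeneous client partition could be simulated under that description, not the authors' implementation. The function names, the `alpha` concentration parameter, and the placeholder scaffold labels are all assumptions made for illustration. In practice the labels would come from a cheminformatics toolkit.

```python
import random
from collections import defaultdict


def sample_dirichlet(alpha, k, rng):
    """Draw a symmetric Dirichlet(alpha) vector of length k via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:
        # Guard against numerical underflow for very small alpha.
        return [1.0 / k] * k
    return [d / total for d in draws]


def partition_by_scaffold(scaffolds, num_clients, alpha, seed=0):
    """Assign molecule indices to clients, one scaffold group at a time.

    scaffolds[i] is a (hypothetical) scaffold label for molecule i.
    A smaller alpha concentrates each scaffold group on fewer clients,
    which makes the simulated clients more heterogeneous.
    """
    rng = random.Random(seed)

    # Group molecule indices by scaffold label.
    groups = defaultdict(list)
    for idx, label in enumerate(scaffolds):
        groups[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(groups):
        # Each scaffold group gets its own client-proportion vector.
        weights = sample_dirichlet(alpha, num_clients, rng)
        for idx in groups[label]:
            client = rng.choices(range(num_clients), weights=weights, k=1)[0]
            clients[client].append(idx)
    return clients


# Placeholder labels standing in for real scaffold identifiers.
labels = ["ring_a"] * 10 + ["ring_b"] * 10 + ["chain_c"] * 10
split = partition_by_scaffold(labels, num_clients=3, alpha=0.5)
```

Every molecule lands on exactly one client, and lowering `alpha` in this sketch skews each scaffold group toward a single client.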
Related papers
- Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning [79.75718786477638]
We exploit a distinctive property of molecular tasks, namely that physical laws connect them, and design consistency training approaches.
We demonstrate that the more accurate energy data can improve the accuracy of structure prediction.
We also find that consistency training can directly leverage force and off-equilibrium structure data to improve structure prediction.
arXiv Detail & Related papers (2024-10-14T03:11:33Z) - MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis [18.940529282539842]
We construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules.
Our dataset offers significant physicochemical interpretability to guide model development and design.
We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning.
arXiv Detail & Related papers (2024-06-13T02:50:23Z) - Federated Learning on Transcriptomic Data: Model Quality and Performance
Trade-Offs [0.0]
Machine learning on large-scale genomic or transcriptomic data is important for many novel health applications.
For privacy and regulatory reasons, it is also problematic to aggregate all data at a trusted third party.
Federated learning is a promising solution because it enables decentralized, collaborative machine learning without exchanging raw data.
arXiv Detail & Related papers (2024-02-22T13:21:26Z) - Enhanced sampling of robust molecular datasets with uncertainty-based
collective variables [0.0]
We propose a method that leverages uncertainty as the collective variable (CV) to guide the acquisition of chemically-relevant data points.
This approach employs a Gaussian Mixture Model-based uncertainty metric from a single model as the CV for biased molecular dynamics simulations.
arXiv Detail & Related papers (2024-02-06T06:42:51Z) - Factor-Assisted Federated Learning for Personalized Optimization with
Heterogeneous Data [6.024145412139383]
Federated learning is an emerging distributed machine learning framework aiming at protecting data privacy.
Data in different clients contain both common knowledge and personalized knowledge.
We develop a novel personalized federated learning framework for heterogeneous data, which we refer to as FedSplit.
arXiv Detail & Related papers (2023-12-07T13:05:47Z) - Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z) - ChemVise: Maximizing Out-of-Distribution Chemical Detection with the
Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate that applying this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Discovery of structure-property relations for molecules via
hypothesis-driven active learning over the chemical space [0.0]
We introduce a novel approach for active learning over chemical space based on hypothesis learning.
We construct hypotheses about the possible relationships between structures and functionalities of interest from a small subset of the data.
This approach combines elements of symbolic regression methods such as SISSO with active learning in a single framework.
arXiv Detail & Related papers (2023-01-06T14:22:43Z) - ASGN: An Active Semi-supervised Graph Neural Network for Molecular
Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout training of the framework.
arXiv Detail & Related papers (2020-07-07T04:22:39Z)