Enhancing Molecular Property Prediction via Mixture of Collaborative Experts
- URL: http://arxiv.org/abs/2312.03292v1
- Date: Wed, 6 Dec 2023 05:02:10 GMT
- Title: Enhancing Molecular Property Prediction via Mixture of Collaborative Experts
- Authors: Xu Yao, Shuang Liang, Songqiao Han and Hailiang Huang
- Abstract summary: We introduce the GNN-MoCE architecture to address data scarcity and imbalance in MPP.
It employs the Mixture of Collaborative Experts (MoCE) as predictors, exploiting task commonalities.
Our model demonstrates superior performance over traditional methods on 24 MPP datasets.
- Score: 23.388085838279405
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The Molecular Property Prediction (MPP) task involves predicting biochemical
properties based on molecular features, such as molecular graph structures,
contributing to the discovery of lead compounds in drug development. To address
data scarcity and imbalance in MPP, some studies have adopted Graph Neural
Networks (GNN) as an encoder to extract commonalities from molecular graphs.
However, these approaches often use a separate predictor for each task,
neglecting the shared characteristics among predictors corresponding to
different tasks. In response to this limitation, we introduce the GNN-MoCE
architecture. It employs the Mixture of Collaborative Experts (MoCE) as
predictors, exploiting task commonalities while confronting the homogeneity
issue in the expert pool and the decision dominance dilemma within the expert
group. To enhance expert diversity for collaboration among all experts, the
Expert-Specific Projection method is proposed to assign a unique projection
perspective to each expert. To balance decision-making influence for
collaboration within the expert group, the Expert-Specific Loss is presented to
integrate individual expert loss into the weighted decision loss of the group
for more equitable training. Benefiting from the enhancements of MoCE in expert
creation, dynamic expert group formation, and experts' collaboration, our model
demonstrates superior performance over traditional methods on 24 MPP datasets,
especially in tasks with limited data or high imbalance.
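Since the abstract describes the predictor side in some detail, the following is a minimal PyTorch sketch of what a Mixture-of-Collaborative-Experts head with expert-specific projections and an expert-specific loss term could look like. All names and hyperparameters (MoCEPredictor, n_experts, top_k, alpha) and the exact loss form are illustrative assumptions based on the abstract, not the authors' released implementation; the shared GNN encoder is assumed to have already produced a graph-level embedding h.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoCEPredictor(nn.Module):
    """Toy Mixture-of-Collaborative-Experts head (all names are assumptions).

    Each expert views the shared graph embedding through its own projection
    (a stand-in for Expert-Specific Projection), a gate forms a small expert
    group per sample, and training combines the group's weighted decision loss
    with individual expert losses (a stand-in for Expert-Specific Loss).
    """

    def __init__(self, embed_dim=128, n_experts=8, top_k=3, alpha=0.5):
        super().__init__()
        self.top_k, self.alpha = top_k, alpha
        self.projections = nn.ModuleList([nn.Linear(embed_dim, embed_dim) for _ in range(n_experts)])
        self.experts = nn.ModuleList([nn.Linear(embed_dim, 1) for _ in range(n_experts)])
        self.gate = nn.Linear(embed_dim, n_experts)

    def forward(self, h):                                   # h: (batch, embed_dim) from a GNN encoder
        topk_val, topk_idx = self.gate(h).topk(self.top_k, dim=-1)
        weights = F.softmax(topk_val, dim=-1)               # influence of each group member
        all_logits = torch.cat(
            [exp(proj(h)) for proj, exp in zip(self.projections, self.experts)], dim=-1
        )                                                   # (batch, n_experts)
        group_logits = all_logits.gather(-1, topk_idx)      # (batch, top_k)
        group_logit = (weights * group_logits).sum(dim=-1)  # weighted group decision
        return group_logit, group_logits, weights

    def loss(self, group_logit, group_logits, y):
        # Group decision loss plus per-expert losses, so no single expert dominates training.
        group_loss = F.binary_cross_entropy_with_logits(group_logit, y)
        expert_loss = F.binary_cross_entropy_with_logits(
            group_logits, y.unsqueeze(-1).expand_as(group_logits))
        return group_loss + self.alpha * expert_loss
```

Here h stands in for the graph-level embedding produced by the shared GNN encoder; in the paper the gate also exploits task commonalities to form dynamic expert groups, which this reduced sketch omits.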
Related papers
- On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists [33.68104398807581]
We propose a novel Collaborative learning approach with a Mixture of Generalists and Specialists (CoMiGS).
Our approach distinguishes generalists and specialists by aggregating certain experts across end users while keeping others localized to specialize in user-specific datasets.
arXiv Detail & Related papers (2024-09-20T22:34:37Z) - HMoE: Heterogeneous Mixture of Experts for Language Modeling [45.65121689677227]
- HMoE: Heterogeneous Mixture of Experts for Language Modeling [45.65121689677227]
Traditionally, Mixture of Experts (MoE) models use homogeneous experts, each with identical capacity.
We propose a novel Heterogeneous Mixture of Experts (HMoE) where experts differ in size and thus possess diverse capacities.
HMoE achieves lower loss with fewer activated parameters and outperforms conventional homogeneous MoE models on various pre-training evaluation benchmarks.
arXiv Detail & Related papers (2024-08-20T09:35:24Z) - Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study [65.11303133775857]
- Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study [65.11303133775857]
Mixture-of-Experts (MoE) computation amalgamates predictions from several specialized sub-models (referred to as experts).
Sparse MoE selectively engages only a limited number, or even just one expert, significantly reducing overhead while empirically preserving, and sometimes even enhancing, performance.
arXiv Detail & Related papers (2024-03-26T05:48:02Z) - HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts [25.504602853436047]
Mixture of Experts (MoE) for language models has been proven effective in augmenting the capacity of models by dynamically routing each input token to a specific subset of experts for processing.
We propose HyperMoE, a novel MoE framework built upon Hypernetworks.
This framework integrates the computational processes of MoE with the concept of knowledge transferring in multi-task learning.
arXiv Detail & Related papers (2024-02-20T02:09:55Z) - Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer [59.43462055143123]
- Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer [59.43462055143123]
The Mixture of Experts (MoE) has emerged as a highly successful technique in deep learning.
In this study, we shed light on the homogeneous representation problem, wherein experts in the MoE fail to specialize and lack diversity.
We propose an alternating training strategy that encourages each expert to update in a direction orthogonal to the subspace spanned by other experts.
arXiv Detail & Related papers (2023-10-15T07:20:28Z) - Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy [84.11508381847929]
- Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy [84.11508381847929]
Sparsely activated Mixture-of-Experts (SMoE) has shown promise to scale up the learning capacity of neural networks.
We propose M-SMoE, which leverages routing statistics to guide expert merging.
The compressed variant, MC-SMoE, achieves up to 80% memory reduction and a 20% FLOPs reduction with virtually no loss in performance.
arXiv Detail & Related papers (2023-10-02T16:51:32Z) - MoEC: Mixture of Expert Clusters [93.63738535295866]
- MoEC: Mixture of Expert Clusters [93.63738535295866]
Sparse Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead.
MoE converts dense layers into sparse experts, and utilizes a gated routing network to make experts conditionally activated.
However, as the number of experts grows, an MoE with an outrageously large number of parameters suffers from overfitting and sparse data allocation.
arXiv Detail & Related papers (2022-07-19T06:09:55Z) - Federated Learning of Molecular Properties in a Heterogeneous Setting [79.00211946597845]
We introduce federated heterogeneous molecular learning to address these challenges.
Federated learning allows end-users to build a global model collaboratively while preserving the training data distributed over isolated clients.
FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.
arXiv Detail & Related papers (2021-09-15T12:49:13Z) - Gaussian Experts Selection using Graphical Models [7.530615321587948]
Local approximations reduce time complexity by dividing the original dataset into subsets and training a local expert on each subset.
We leverage techniques from the literature on undirected graphical models, using sparse precision matrices that encode conditional dependencies between experts to select the most important experts.
arXiv Detail & Related papers (2021-02-02T14:12:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.