Related papers: GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning


URL: http://arxiv.org/abs/2405.16206v3
Date: Tue, 01 Oct 2024 05:14:15 GMT
Title: GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning
Authors: Minghao Xu, Yunteng Geng, Yihang Zhang, Ling Yang, Jian Tang, Wentao Zhang
Abstract summary: GlycanML benchmark consists of diverse types of tasks including glycan taxonomy prediction, glycan immunogenicity prediction, glycosylation type prediction, and protein-glycan interaction prediction. By concurrently performing eight glycan taxonomy prediction tasks, we introduce the GlycanML-MTL testbed for multi-task learning (MTL) algorithms. Experimental results show the superiority of modeling glycans with multi-relational GNNs, and suitable MTL methods can further boost model performance.
Score: 35.818061926699336
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Glycans are basic biomolecules and perform essential functions within living organisms. The rapid increase of functional glycan data provides a good opportunity for machine learning solutions to glycan understanding. However, a standard machine learning benchmark for glycan property and function prediction is still lacking. In this work, we fill this gap by building a comprehensive benchmark for Glycan Machine Learning (GlycanML). The GlycanML benchmark consists of diverse types of tasks including glycan taxonomy prediction, glycan immunogenicity prediction, glycosylation type prediction, and protein-glycan interaction prediction. Glycans can be represented by both sequences and graphs in GlycanML, which enables us to extensively evaluate sequence-based models and graph neural networks (GNNs) on benchmark tasks. Furthermore, by concurrently performing eight glycan taxonomy prediction tasks, we introduce the GlycanML-MTL testbed for multi-task learning (MTL) algorithms. We also evaluate how taxonomy prediction can boost the other three function prediction tasks through MTL. Experimental results show the superiority of modeling glycans with multi-relational GNNs, and suitable MTL methods can further boost model performance. We provide all datasets and source code at https://github.com/GlycanML/GlycanML and maintain a leaderboard at https://GlycanML.github.io/project
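To make "multi-relational GNN" concrete, here is a minimal sketch of one relational message-passing step (R-GCN style) over a glycan graph. Monosaccharides are nodes, and each glycosidic linkage type acts as a relation with its own weight matrix. Everything here is a hypothetical illustration, not the GlycanML code or any of its benchmarked models: the tiny vocabularies, the example glycan Gal(b1-4)GlcNAc(b1-2)Man, the random weights, and the helper names (`rgcn_layer`, `matvec`, etc.).

```python
import random

# Hypothetical, tiny vocabularies; real glycan vocabularies are much larger.
MONOSACCHARIDES = ["Gal", "GlcNAc", "Man", "Fuc"]
LINKAGES = ["a1-3", "b1-2", "b1-4"]  # each linkage type is one relation


def one_hot(name, vocab):
    return [1.0 if v == name else 0.0 for v in vocab]


def matvec(W, x):
    # Multiply a (rows x cols) matrix by a length-cols vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rgcn_layer(h, edges, W_self, W_rel):
    """One relational message-passing step (hypothetical helper).

    h: list of node feature vectors
    edges: list of (src, dst, relation) triples, directed src -> dst
    W_self: matrix applied to each node's own features
    W_rel: dict mapping relation -> matrix applied to neighbor features
    """
    # Group the incoming neighbors of each node by relation type.
    incoming = {}
    for src, dst, rel in edges:
        incoming.setdefault((dst, rel), []).append(src)
    new_h = []
    for i, hi in enumerate(h):
        acc = matvec(W_self, hi)
        for rel, W in W_rel.items():
            nbrs = incoming.get((i, rel), [])
            for j in nbrs:
                # Mean-normalize messages within each relation type.
                msg = matvec(W, h[j])
                acc = add(acc, [m / len(nbrs) for m in msg])
        new_h.append([max(0.0, a) for a in acc])  # ReLU
    return new_h


# Hypothetical example glycan: Gal(b1-4)GlcNAc(b1-2)Man
nodes = ["Gal", "GlcNAc", "Man"]
bonds = [(0, 1, "b1-4"), (1, 2, "b1-2")]
# Add reverse edges (same relation, a simplification) so information
# flows both ways along each glycosidic bond.
edges = bonds + [(d, s, r) for s, d, r in bonds]

rng = random.Random(0)
in_dim, hid = len(MONOSACCHARIDES), 8
W_self = random_matrix(hid, in_dim, rng)
W_rel = {r: random_matrix(hid, in_dim, rng) for r in LINKAGES}

h0 = [one_hot(n, MONOSACCHARIDES) for n in nodes]
h1 = rgcn_layer(h0, edges, W_self, W_rel)
# Mean-pool node embeddings into one glycan-level embedding.
glycan_emb = [sum(col) / len(h1) for col in zip(*h1)]
print(len(h1), len(glycan_emb))
```

The point of the per-relation matrices is that a b1-4 neighbor and an a1-3 neighbor contribute differently even when they are the same monosaccharide. A plain (single-relation) GNN would treat the two identically.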

Related papers

Modeling All-Atom Glycan Structures via Hierarchical Message Passing and Multi-Scale Pre-training [37.76325239977169]
We introduce the GlycanAA model for all-atom glycan modeling. GlycanAA performs hierarchical message passing to capture interactions ranging from local atomic-level interactions to global monosaccharide-level interactions. We design a multi-scale mask prediction algorithm to make the model aware of different levels of dependencies in a glycan.
arXiv Detail & Related papers (2025-06-02T07:08:39Z)
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning [48.90556054777393]
Gated Linear Attention (GLA) architectures include competitive models such as Mamba and RWKV. We show that a multilayer GLA can implement a general class of Weighted Preconditioned Gradient Descent (WPGD) algorithms. Under mild conditions, we establish the existence and uniqueness (up to scaling) of a global minimum, corresponding to a unique WPGD solution.
arXiv Detail & Related papers (2025-04-06T00:37:36Z)
GlucoBench: Curated List of Continuous Glucose Monitoring Datasets with Prediction Benchmarks [0.12564343689544843]
Continuous glucose monitors (CGM) are small medical devices that measure blood glucose levels at regular intervals. Forecasting of glucose trajectories based on CGM data holds the potential to substantially improve diabetes management.
arXiv Detail & Related papers (2024-10-08T08:01:09Z)
Higher-Order Message Passing for Glycan Representation Learning [0.0]
Graph Neural Networks (GNNs) are deep learning models designed to process and analyze graph-structured data. This work presents a new model architecture based on complexes and higher-order message passing to extract features from glycan structures into a latent space representation. We envision that these improvements will spur further advances in computational glycosciences and reveal the roles of glycans in biology.
arXiv Detail & Related papers (2024-09-20T12:55:43Z)
Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels. Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z)
GLEMOS: Benchmark for Instantaneous Graph Learning Model Selection [21.59275856238877]
GLEMOS is a benchmark for instantaneous graph learning (GL) model selection. It provides benchmark data for fundamental GL tasks, including link prediction and node classification. It is designed to be easily extended with new models, new graphs, and new performance records.
arXiv Detail & Related papers (2024-04-02T02:13:00Z)
uGLAD: Sparse graph recovery by optimizing deep unrolled networks [11.48281545083889]
We present a novel technique to perform sparse graph recovery by optimizing deep unrolled networks. Our model, uGLAD, builds upon and extends the state-of-the-art model GLAD to the unsupervised setting. We evaluate model results on synthetic Gaussian data, non-Gaussian data generated from Gene Regulatory Networks, and present a case study in anaerobic digestion.
arXiv Detail & Related papers (2022-05-23T20:20:27Z)
DGL-LifeSci: An Open-Source Toolkit for Deep Learning on Graphs in Life Science [5.3825788156200565]
We present DGL-LifeSci, an open-source package for deep learning on graphs in life science. DGL-LifeSci is a Python toolkit based on RDKit, PyTorch and Deep Graph Library. It allows GNN-based modeling on custom datasets for molecular property prediction, reaction prediction and molecule generation.
arXiv Detail & Related papers (2021-06-27T13:27:47Z)
Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction. We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z)
Structure-Enhanced Meta-Learning For Few-Shot Graph Classification [53.54066611743269]
This work explores the potential of metric-based meta-learning for solving few-shot graph classification. An implementation built on GIN, named SMFGIN, is tested on two datasets, ChEMBL and TRIANGLES.
arXiv Detail & Related papers (2021-03-05T09:03:03Z)
Interpretable Learning-to-Rank with Generalized Additive Models [78.42800966500374]
Interpretability of learning-to-rank models is a crucial yet relatively under-examined research area. Recent progress on interpretable ranking models largely focuses on generating post-hoc explanations for existing black-box ranking models. We lay the groundwork for intrinsically interpretable learning-to-rank by introducing generalized additive models (GAMs) into ranking tasks.
arXiv Detail & Related papers (2020-05-06T01:51:30Z)
Infinitely Wide Graph Convolutional Networks: Semi-supervised Learning via Gaussian Processes [144.6048446370369]
Graph convolutional neural networks (GCNs) have recently demonstrated promising results on graph-based semi-supervised classification. We propose a GP regression model via GCNs (GPGC) for graph-based semi-supervised learning. We conduct extensive experiments to evaluate GPGC and demonstrate that it outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2020-02-26T10:02:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.