ATTNSOM: Learning Cross-Isoform Attention for Cytochrome P450 Site-of-Metabolism
- URL: http://arxiv.org/abs/2601.20891v1
- Date: Wed, 28 Jan 2026 05:59:30 GMT
- Title: ATTNSOM: Learning Cross-Isoform Attention for Cytochrome P450 Site-of-Metabolism
- Authors: Hajung Kim, Eunha Lee, Sohyun Chung, Jueon Park, Seungheun Baek, Jaewoo Kang
- Abstract summary: We propose ATTNSOM, an atom-level site-of-metabolism prediction framework. It integrates intrinsic molecular reactivity with cross-isoform relationships. The model is evaluated on two benchmark datasets with site-of-metabolism labels at atom resolution.
- Score: 14.60742753122634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying metabolic sites where cytochrome P450 enzymes metabolize small-molecule drugs is essential for drug discovery. Although existing computational approaches have been proposed for site-of-metabolism prediction, they typically ignore cytochrome P450 isoform identity or model isoforms independently, thereby failing to fully capture inherent cross-isoform metabolic patterns. In addition, prior evaluations often rely on top-k metrics, where false positive atoms may be included among the top predictions, underscoring the need for complementary metrics that more directly assess binary atom-level discrimination under severe class imbalance. We propose ATTNSOM, an atom-level site-of-metabolism prediction framework that integrates intrinsic molecular reactivity with cross-isoform relationships. The model combines a shared graph encoder, molecule-conditioned atom representations, and a cross-attention mechanism to capture correlated metabolic patterns across cytochrome P450 isoforms. The model is evaluated on two benchmark datasets annotated with site-of-metabolism labels at atom resolution. Across these benchmarks, the model achieves consistently strong top-k performance across multiple cytochrome P450 isoforms. Relative to ablated variants, the model yields higher Matthews correlation coefficient, indicating improved discrimination of true metabolic sites. These results support the importance of explicitly modeling cross-isoform relationships for site-of-metabolism prediction. The code and datasets are available at https://github.com/dmis-lab/ATTNSOM.
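The abstract's point about top-k metrics versus the Matthews correlation coefficient can be illustrated with a small sketch. The molecule, scores, threshold, and function names below are hypothetical and are not taken from the paper or its repository; the code only shows how a top-k hit can coexist with several false-positive atoms, which MCC penalizes.

```python
import math


def top_k_hit(scores, labels, k=2):
    """Return True if any of the k highest-scoring atoms is a true site."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return any(labels[i] == 1 for i in ranked[:k])


def mcc(scores, labels, threshold=0.5):
    """Matthews correlation coefficient after thresholding atom scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical 10-atom molecule with a single true site (atom index 3).
labels = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical predicted scores: the true site ranks second, but three
# non-site atoms also score above the 0.5 threshold.
scores = [0.9, 0.1, 0.2, 0.8, 0.7, 0.1, 0.6, 0.1, 0.2, 0.1]

hit = top_k_hit(scores, labels, k=2)  # True: atom 3 is in the top 2
score = mcc(scores, labels)           # about 0.408: three false positives
```

Here the top-2 criterion is satisfied even though three of the four atoms predicted positive are not metabolic sites, which is the kind of case where a thresholded metric such as MCC gives complementary information under class imbalance.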
Related papers
- Investigating Knowledge Distillation Through Neural Networks for Protein Binding Affinity Prediction [0.22369578015657954]
Trade-off between predictive accuracy and data availability makes it difficult to predict protein--protein binding affinity accurately.
We suggest a regression framework based on knowledge distillation that uses protein structural data during training and only needs sequence data during inference.
arXiv Detail & Related papers (2026-01-07T08:43:08Z)
- Breaking the Modality Barrier: Generative Modeling for Accurate Molecule Retrieval from Mass Spectra [60.08608779794957]
We propose GLMR, a Generative Language Model-based Retrieval framework.
In the pre-retrieval stage, a contrastive learning-based model identifies top candidate molecules as contextual priors for the input mass spectrum.
In the generative retrieval stage, these candidate molecules are integrated with the input mass spectrum to guide a generative model in producing refined molecular structures.
arXiv Detail & Related papers (2025-11-09T07:25:53Z)
- Boltzmann Graph Ensemble Embeddings for Aptamer Libraries [37.52407391187203]
Machine-learning methods in biochemistry commonly represent molecules as graphs of pairwise intermolecular interactions for property and structure predictions.
We introduce a thermodynamically parameterized exponential-family random graph (ERGM) embedding that models molecules as Boltzmann-weighted ensembles of interaction graphs.
We show that the proposed embedding enables robust community detection and subgraph-level explanations for aptamer affinity, even in the presence of biased observations.
arXiv Detail & Related papers (2025-10-24T19:13:36Z)
- Composable Score-based Graph Diffusion Model for Multi-Conditional Molecular Generation [85.58520120011269]
We propose Composable Score-based Graph Diffusion model (CSGD), which extends score matching to discrete graphs via concrete scores.
We show that CSGD achieves state-of-the-art performance with a 15.3% average improvement in controllability over prior methods.
Our findings highlight the practical advantages of score-based modeling for discrete graph generation and its capacity for flexible, multi-property molecular design.
arXiv Detail & Related papers (2025-09-11T13:37:56Z)
- Gene-Metabolite Association Prediction with Interactive Knowledge Transfer Enhanced Graph for Metabolite Production [49.814615043389864]
We propose a new task, Gene-Metabolite Association Prediction based on metabolic graphs.
We present the first benchmark containing 2474 metabolites and 1947 genes of two commonly used microorganisms.
Our proposed methodology outperforms baselines by up to 12.3% across various link prediction frameworks.
arXiv Detail & Related papers (2024-10-24T06:54:27Z)
- Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics [17.563099908890013]
We propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites.
By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values.
arXiv Detail & Related papers (2023-10-12T02:34:56Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules.
Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph.
By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z)
- Predicting pathways for old and new metabolites through clustering [0.06091702876917279]
We present an approach to identify pathways based on metabolite structure.
After applying clustering algorithms to both groups of features, we found the clusters accurately linked 92% of known metabolites to their respective pathways.
arXiv Detail & Related papers (2022-11-28T19:07:02Z)
- Improved Drug-target Interaction Prediction with Intermolecular Graph Transformer [98.8319016075089]
We propose a novel approach to model intermolecular information with a three-way Transformer-based architecture.
Intermolecular Graph Transformer (IGT) outperforms state-of-the-art approaches by 9.1% and 20.5% over the second best for binding activity and binding pose prediction respectively.
IGT exhibits promising drug screening ability against SARS-CoV-2 by identifying 83.1% active drugs that have been validated by wet-lab experiments with near-native predicted binding poses.
arXiv Detail & Related papers (2021-10-14T13:28:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.