Accelerating MHC-II Epitope Discovery via Multi-Scale Prediction in Antigen Presentation
- URL: http://arxiv.org/abs/2512.14011v1
- Date: Tue, 16 Dec 2025 02:12:08 GMT
- Title: Accelerating MHC-II Epitope Discovery via Multi-Scale Prediction in Antigen Presentation
- Authors: Yue Wan, Jiayi Yuan, Zhiwei Feng, Xiaowei Jia
- Abstract summary: Antigenic epitopes presented by major histocompatibility complex II (MHC-II) proteins play an essential role in immunotherapy. The study of MHC-II antigenic epitopes poses significantly more challenges due to their complex binding specificity and ambiguous motif patterns. We present a well-curated dataset derived from the Immune Epitope Database (IEDB) and other public sources. It not only extends and standardizes existing peptide-MHC-II datasets, but also introduces a novel antigen-MHC-II dataset with biological context.
- Score: 16.95876116653963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Antigenic epitopes presented by major histocompatibility complex II (MHC-II) proteins play an essential role in immunotherapy. However, compared to the more widely studied MHC-I in computational immunotherapy, the study of MHC-II antigenic epitopes poses significantly more challenges due to their complex binding specificity and ambiguous motif patterns. Consequently, existing datasets for MHC-II interactions are smaller and less standardized than those available for MHC-I. To address these challenges, we present a well-curated dataset derived from the Immune Epitope Database (IEDB) and other public sources. It not only extends and standardizes existing peptide-MHC-II datasets, but also introduces a novel antigen-MHC-II dataset with richer biological context. Leveraging this dataset, we formulate three major machine learning (ML) tasks of peptide binding, peptide presentation, and antigen presentation, which progressively capture the broader biological processes within the MHC-II antigen presentation pathway. We further employ a multi-scale evaluation framework to benchmark existing models, along with a comprehensive analysis of various modeling designs for this problem using a modular framework. Overall, this work serves as a valuable resource for advancing computational immunotherapy, providing a foundation for future research in ML-guided epitope discovery and predictive modeling of immune responses.
Related papers
- R-GenIMA: Integrating Neuroimaging and Genetics with Interpretable Multimodal AI for Alzheimer's Disease Progression [63.97617759805451]
Early detection of Alzheimer's disease requires models capable of integrating macro-scale neuroanatomical alterations with micro-scale genetic susceptibility. We introduce R-GenIMA, an interpretable multimodal large language model that couples a novel ROI-wise vision transformer with genetic prompting. R-GenIMA achieves state-of-the-art performance in four-way classification across normal cognition, subjective memory concerns, mild cognitive impairment, and AD.
arXiv Detail & Related papers (2025-12-22T02:54:10Z)
- SurvAgent: Hierarchical CoT-Enhanced Case Banking and Dichotomy-Based Multi-Agent System for Multimodal Survival Prediction [49.355973075150075]
We introduce SurvAgent, the first hierarchical chain-of-thought (CoT)-enhanced multi-agent system for multimodal survival prediction. SurvAgent consists of two stages: WSI-Gene CoT-Enhanced Case Bank Construction employs hierarchical analysis through Low-Magnification Screening, Cross-Modal Similarity-Aware Patch Mining, and Confidence-Aware Patch Mining for pathology images. Dichotomy-Based Multi-Expert Agent Inference retrieves similar cases via RAG and integrates multimodal reports with expert predictions through progressive interval refinement.
arXiv Detail & Related papers (2025-11-20T18:41:44Z)
- Revealing Multimodal Causality with Large Language Models [80.95511545591107]
We propose MLLM-CD, a novel framework for multimodal causal discovery from unstructured data. It consists of three key components: (1) a novel contrastive factor discovery module to identify genuine multimodal factors; (2) a statistical causal structure discovery module to infer causal relationships among discovered factors; and (3) an iterative multimodal counterfactual reasoning module to refine the discovery outcomes. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed MLLM-CD.
arXiv Detail & Related papers (2025-09-22T13:45:17Z)
- A Machine Learning Framework for Pathway-Driven Therapeutic Target Discovery in Metabolic Disorders [1.41678086736482]
This study introduces a novel machine learning (ML) framework that integrates predictive modeling with gene-agnostic pathway mapping to identify high-risk individuals. Using the Pima Indian dataset, logistic regression and t-tests were applied to identify key predictors of T2DM, yielding an overall model accuracy of 78.43%.
arXiv Detail & Related papers (2025-09-14T19:29:52Z)
- Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations [0.39945675027960637]
We present LANTERN, a deep learning framework that combines large-scale protein language models with chemical representations of peptides. Our model demonstrates superior performance, particularly in zero-shot and few-shot learning scenarios. These results highlight the potential of LANTERN to advance TCR-pMHC binding prediction and support the development of personalized immunotherapies.
arXiv Detail & Related papers (2025-04-22T20:22:34Z)
- Harnessing Preference Optimisation in Protein LMs for Hit Maturation in Cell Therapy [0.5315454965484603]
Cell and immunotherapy offer transformative potential for treating diseases like cancer and autoimmune disorders by modulating the immune system. The development of these therapies is resource-intensive, with the majority of drug candidates failing to progress beyond laboratory testing. Although recent advances in machine learning have revolutionised areas such as protein engineering, applications in immunotherapy remain limited due to the scarcity of large-scale, standardised datasets and the complexity of cellular systems.
arXiv Detail & Related papers (2024-12-02T11:21:58Z)
- Generalizing AI-driven Assessment of Immunohistochemistry across Immunostains and Cancer Types: A Universal Immunohistochemistry Analyzer [12.164507399614347]
We developed a Universal IHC (UIHC) analyzer, an AI model for interpreting IHC images regardless of tumor or IHC types.
This multi-cohort trained model outperforms conventional single-cohort models in interpreting unseen IHCs.
arXiv Detail & Related papers (2024-07-30T08:39:30Z)
- An interpretable generative multimodal neuroimaging-genomics framework for decoding Alzheimer's disease [13.213387075528017]
Alzheimer's disease (AD) is the most prevalent form of dementia worldwide, encompassing a prodromal stage known as Mild Cognitive Impairment (MCI). The objective of the work was to capture structural and functional modulations of brain structure and function relying on multimodal MRI data and Single Nucleotide Polymorphisms.
arXiv Detail & Related papers (2024-06-19T07:31:47Z)
- Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework [89.8609061423685]
We propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task.
To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and on large-scale multimodal benchmarks.
We demonstrate their usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies.
arXiv Detail & Related papers (2023-02-23T18:59:05Z)
- Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that, for specific embedding methods, some simulation-based approaches are more robust (and accurate) than others against certain adversarial attacks on the input sequences.
arXiv Detail & Related papers (2022-07-18T19:16:56Z)
- A robust kernel machine regression towards biomarker selection in multi-omics datasets of osteoporosis for drug discovery [2.2897244874280043]
We propose "robust kernel machine regression (RobMR)," to improve the robustness of statistical machine regression and the diversity of fictional data.
Experiments demonstrate that the proposed approach effectively identifies the inter-related risk factors of osteoporosis.
The proposed approach can be applied to any disease model for which multi-omics datasets are available.
arXiv Detail & Related papers (2022-01-13T16:39:46Z)
- MIA-Prognosis: A Deep Learning Framework to Predict Therapy Response [58.0291320452122]
This paper aims at a unified deep learning approach to predict patient prognosis and therapy response.
We formalize the prognosis modeling as a multi-modal asynchronous time series classification task.
Our predictive model could further stratify low-risk and high-risk patients in terms of long-term survival.
arXiv Detail & Related papers (2020-10-08T15:30:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.