Classification of autoimmune diseases from Peripheral blood TCR repertoires by multimodal multi-instance learning
- URL: http://arxiv.org/abs/2507.04981v3
- Date: Wed, 09 Jul 2025 07:33:59 GMT
- Title: Classification of autoimmune diseases from Peripheral blood TCR repertoires by multimodal multi-instance learning
- Authors: Ruihao Zhang, Mao chen, Fei Ye, Dandan Meng, Yixuan Huang, Xiao Liu,
- Abstract summary: EAMil is a multi-instance deep learning framework that leverages TCR sequencing data to diagnose systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA). Our model achieved state-of-the-art performance with AUCs of 98.95% for SLE and 97.76% for RA.
- Score: 12.912054929080133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: T cell receptor (TCR) repertoires encode critical immunological signatures for autoimmune diseases, yet their clinical application remains limited by sequence sparsity and low witness rates. We developed EAMil, a multi-instance deep learning framework that leverages TCR sequencing data to diagnose systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) with exceptional accuracy. By integrating PrimeSeq feature extraction with ESMonehot encoding and enhanced gate attention mechanisms, our model achieved state-of-the-art performance with AUCs of 98.95% for SLE and 97.76% for RA. EAMil successfully identified disease-associated genes with over 90% concordance with established differential analyses and effectively distinguished disease-specific TCR genes. The model demonstrated robustness in classifying multiple disease categories, used the SLEDAI score to stratify SLE patients by disease severity and to diagnose the site of damage in SLE patients, and effectively controlled for confounding factors such as age and gender. This interpretable framework for immune receptor analysis provides new insights for autoimmune disease detection and classification, with broad potential clinical applications across immune-mediated conditions.
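Assuming each patient's repertoire is treated as a bag of TCR sequences, the multi-instance setup described above can be illustrated with standard gated-attention MIL pooling (Ilse et al., 2018). This is a minimal sketch, not EAMil's implementation: PrimeSeq feature extraction, ESM embeddings, and the paper's "enhanced" gate are not reproduced. The helper names (`one_hot`, `gated_attention_pool`), the 20-residue padding length, the hidden size, the example CDR3 strings, and the random untrained weights are all hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(cdr3: str, max_len: int = 20) -> list[float]:
    """Flattened one-hot encoding of a CDR3 string, truncated/zero-padded to max_len (hypothetical helper)."""
    vec = [0.0] * (max_len * len(AMINO_ACIDS))
    for pos, aa in enumerate(cdr3[:max_len]):
        idx = AMINO_ACIDS.find(aa)
        if idx >= 0:  # unknown residues are left as all-zero rows
            vec[pos * len(AMINO_ACIDS) + idx] = 1.0
    return vec


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention_pool(bag, V, U, w):
    """Gated attention pooling: a_k = softmax_k(w . (tanh(V h_k) * sigmoid(U h_k)))."""
    scores = []
    for h in bag:
        t = [math.tanh(x) for x in matvec(V, h)]   # content branch
        g = [sigmoid(x) for x in matvec(U, h)]     # gating branch
        scores.append(sum(wi * ti * gi for wi, ti, gi in zip(w, t, g)))
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(bag[0])
    pooled = [sum(a * h[d] for a, h in zip(attn, bag)) for d in range(dim)]
    return attn, pooled


# Untrained, randomly initialised weights (hypothetical sizes), for illustration only.
random.seed(0)
DIM, HIDDEN = 20 * len(AMINO_ACIDS), 8
V = [[random.gauss(0, 0.05) for _ in range(DIM)] for _ in range(HIDDEN)]
U = [[random.gauss(0, 0.05) for _ in range(DIM)] for _ in range(HIDDEN)]
w = [random.gauss(0, 0.5) for _ in range(HIDDEN)]
clf = [random.gauss(0, 0.05) for _ in range(DIM)]  # bag-level linear classifier

repertoire = ["CASSLGQAYEQYF", "CASSPDRGNTEAFF", "CSARDGTGNGYTF"]  # illustrative CDR3s
bag = [one_hot(s) for s in repertoire]
attn, pooled = gated_attention_pool(bag, V, U, w)
p_disease = sigmoid(sum(c * x for c, x in zip(clf, pooled)))
```

Because the attention weights sum to 1 over a repertoire, a trained model's weights can be read as per-sequence importance, which is one way an attention-based MIL model can surface candidate disease-associated TCRs; with the random weights used here they carry no meaning.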
Related papers
- An Agentic System for Rare Disease Diagnosis with Traceable Reasoning [58.78045864541539]
We introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM). DeepRare generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning. The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1013 diseases.
arXiv (2025-06-25T13:42:26Z)
- Enhancing Thyroid Cytology Diagnosis with RAG-Optimized LLMs and Pathology Foundation Models [0.0]
This study explores the application of retrieval-augmented generation (RAG)-enhanced large language models (LLMs) with pathology foundation models for thyroid diagnosis. By leveraging a curated knowledge base, RAG facilitates dynamic retrieval of relevant case studies, diagnostic criteria, and expert interpretation. The fusion of these AI-driven approaches enhances diagnostic consistency, reduces variability, and supports pathologists in distinguishing benign from malignant thyroid lesions.
arXiv (2025-05-13T14:01:35Z)
- From Knowledge Generation to Knowledge Verification: Examining the BioMedical Generative Capabilities of ChatGPT [45.6537455491436]
Our approach consists of two processes: generating disease-centric associations and verifying these associations. Using ChatGPT as the selected LLM, we designed prompt-engineering processes to establish linkages between diseases and related drugs, symptoms, and genes.
arXiv (2025-02-20T16:39:57Z)
- Assessing and Enhancing Large Language Models in Rare Disease Question-answering [64.32570472692187]
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experiment results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
arXiv (2024-08-15T21:09:09Z)
- tcrLM: a lightweight protein language model for predicting T cell receptor and epitope binding specificity [4.120928123714289]
Anti-cancer immune responses rely on binding between T-cell receptors (TCRs) and antigens, which elicits adaptive immunity to eliminate tumor cells. In this study, we introduce a lightweight masked language model, termed tcrLM, to address this challenge. We construct the largest TCR CDR3 sequence set with more than 100 million distinct sequences, and pretrain tcrLM on these sequences. The results demonstrate that tcrLM not only surpasses existing TCR-antigen binding prediction methods, but also outperforms other mainstream protein language models.
arXiv (2024-06-24T08:36:40Z)
- TACCO: Task-guided Co-clustering of Clinical Concepts and Patient Visits for Disease Subtyping based on EHR Data [42.96821770394798]
TACCO is a novel framework that jointly discovers clusters of clinical concepts and patient visits based on a hypergraph modeling of EHR data.
We conduct experiments on the public MIMIC-III dataset and Emory internal CRADLE dataset over the downstream clinical tasks of phenotype classification and cardiovascular risk prediction.
In-depth model analysis, clustering results analysis, and clinical case studies further validate the improved utilities and insightful interpretations delivered by TACCO.
arXiv (2024-06-14T14:18:38Z)
- CIMIL-CRC: a clinically-informed multiple instance learning framework for patient-level colorectal cancer molecular subtypes classification from H&E stained images [42.771819949806655]
We introduce CIMIL-CRC, a framework that solves the MSI/MSS MIL problem by efficiently combining a pre-trained feature extraction model with principal component analysis (PCA) to aggregate information from all patches.
We assessed our CIMIL-CRC method using the average area under the curve (AUC) from a 5-fold cross-validation experimental setup for model development on the TCGA-CRC-DX cohort.
arXiv (2024-01-29T12:56:11Z)
- Neural Network-Based Histologic Remission Prediction In Ulcerative Colitis [38.150634108667774]
Histologic remission is a new therapeutic target in ulcerative colitis (UC).
Endocytoscopy (EC) is a novel ultra-high magnification endoscopic technique.
We propose a neural network model that can assess histological disease activity in EC images.
arXiv (2023-08-28T15:54:14Z)
- AIRIVA: A Deep Generative Model of Adaptive Immune Repertoires [6.918664738267051]
We present an Adaptive Immune Repertoire-Invariant Variational Autoencoder (AIRIVA) that learns a low-dimensional, interpretable, and compositional representation of TCR repertoires to disentangle systematic effects in repertoires.
arXiv (2023-04-26T14:40:35Z)
- Deep-Learning Tool for Early Identifying Non-Traumatic Intracranial Hemorrhage Etiology based on CT Scan [40.51754649947294]
The deep learning model was developed with 1868 eligible NCCT scans with non-traumatic ICH collected between January 2011 and April 2018.
The model's diagnostic performance was compared with clinicians' performance.
With augmentation by the proposed system, the clinicians achieved significant improvements in the sensitivity, specificity, and accuracy of diagnosing certain hemorrhage etiologies.
arXiv (2023-02-02T08:45:17Z)
- Detecting Histologic & Clinical Glioblastoma Patterns of Prognostic Relevance [6.281092892485014]
Glioblastoma is the most common and aggressive malignant adult tumor of the central nervous system.
In the 18 years since the current standard-of-care treatment was adopted, no substantial prognostic improvement has been observed.
Here, we focus on identifying prognostically relevant characteristics from H&E stained WSI & clinical data relating to overall survival (OS).
arXiv (2023-02-01T18:56:09Z)
- Tensor-Based Multi-Modality Feature Selection and Regression for Alzheimer's Disease Diagnosis [25.958167380664083]
We propose a novel tensor-based multi-modality feature selection and regression method for diagnosis and biomarker identification of Alzheimer's Disease (AD) and Mild Cognitive Impairment (MCI).
We present the practical advantages of our method for the analysis of ADNI data using three imaging modalities.
arXiv (2022-09-23T02:17:27Z)
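Both the EAMil summary and the CIMIL-CRC entry above report AUC, the latter averaged over a 5-fold cross-validation. A minimal sketch of that metric follows, using the Mann-Whitney reading of ROC AUC (the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counting one half). It is not either paper's evaluation code; the helper names `roc_auc` and `mean_cv_auc` are hypothetical, and the fold scores are made-up illustrative values, not results from any paper.

```python
from statistics import mean


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic: P(score_pos > score_neg), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positive and negative cases")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_cv_auc(folds: list[tuple[list[int], list[float]]]) -> float:
    """Average the per-fold AUCs of a k-fold cross-validation run."""
    return mean(roc_auc(labels, scores) for labels, scores in folds)


# Illustrative (made-up) held-out predictions for two folds.
folds = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),  # every positive outranks every negative -> 1.0
    ([1, 1, 0, 0], [0.7, 0.2, 0.4, 0.1]),  # 3 of 4 positive/negative pairs ordered correctly -> 0.75
]
print(mean_cv_auc(folds))  # 0.875
```

The pairwise loop costs O(P·N) comparisons; for large cohorts, a rank-based computation of the same statistic runs in O(n log n).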