An Inclusive Foundation Model for Generalizable Cytogenetics in Precision Oncology
- URL: http://arxiv.org/abs/2505.15868v1
- Date: Wed, 21 May 2025 12:03:37 GMT
- Title: An Inclusive Foundation Model for Generalizable Cytogenetics in Precision Oncology
- Authors: Changchun Yang, Weiqian Dai, Yilan Zhang, Siyuan Chen, Jingdong Hu, Junkai Su, Yuxuan Chen, Ao Xu, Na Li, Xin Gao, Yongguo Yu
- Abstract summary: CHROMA is a foundation model for cytogenomics designed to overcome challenges by learning generalizable representations of chromosomal abnormalities. It is pre-trained on over 84,000 specimens (~4 million chromosomal images) via self-supervised learning. CHROMA offers a scalable and generalizable solution for reliable and automated clinical analysis.
- Score: 18.252994255843813
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chromosome analysis is vital for diagnosing genetic disorders and guiding cancer therapy decisions through the identification of somatic clonal aberrations. However, the development of AI models is hindered by the overwhelming complexity and diversity of chromosomal abnormalities, which require extensive annotation efforts, while automated methods remain task-specific and lack generalizability due to the scarcity of comprehensive datasets spanning diverse resource conditions. Here, we introduce CHROMA, a foundation model for cytogenomics, designed to overcome these challenges by learning generalizable representations of chromosomal abnormalities. Pre-trained on over 84,000 specimens (~4 million chromosomal images) via self-supervised learning, CHROMA outperforms other methods across all types of abnormalities, even when trained on fewer labelled data and more imbalanced datasets. By facilitating comprehensive mapping of instability and clonal lesions across various aberration types, CHROMA offers a scalable and generalizable solution for reliable and automated clinical analysis. It reduces the annotation workload for experts and advances precision oncology through the early detection of rare genomic abnormalities, enabling broad clinical AI applications and making advanced genomic analysis more accessible.
Related papers
- Multimodal AI-driven Biomarker for Early Detection of Cancer Cachexia [14.27396467108753]
Cancer cachexia is a multifactorial syndrome characterized by progressive muscle wasting, metabolic dysfunction, and systemic inflammation.
There is no single definitive biomarker for cachexia.
This study proposes a multimodal AI-based biomarker for early cancer cachexia detection.
arXiv Detail & Related papers (2025-03-09T22:32:37Z)
- GenIAS: Generator for Instantiating Anomalies in time Series [54.959865643340535]
We develop a generative model for time series anomaly detection (TSAD) using a variational autoencoder.
GenIAS is designed to produce diverse and realistic synthetic anomalies for TSAD tasks.
Our experiments demonstrate that GenIAS consistently outperforms seventeen traditional and deep anomaly detection models.
arXiv Detail & Related papers (2025-02-12T10:10:04Z)
- Advancing Precision Oncology Through Modeling of Longitudinal and Multimodal Data [1.6163129903911508]
Cancer evolves continuously over time through a complex interplay of genetic, epigenetic, microenvironmental, and phenotypic changes.
Today's data-driven research in oncology has primarily focused on cross-sectional analysis using data from a single modality.
Advances in multiscale data collection and computational methods now enable the discovery of longitudinal multimodal biomarkers for precision oncology.
arXiv Detail & Related papers (2025-02-11T01:44:51Z)
- Survey and Improvement Strategies for Gene Prioritization with Large Language Models [61.24568051916653]
Large language models (LLMs) have performed well in medical exams, but their effectiveness in diagnosing rare genetic diseases has not been assessed.
We used multi-agent and Human Phenotype Ontology (HPO) classification to categorize patients based on phenotypes and solvability levels.
At baseline, GPT-4 outperformed other LLMs, achieving near 30% accuracy in ranking causal genes correctly.
arXiv Detail & Related papers (2025-01-30T23:03:03Z)
- Integrating Large Language Models for Genetic Variant Classification [12.244115429231888]
Large Language Models (LLMs) have emerged as transformative tools in genetics.
This study investigates the integration of state-of-the-art LLMs, including GPN-MSA, ESM1b, and AlphaMissense.
Our approach evaluates these integrated models using the well-annotated ProteinGym and ClinVar datasets.
arXiv Detail & Related papers (2024-11-07T13:45:56Z)
- MMIL: A novel algorithm for disease associated cell type discovery [58.044870442206914]
Single-cell datasets often lack individual cell labels, making it challenging to identify cells associated with disease.
We introduce Mixture Modeling for Multiple Instance Learning (MMIL), an expectation-maximization method that enables the training and calibration of cell-level classifiers.
arXiv Detail & Related papers (2024-06-12T15:22:56Z)
- Unlocking the Power of Multi-institutional Data: Integrating and Harmonizing Genomic Data Across Institutions [3.5489676012585236]
We introduce the Bridge model to derive integrated features to preserve information beyond common genes.
The model consistently excels in predicting patient survival across six cancer types in GENIE BPC data.
arXiv Detail & Related papers (2024-01-30T23:25:05Z)
- Integrate Any Omics: Towards genome-wide data integration for patient stratification [6.893309898200498]
IntegrAO is an unsupervised framework for integrating incomplete multi-omics data and classifying new samples.
IntegrAO's ability to handle heterogeneous and incomplete data makes it an essential tool for precision oncology.
arXiv Detail & Related papers (2024-01-15T19:57:07Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.