PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset
- URL: http://arxiv.org/abs/2506.00096v1
- Date: Fri, 30 May 2025 11:51:11 GMT
- Title: PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset
- Authors: Liangrui Pan, Qingchun Liang, Shen Zhao, Songqing Fan, Shaoliang Peng,
- Abstract summary: Accurately predicting gene mutations, mutation subtypes and their exons in lung cancer is critical for personalized treatment planning and prognostic assessment.<n>We have assembled PathGene, which comprises histopathology images paired with next-generation sequencing reports.<n>This multi-center dataset links whole-slide images to driver gene mutation status, mutation subtypes, exon, and tumor mutational burden (TMB) status.
- Score: 3.716599571611912
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurately predicting gene mutations, mutation subtypes and their exons in lung cancer is critical for personalized treatment planning and prognostic assessment. Faced with regional disparities in medical resources and the high cost of genomic assays, using artificial intelligence to infer these mutations and exon variants from routine histopathology images could greatly facilitate precision therapy. Although some prior studies have shown that deep learning can accelerate the prediction of key gene mutations from lung cancer pathology slides, their performance remains suboptimal and has so far been limited mainly to early screening tasks. To address these limitations, we have assembled PathGene, which comprises histopathology images paired with next-generation sequencing reports from 1,576 patients at the Second Xiangya Hospital, Central South University, and 448 TCGA-LUAD patients. This multi-center dataset links whole-slide images to driver gene mutation status, mutation subtypes, exon, and tumor mutational burden (TMB) status, with the goal of leveraging pathology images to predict mutations, subtypes, exon locations, and TMB for early genetic screening and to advance precision oncology. Unlike existing datasets, we provide molecular-level information related to histopathology images in PathGene to facilitate the development of biomarker prediction models. We benchmarked 11 multiple-instance learning methods on PathGene for mutation, subtype, exon, and TMB prediction tasks. These experimental methods provide valuable alternatives for early genetic screening of lung cancer patients and assisting clinicians to quickly develop personalized precision targeted treatment plans for patients. Code and data are available at https://github.com/panliangrui/NIPS2025/.
Related papers
- EnTao-GPM: DNA Foundation Model for Predicting the Germline Pathogenic Mutations [16.32431932781823]
Cross-species targeted pre-training on disease-relevant mammalian genomes (human, pig, mouse)<n> Germline mutation specialization via fine-tuning on ClinVar and HGMD.<n>Interpretable clinical framework integrating DNA sequence embeddings with LLM-based statistical explanations.
arXiv Detail & Related papers (2025-07-29T11:34:41Z) - Spatial Transcriptomics Expression Prediction from Histopathology Based on Cross-Modal Mask Reconstruction and Contrastive Learning [6.977427565352716]
We develop a contrastive learning-based deep learning method to predict spatially resolved gene expression from whole-slide images.<n>Our method preserves gene-gene correlations and applies to datasets with limited samples.
arXiv Detail & Related papers (2025-06-10T14:42:03Z) - GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype [51.58774936662233]
Building gene regulatory networks (GRN) is essential to understand and predict the effects of genetic perturbations.<n>In this work, we leverage pre-trained large language model and DNA sequence model to extract features from gene descriptions and DNA sequence data.<n>We introduce gene biotype information for the first time in genetic perturbation, simulating the distinct roles of genes with different biotypes in regulating cellular processes.
arXiv Detail & Related papers (2025-05-06T03:35:24Z) - SurGen: 1020 H&E-stained Whole Slide Images With Survival and Genetic Markers [0.0]
We present SurGen, a dataset comprising 1,020 H&E-stained whole slide images (WSIs) from 843 colorectal cancer cases.<n>The dataset includes detailed annotations for key genetic mutations (KRAS, NRAS, BRAF) and mismatch repair status, as well as survival data for 426 cases.
arXiv Detail & Related papers (2025-02-07T14:12:07Z) - Survey and Improvement Strategies for Gene Prioritization with Large Language Models [61.24568051916653]
Large language models (LLMs) have performed well in medical exams, but their effectiveness in diagnosing rare genetic diseases has not been assessed.<n>We used multi-agent and Human Phenotype Ontology (HPO) classification to categorized patients based on phenotypes and solvability levels.<n>At baseline, GPT-4 outperformed other LLMs, achieving near 30% accuracy in ranking causal genes correctly.
arXiv Detail & Related papers (2025-01-30T23:03:03Z) - Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z) - Cancer-Net PCa-Gen: Synthesis of Realistic Prostate Diffusion Weighted
Imaging Data via Anatomic-Conditional Controlled Latent Diffusion [68.45407109385306]
In Canada, prostate cancer is the most common form of cancer in men and accounted for 20% of new cancer cases for this demographic in 2022.
There has been significant interest in the development of deep neural networks for prostate cancer diagnosis, prognosis, and treatment planning using diffusion weighted imaging (DWI) data.
In this study, we explore the efficacy of latent diffusion for generating realistic prostate DWI data through the introduction of an anatomic-conditional controlled latent diffusion strategy.
arXiv Detail & Related papers (2023-11-30T15:11:03Z) - Breast Cancer Histopathology Image based Gene Expression Prediction
using Spatial Transcriptomics data and Deep Learning [3.583756449759971]
We present BrST-Net, a deep learning framework for predicting gene expression from histopathology images.
We trained and evaluated 10 state-of-the-art deep learning models without utilizing pretrained weights for the prediction of 250 genes.
Our methodology outperforms previous studies, with 237 genes identified with positive correlation, including 24 genes with a median correlation coefficient greater than 0.50.
arXiv Detail & Related papers (2023-03-17T14:03:40Z) - Machine Learning Methods for Cancer Classification Using Gene Expression
Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z) - Optimize Deep Learning Models for Prediction of Gene Mutations Using
Unsupervised Clustering [6.494144125433731]
Deep learning has become the mainstream methodological choice for analyzing and interpreting whole-slide digital pathology images.
In this paper, we proposed an unsupervised clustering-based multiple-instance learning, and apply our method to develop deep-learning models for prediction of gene mutations using WSIs from three cancer types.
We showed that unsupervised clustering of image patches could help identify predictive patches, exclude patches lack of predictive information, and therefore improve prediction on gene mutations in all three different cancer types.
arXiv Detail & Related papers (2022-03-31T11:48:21Z) - DeepGene Transformer: Transformer for the gene expression-based classification of cancer subtypes [5.179504118679301]
Cancer and its subtypes constitute approximately 30% of all causes of death globally.
DeepGene Transformer is proposed which addresses the complexity of high-dimensional gene expression with a multi-head self-attention module.
arXiv Detail & Related papers (2021-08-26T15:02:55Z) - Transcriptome-wide prediction of prostate cancer gene expression from
histopathology images using co-expression based convolutional neural networks [0.8874479658912061]
We propose a new, computationally efficient approach for disease specific modelling of relationships between morphology and gene expression.
We conducted the first transcriptome-wide analysis in prostate cancer, using CNNs to predict bulk RNA-sequencing estimates.
arXiv Detail & Related papers (2021-04-19T13:50:25Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.