Related papers: Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification


URL: http://arxiv.org/abs/2406.02990v1
Date: Wed, 5 Jun 2024 06:42:27 GMT
Title: Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification
Authors: Gexin Huang, Chenfei Wu, Mingjie Li, Xiaojun Chang, Ling Chen, Ying Sun, Shen Zhao, Xiaodan Liang, Liang Lin
Abstract summary: We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer (BPGT) to improve genetic mutation prediction performance. BPGT first establishes a novel gene encoder that constructs gene priors using two carefully designed modules. BPGT then designs a label decoder that performs the final genetic mutation prediction using two tailored modules.
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Predicting genetic mutations from whole slide images (WSIs) is indispensable for cancer diagnosis. However, existing work that trains multiple binary classification models faces two challenges: (a) Training multiple binary classifiers is inefficient and would inevitably lead to a class imbalance problem. (b) The biological relationships among genes are overlooked, which limits prediction performance. To tackle these challenges, we design a Biological-knowledge enhanced PathGenomic multi-label Transformer (BPGT) to improve genetic mutation prediction performance. BPGT first establishes a novel gene encoder that constructs gene priors with two carefully designed modules: (a) A gene graph whose node features are the genes' linguistic descriptions and the cancer phenotype, with edges modeled by the genes' pathway associations and mutation consistencies. (b) A knowledge association module that fuses linguistic and biomedical knowledge into gene priors through transformer-based graph representation learning, capturing the intrinsic relationships between different genes' mutations. BPGT then designs a label decoder that performs the final genetic mutation prediction with two tailored modules: (a) A modality fusion module that first fuses the gene priors with critical regions in WSIs and obtains gene-wise mutation logits. (b) A comparative multi-label loss that emphasizes the inherent comparisons among mutation statuses to enhance discrimination. Extensive experiments on The Cancer Genome Atlas benchmark demonstrate that BPGT outperforms state-of-the-art methods.
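The abstract describes the label decoder only at a high level: gene priors are fused with critical WSI regions to produce one mutation logit per gene, and a comparative multi-label loss contrasts mutation statuses within a single model instead of training one binary classifier per gene. The sketch below shows one plausible reading under assumptions that are ours, not the paper's: fusion is single-head scaled dot-product cross-attention (gene priors as queries, patch features as keys and values), and the comparative loss is a pairwise ranking loss log(1 + Σ exp(s_n − s_p)) over (mutated, wild-type) gene pairs. The names `gene_logits` and `comparative_multilabel_loss` and the toy inputs are hypothetical; this is not the BPGT implementation.

```python
# Hypothetical sketch of a multi-label "label decoder" (not the BPGT code).
# Assumptions: gene priors and WSI patch features share dimension d, fusion is
# single-head scaled dot-product cross-attention, and the comparative loss is
# the pairwise ranking loss log(1 + sum_{p in pos, n in neg} exp(s_n - s_p)).
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gene_logits(priors: List[Vector], patches: List[Vector]) -> Vector:
    """Return one mutation logit per gene; each gene prior attends over all patches."""
    d = len(priors[0])
    logits = []
    for q in priors:
        # Attention of this gene's prior over the WSI patch features.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in patches])
        # Gene-specific slide summary: attention-weighted average of the patches.
        summary = [sum(w * p[j] for w, p in zip(weights, patches)) for j in range(d)]
        # Hypothetical scoring head: agreement between the summary and the prior.
        logits.append(dot(q, summary))
    return logits


def comparative_multilabel_loss(logits: Vector, labels: List[int]) -> float:
    """Softly penalize (mutated, wild-type) gene pairs, more when the wild-type gene scores higher."""
    pos = [s for s, y in zip(logits, labels) if y == 1]
    neg = [s for s, y in zip(logits, labels) if y == 0]
    if not pos or not neg:
        return 0.0  # no (positive, negative) pair to compare
    diffs = [n - p for p in pos for n in neg]
    # Stable log(1 + sum(exp(diffs))): shift by m, where 0 stands for the "1 +" term.
    m = max(0.0, max(diffs))
    return m + math.log(math.exp(-m) + sum(math.exp(z - m) for z in diffs))


priors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # 3 genes, d = 2 (toy values)
patches = [[2.0, 0.0], [0.0, 0.1], [1.0, 1.0]]  # 3 patch embeddings (toy values)
logits = gene_logits(priors, patches)
print(logits, comparative_multilabel_loss(logits, [1, 0, 0]))
```

Because every gene is scored by the same decoder and the loss compares genes against each other within a slide, rare mutations still receive gradient from each pairing with a wild-type gene, which is one way a single multi-label model can sidestep per-gene classifier imbalance.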

Related papers

Gene-DML: Dual-Pathway Multi-Level Discrimination for Gene Expression Prediction from Histopathology Images
Accurately predicting gene expression from histopathology images offers a scalable and non-invasive approach to molecular profiling. Existing methods often underutilize the cross-modal representation alignment between histopathology images and gene expression profiles. We propose Gene-DML, a unified framework that structures latent space through Dual-pathway Multi-Level discrimination.
arXiv (2025-07-19T15:45:12Z)
PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset
Accurately predicting gene mutations, mutation subtypes and their exons in lung cancer is critical for personalized treatment planning and prognostic assessment. We have assembled PathGene, which comprises histopathology images paired with next-generation sequencing reports. This multi-center dataset links whole-slide images to driver gene mutation status, mutation subtypes, exon, and tumor mutational burden (TMB) status.
arXiv (2025-05-30T11:51:11Z)
GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype
Building gene regulatory networks (GRNs) is essential to understand and predict the effects of genetic perturbations. In this work, we leverage a pre-trained large language model and a DNA sequence model to extract features from gene descriptions and DNA sequence data. We introduce gene biotype information for the first time in genetic perturbation, simulating the distinct roles of genes with different biotypes in regulating cellular processes.
arXiv (2025-05-06T03:35:24Z)
G2PDiffusion: Cross-Species Genotype-to-Phenotype Prediction via Evolutionary Diffusion
We propose the first genotype-to-phenotype diffusion model (G2PDiffusion) that generates morphological images from DNA. The model contains three novel components: 1) a multiple sequence alignment (MSA) retrieval engine that identifies conserved and co-evolutionary patterns; 2) an environment-aware MSA conditional encoder that effectively models complex genotype-environment interactions; and 3) an adaptive phenomic alignment module to improve genotype-phenotype consistency.
arXiv (2025-02-07T06:16:31Z)
Survey and Improvement Strategies for Gene Prioritization with Large Language Models
Large language models (LLMs) have performed well in medical exams, but their effectiveness in diagnosing rare genetic diseases has not been assessed. We used multi-agent and Human Phenotype Ontology (HPO) classification to categorize patients based on phenotypes and solvability levels. At baseline, GPT-4 outperformed other LLMs, achieving nearly 30% accuracy in ranking causal genes correctly.
arXiv (2025-01-30T23:03:03Z)
Gene Regulatory Network Inference in the Presence of Selection Bias and Latent Confounders
Gene Regulatory Network Inference (GRNI) aims to identify causal relationships among genes using gene expression data. Gene expression is influenced by latent confounders, such as non-coding RNAs, which add complexity to GRNI. We propose GISL (Gene Regulatory Network Inference in the presence of Selection bias and Latent confounders) to infer true regulatory relationships in the presence of selection and confounding issues.
arXiv (2025-01-17T11:27:58Z)
Gene-Metabolite Association Prediction with Interactive Knowledge Transfer Enhanced Graph for Metabolite Production
We propose a new task, Gene-Metabolite Association Prediction based on metabolic graphs. We present the first benchmark containing 2474 metabolites and 1947 genes of two commonly used microorganisms. Our proposed methodology outperforms baselines by up to 12.3% across various link prediction frameworks.
arXiv (2024-10-24T06:54:27Z)
VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning. By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv (2024-05-13T20:15:03Z)
Predicting loss-of-function impact of genetic mutations: a machine learning approach
This paper aims to train machine learning models on the attributes of a genetic mutation to predict LoFtool scores. These attributes included, but were not limited to, the position of a mutation on a chromosome, changes in amino acids, and changes in codons caused by the mutation. Models were evaluated using five-fold cross-validated averages of r-squared, mean squared error, root mean squared error, mean absolute error, and explained variance.
arXiv (2024-01-26T19:27:38Z)
Genetic InfoMax: Exploring Mutual Information Maximization in High-Dimensional Imaging Genetics Studies
Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits. Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS. We introduce a trans-modal learning framework Genetic InfoMax (GIM) to address the specific challenges of GWAS.
arXiv (2023-09-26T03:59:21Z)
Epigenetics Algorithms: Self-Reinforcement-Attention mechanism to regulate chromosomes expression
This paper proposes a new epigenetics algorithm that mimics the epigenetic phenomenon known as methylation. The novelty of our epigenetics algorithms lies primarily in taking advantage of attention mechanisms and deep learning, which fits well with the genes/silencing concept.
arXiv (2023-03-15T21:33:21Z)
Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles. It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner. These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv (2023-01-07T18:36:44Z)
rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes
We trained machine learning models to predict SNPs using 56 brain imaging quantitative traits (QTs). SNPs within the known Alzheimer's disease (AD) risk gene APOE had the lowest RMSE for lasso and random forest. Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv (2022-03-31T20:15:22Z)
VEGN: Variant Effect Prediction with Graph Neural Networks
We propose VEGN, which models variant effect prediction using a graph neural network (GNN) that operates on a heterogeneous graph with genes and variants. The graph is created by assigning variants to genes and connecting genes with a gene-gene interaction network. VEGN improves the performance of existing state-of-the-art models.
arXiv (2021-06-25T13:51:46Z)
SimpleChrome: Encoding of Combinatorial Effects for Predicting Gene Expression
We present SimpleChrome, a deep learning model that learns the histone modification representations of genes. The features learned from the model allow us to better understand the latent effects of cross-gene interactions and direct gene regulation on the target gene expression.
arXiv (2020-12-15T23:30:36Z)
Handling highly correlated genes in prediction analysis of genomic studies
High correlation among genes introduces technical problems, such as multi-collinearity issues, leading to unreliable prediction models. We propose a grouping algorithm, which treats highly correlated genes as a group and uses their common pattern to represent the group's biological signal in feature selection. Our proposed grouping method has two advantages. First, using the gene group's common patterns makes the prediction more robust and reliable under condition change.
arXiv (2020-07-05T22:14:03Z)
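The entry on handling highly correlated genes describes a grouping algorithm that treats highly correlated genes as one group and uses their common pattern as the group's signal. The following is a minimal sketch under assumptions that are ours, not the paper's: Pearson correlation, a hypothetical threshold of 0.8 on positive correlation, groups formed as connected components of the "highly correlated" graph (via union-find), and the common pattern taken as the per-sample mean of the members' z-scored expression. The names `group_genes`, `group_pattern`, and the toy data are illustrative only.

```python
# Hypothetical sketch of correlation-based gene grouping (not the paper's code).
# Assumptions: Pearson correlation, a threshold of 0.8 on positive correlation,
# groups = connected components of the "highly correlated" graph (union-find),
# and the group's common pattern = per-sample mean of z-scored expression.
import math
from typing import Dict, List


def zscore(x: List[float]) -> List[float]:
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in x) / n)  # population SD
    return [0.0] * n if sd == 0 else [(a - mean) / sd for a in x]


def pearson(x: List[float], y: List[float]) -> float:
    # Pearson r equals the mean product of population z-scores.
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def group_genes(expr: Dict[str, List[float]], threshold: float = 0.8) -> List[List[str]]:
    """Merge genes whose pairwise correlation is >= threshold into groups."""
    names = list(expr)
    parent = {g: g for g in names}

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(expr[a], expr[b]) >= threshold:
                parent[find(a)] = find(b)  # union the two components

    groups: Dict[str, List[str]] = {}
    for g in names:
        groups.setdefault(find(g), []).append(g)
    return list(groups.values())


def group_pattern(expr: Dict[str, List[float]], group: List[str]) -> List[float]:
    """Represent a group by the per-sample mean of its members' z-scores."""
    zs = [zscore(expr[g]) for g in group]
    return [sum(col) / len(col) for col in zip(*zs)]


expr = {  # toy expression values: 3 genes x 4 samples
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.5],
    "G3": [4.0, 1.0, 3.0, 2.0],
}
groups = group_genes(expr)
features = [group_pattern(expr, grp) for grp in groups]
print(groups)  # prints [['G1', 'G2'], ['G3']]
```

Using one feature per group, rather than every member gene, is what removes the multi-collinearity the entry mentions; a sign-aware variant would also be needed if strongly anti-correlated genes should be grouped.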
