Identifying actionable driver mutations in lung cancer using an efficient Asymmetric Transformer Decoder
- URL: http://arxiv.org/abs/2508.02431v2
- Date: Tue, 05 Aug 2025 09:21:24 GMT
- Title: Identifying actionable driver mutations in lung cancer using an efficient Asymmetric Transformer Decoder
- Authors: Biagio Brattoli, Jack Shi, Jongchan Park, Taebum Lee, Donggeun Yoo, Sergio Pereira
- Abstract summary: This study evaluates various Multiple Instance Learning (MIL) techniques to detect six key actionable NSCLC driver mutations. We introduce an Asymmetric Transformer Decoder model that employs queries and key-values of varying dimensions to maintain a low query dimensionality. Our method outperforms top MIL models by an average of 3%, and over 4% when predicting rare mutations such as ERBB2 and BRAF.
- Score: 9.503365381306963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifying actionable driver mutations in non-small cell lung cancer (NSCLC) can impact treatment decisions and significantly improve patient outcomes. Despite guideline recommendations, broader adoption of genetic testing remains challenging due to limited availability and lengthy turnaround times. Machine Learning (ML) methods for Computational Pathology (CPath) offer a potential solution; however, research often focuses on only one or two common mutations, limiting the clinical value of these tools and the pool of patients who can benefit from them. This study evaluates various Multiple Instance Learning (MIL) techniques to detect six key actionable NSCLC driver mutations: ALK, BRAF, EGFR, ERBB2, KRAS, and MET ex14. Additionally, we introduce an Asymmetric Transformer Decoder model that employs queries and key-values of varying dimensions to maintain a low query dimensionality. This approach efficiently extracts information from patch embeddings and minimizes overfitting risks, proving highly adaptable to the MIL setting. Moreover, we present a method to directly utilize tissue type in the model, addressing a typical MIL limitation where either all regions or only some specific regions are analyzed, neglecting biological relevance. Our method outperforms top MIL models by an average of 3%, and over 4% when predicting rare mutations such as ERBB2 and BRAF, moving ML-based tests closer to being practical alternatives to standard genetic testing.
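The abstract describes queries and key-values of different dimensions but gives no implementation details, so the following is only a minimal sketch of one possible reading: a few low-dimensional queries attend over high-dimensional patch embeddings, whose keys and values are projected down to the query width. The function name `asymmetric_cross_attention`, the projection matrices `w_k` and `w_v`, and all sizes are assumptions made for illustration, not the authors' architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def asymmetric_cross_attention(queries, patches, w_k, w_v):
    """Single-head cross-attention with mismatched input widths.

    queries: (n_q, d_q)        low-dimensional query vectors (hypothetical)
    patches: (n_patches, d_kv) high-dimensional patch embeddings
    w_k, w_v: (d_kv, d_q)      projections from patch width to query width
    """
    keys = patches @ w_k      # (n_patches, d_q)
    values = patches @ w_v    # (n_patches, d_q)
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = softmax(scores, axis=-1)  # each query's weights over patches
    return weights @ values, weights


# Illustrative sizes only: 500 patches of width 1024, 4 queries of width 64.
rng = np.random.default_rng(0)
n_patches, d_kv, n_q, d_q = 500, 1024, 4, 64
patches = rng.normal(size=(n_patches, d_kv))
queries = rng.normal(size=(n_q, d_q))
w_k = rng.normal(size=(d_kv, d_q)) / np.sqrt(d_kv)
w_v = rng.normal(size=(d_kv, d_q)) / np.sqrt(d_kv)

slide_repr, attn = asymmetric_cross_attention(queries, patches, w_k, w_v)
print(slide_repr.shape, attn.shape)  # (4, 64) (4, 500)
```

In this sketch the slide-level representation stays at the small query width regardless of how wide the patch embeddings are, which is one way a low query dimensionality could limit the number of parameters downstream of the attention step.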
Related papers
- PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset [3.716599571611912]
Accurately predicting gene mutations, mutation subtypes and their exons in lung cancer is critical for personalized treatment planning and prognostic assessment. We have assembled PathGene, which comprises histopathology images paired with next-generation sequencing reports. This multi-center dataset links whole-slide images to driver gene mutation status, mutation subtypes, exon, and tumor mutational burden (TMB) status.
arXiv Detail & Related papers (2025-05-30T11:51:11Z)
- Improving statistical learning methods via features selection without replacement sampling and random projection [0.680740878601496]
Cancer is a genetic disease characterized by genetic and epigenetic alterations that disrupt normal gene expression. High-dimensional microarray datasets pose challenges for classification models due to the "small n, large p" problem. This study contributes to cancer biomarker discovery, offering a robust computational method for analyzing microarray data.
arXiv Detail & Related papers (2025-05-28T22:36:46Z)
- MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts [54.915060471994686]
We propose MAST-Pro, a novel framework that integrates dynamic Mixture-of-Experts (D-MoE) and knowledge-driven prompts for pan-tumor segmentation. Specifically, text and anatomical prompts provide domain-specific priors guiding tumor representation learning, while D-MoE dynamically selects experts to balance generic and tumor-specific feature learning. Experiments on multi-anatomical tumor datasets demonstrate that MAST-Pro outperforms state-of-the-art approaches, achieving up to a 5.20% average improvement while reducing trainable parameters by 91.04%, without compromising accuracy.
arXiv Detail & Related papers (2025-03-18T15:39:44Z)
- Optimal transport for automatic alignment of untargeted metabolomic data [8.692678207022084]
We introduce GromovMatcher, a flexible and user-friendly algorithm that automatically combines LC-MS datasets using optimal transport.
By capitalizing on feature intensity correlation structures, GromovMatcher delivers superior alignment accuracy and robustness.
We show how GromovMatcher facilitates the search for biomarkers associated with lifestyle risk factors linked to several cancer types.
arXiv Detail & Related papers (2023-06-05T20:08:19Z)
- Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that, for specific embedding methods, some simulation-based approaches are more robust (and accurate) than others against certain adversarial attacks on the input sequences.
arXiv Detail & Related papers (2022-07-18T19:16:56Z)
- Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience.
We appeal to a set of more elementary methods such as the use of random bounds on a signal, but desire to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z)
- A robust and lightweight deep attention multiple instance learning algorithm for predicting genetic alterations [4.674211520843232]
We propose a novel Attention-based Multiple Instance Mutation Learning (AMIML) model for predicting gene mutations.
AMIML is composed of successive 1-D convolutional layers, a decoder, and a residual weight connection to facilitate further integration of a lightweight attention mechanism.
AMIML demonstrated excellent robustness, not only outperforming all five baseline algorithms in the vast majority of the tested genes, but also providing near-best performance for the other seven genes.
arXiv Detail & Related papers (2022-05-31T15:45:29Z)
- StRegA: Unsupervised Anomaly Detection in Brain MRIs using a Compact Context-encoding Variational Autoencoder [48.2010192865749]
Unsupervised anomaly detection (UAD) can learn a data distribution from an unlabelled dataset of healthy subjects and then be applied to detect out-of-distribution samples.
This research proposes a compact version of the "context-encoding" VAE (ceVAE) model, combined with pre- and post-processing steps, creating a UAD pipeline (StRegA).
The proposed pipeline achieved a Dice score of 0.642 ± 0.101 while detecting tumours in T2w images of the BraTS dataset and 0.859 ± 0.112 while detecting artificially induced anomalies.
arXiv Detail & Related papers (2022-01-31T14:27:35Z)
- Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network [93.22587316229954]
We propose a graph-based sparse principal component analysis (GS-PCA) network for automated detection of cancerous lesions on histological lung slides stained by hematoxylin and eosin (H&E).
We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operating characteristic (ROC).
arXiv Detail & Related papers (2021-10-27T19:28:36Z)
- DRIVE: Machine Learning to Identify Drivers of Cancer with High-Dimensional Genomic Data & Imputed Labels [0.0]
We propose a novel combination method for driver mutation identification.
It uses the power of both statistical modelling and functional-impact based methods.
Initial results show this approach outperforms the state-of-the-art methods in terms of precision.
arXiv Detail & Related papers (2021-05-02T13:27:31Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.