Evaluating Deep Regression Models for WSI-Based Gene-Expression Prediction
- URL: http://arxiv.org/abs/2410.00945v1
- Date: Tue, 1 Oct 2024 16:00:22 GMT
- Title: Evaluating Deep Regression Models for WSI-Based Gene-Expression Prediction
- Authors: Fredrik K. Gustafsson, Mattias Rantalainen
- Abstract summary: Prediction of mRNA gene-expression profiles directly from routine whole-slide images could potentially offer cost-effective and widely accessible molecular phenotyping.
This study provides recommendations on how deep regression models should be trained for WSI-based gene-expression prediction.
- Score: 3.2995359570845912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prediction of mRNA gene-expression profiles directly from routine whole-slide images (WSIs) using deep learning models could potentially offer cost-effective and widely accessible molecular phenotyping. While such WSI-based gene-expression prediction models have recently emerged within computational pathology, the high-dimensional nature of the corresponding regression problem offers numerous design choices which remain to be analyzed in detail. This study provides recommendations on how deep regression models should be trained for WSI-based gene-expression prediction. For example, we conclude that training a single model to simultaneously regress all 20530 genes is a computationally efficient yet very strong baseline.
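The abstract's strong baseline, a single model that jointly regresses all 20530 genes, can be sketched as one shared regression head over slide-level features. The sketch below uses a ridge-regression head on random placeholder data; the feature extractor, feature dimension, and regularization strength are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_slides, feat_dim, n_genes = 64, 128, 20530

X = rng.normal(size=(n_slides, feat_dim))   # slide-level features (assumed precomputed, e.g. pooled tile embeddings)
Y = rng.normal(size=(n_slides, n_genes))    # mRNA expression targets, one column per gene

# A single shared head predicts all genes at once: one (feat_dim x n_genes)
# weight matrix, fitted jointly in a single ridge-regularized solve.
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(feat_dim), X.T @ Y)
preds = X @ W                               # shape (n_slides, n_genes)
```

The point of the sketch is computational: predicting every gene from shared features costs one fit and one matrix product, rather than 20530 separate models.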
Related papers
- Modeling Gene Expression Distributional Shifts for Unseen Genetic Perturbations [44.619690829431214]
We train a neural network to predict distributional responses in gene expression following genetic perturbations.
Our model predicts gene-level histograms conditioned on perturbations and outperforms baselines in capturing higher-order statistics.
arXiv Detail & Related papers (2025-07-01T06:04:28Z) - Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra [44.253701408005895]
We leverage modern fast linear algebra techniques to develop DeepWAS, a method to train large neural network predictive models to optimize likelihood.
We find larger models trained on more features make better predictions, potentially improving disease predictions and therapeutic target identification.
arXiv Detail & Related papers (2025-06-24T13:07:45Z) - A Simplified Analysis of SGD for Linear Regression with Weight Averaging [64.2393952273612]
Recent work by Zou et al. (2021) provides sharp rates for SGD optimization in linear regression using a constant learning rate.
We provide a simplified analysis, based on simple linear algebra tools, that recovers the same bias and variance bounds as Zou et al. (2021).
We believe our work makes the analysis of gradient descent on linear regression very accessible and will be helpful in further analyzing mini-batching and learning-rate scheduling.
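As a toy illustration of the setting analyzed above, here is constant-step-size SGD on a linear regression problem with averaging over the tail iterates. This is a generic sketch of the setup, not the cited paper's analysis; problem sizes and the learning rate are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)   # noisy linear targets

w = np.zeros(d)
lr = 0.01                                   # constant learning rate
avg, n_avg = np.zeros(d), 0
for t in range(n):
    x_t, y_t = X[t], y[t]
    grad = (x_t @ w - y_t) * x_t            # stochastic gradient of squared loss
    w -= lr * grad
    if t >= n // 2:                         # average only the second half of iterates
        avg += w
        n_avg += 1
w_avg = avg / n_avg
```

Tail averaging damps the variance contributed by the constant step size while keeping the fast initial bias decay, which is exactly the bias/variance split the analysis above bounds.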
arXiv Detail & Related papers (2025-06-18T15:10:38Z) - Neuro-Evolutionary Approach to Physics-Aware Symbolic Regression [0.0]
We propose a neuro-evolutionary symbolic regression method that combines evolutionary-based search for optimal neural network topologies with gradient-based tuning of the network's parameters.
Our method employs a memory-based strategy and population perturbations to enhance exploitation and reduce the risk of being trapped in suboptimal NNs.
arXiv Detail & Related papers (2025-04-23T08:29:53Z) - Teaching pathology foundation models to accurately predict gene expression with parameter efficient knowledge transfer [1.5416321520529301]
Parameter-Efficient Knowledge Adaptation (PEKA) is a novel framework that integrates knowledge distillation and structure alignment losses for cross-modal knowledge transfer.
We evaluated PEKA for gene expression prediction using multiple spatial transcriptomics datasets.
arXiv Detail & Related papers (2025-04-09T17:24:41Z) - Asymptotic Optimism of Random-Design Linear and Kernel Regression Models [7.427792573157871]
We derived the closed-form optimism of linear regression models under random designs.
We studied the fundamentally different behaviors of linear regression models, neural tangent kernel (NTK) regression models, and three-layer fully connected neural networks.
arXiv Detail & Related papers (2025-02-18T16:19:28Z) - GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.
Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks.
It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z) - Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z) - VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
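The codebook lookup behind vector-quantized tokenization can be shown in a few lines: each embedding is mapped to the index of its nearest codebook vector, as in standard VQ-VAE-style quantizers. This is the generic mechanism, not VQDNA's exact implementation; the codebook size and embedding dimension are made-up.

```python
import numpy as np

rng = np.random.default_rng(2)
codebook = rng.normal(size=(16, 8))     # 16 learnable code vectors of dimension 8
embeds = rng.normal(size=(10, 8))       # per-position sequence embeddings

# Nearest-neighbor assignment: the token id of each position is the index
# of the codebook vector with the smallest squared distance.
d2 = ((embeds[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
tokens = d2.argmin(axis=1)              # discrete genome "tokens"
quantized = codebook[tokens]            # pattern-aware embeddings fed downstream
```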
arXiv Detail & Related papers (2024-05-13T20:15:03Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics.
arXiv Detail & Related papers (2023-07-06T15:19:53Z) - Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer [4.059849656394191]
Genomic Interpreter is a novel architecture for genomic assay prediction.
Model can identify hierarchical dependencies in genomic sites.
Evaluated on a dataset containing 38,171 DNA segments of 17K base pairs.
arXiv Detail & Related papers (2023-06-08T12:10:13Z) - ResMem: Learn what you can and memorize the rest [79.19649788662511]
We propose the residual-memorization (ResMem) algorithm to augment an existing prediction model.
By construction, ResMem can explicitly memorize the training labels.
We show that ResMem consistently improves the test set generalization of the original prediction model.
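The residual-memorization idea can be sketched directly from the two claims above: fit a base regressor, store its training residuals, and add back the residual of the nearest training point at prediction time, so training labels are reproduced exactly. This is a simplified numpy illustration with an OLS base model and 1-nearest-neighbor lookup; the actual ResMem algorithm and its hyperparameters may differ.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
y = np.sin(X[:, 0]) + X @ rng.normal(size=4)

# Base model: ordinary least squares.
W, *_ = np.linalg.lstsq(X, y, rcond=None)
base_pred = X @ W
residuals = y - base_pred                    # what the base model failed to fit

def memorized_prediction(x):
    """Base prediction plus the memorized residual of the nearest training point."""
    i = ((X - x) ** 2).sum(axis=1).argmin()
    return x @ W + residuals[i]

preds_train = np.array([memorized_prediction(x) for x in X])
```

On the training set, each point's nearest neighbor is itself, so the lookup returns exactly the residual needed to recover the training label, illustrating "ResMem can explicitly memorize the training labels".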
arXiv Detail & Related papers (2023-02-03T07:12:55Z) - Neural network facilitated ab initio derivation of linear formula: A
case study on formulating the relationship between DNA motifs and gene
expression [8.794181445664243]
We propose a framework for ab initio derivation of sequence motifs and linear formula using a new approach based on the interpretable neural network model.
We showed that this linear model could predict gene expression levels using promoter sequences with a performance comparable to deep neural network models.
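The kind of interpretable linear formula described above, expression predicted from motif occurrences in a promoter sequence, can be illustrated in a few lines. The motifs, coefficients, and sequence here are entirely hypothetical placeholders, not the paper's derived formula.

```python
import numpy as np

motifs = ["TATA", "GC", "CAAT"]          # hypothetical promoter motifs

def motif_counts(seq):
    """Count non-overlapping-scan occurrences of each motif in the sequence."""
    return np.array([seq.count(m) for m in motifs], dtype=float)

# Linear formula: expression ~ intercept + sum(coefficient * motif count).
coefs, intercept = np.array([0.8, 0.3, 0.5]), 1.0

def predict_expression(seq):
    return intercept + coefs @ motif_counts(seq)

level = predict_expression("TATAGCGCCAAT")   # counts: TATA=1, GC=2, CAAT=1
```

The appeal of such a model is that each coefficient directly states how much one motif occurrence contributes to predicted expression.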
arXiv Detail & Related papers (2022-08-19T22:29:30Z) - Topological structure of complex predictions [15.207535648404765]
Complex prediction models such as deep neural networks are the output of fitting machine learning, neural network, or AI models to a set of training data.
We use topological data analysis to transform these complex prediction models into pictures representing a topological view.
The methods scale up to large datasets across different domains and enable us to detect labeling errors in training data, understand generalization in image classification, and inspect predictions of likely pathogenic mutations in the BRCA1 gene.
arXiv Detail & Related papers (2022-07-28T19:28:05Z) - Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions that evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Gaussian Process Regression with Local Explanation [28.90948136731314]
We propose GPR with local explanation, which reveals the feature contributions to the prediction of each sample.
In the proposed model, both the prediction and explanation for each sample are performed using an easy-to-interpret locally linear model.
For a new test sample, the proposed model can predict the values of its target variable and weight vector, as well as their uncertainties.
arXiv Detail & Related papers (2020-07-03T13:22:24Z)
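For context on what a GP regressor returns at test time, a prediction together with its uncertainty, here is a minimal numpy posterior computation with an RBF kernel. This is generic Gaussian process regression on toy data, not the paper's locally linear explanation model; the kernel lengthscale and noise level are assumptions.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """RBF (squared-exponential) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=30)

noise = 0.05 ** 2
K = rbf(X, X) + noise * np.eye(30)       # training kernel with noise on the diagonal
alpha = np.linalg.solve(K, y)

Xs = np.array([[0.5]])                   # a single test input
Ks = rbf(Xs, X)                          # cross-covariance with the training set
mean = Ks @ alpha                        # posterior predictive mean
var = rbf(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)   # posterior predictive variance
```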
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.