The Informed Elastic Net for Fast Grouped Variable Selection and FDR Control in Genomics Research
- URL: http://arxiv.org/abs/2410.05211v1
- Date: Mon, 7 Oct 2024 17:18:25 GMT
- Title: The Informed Elastic Net for Fast Grouped Variable Selection and FDR Control in Genomics Research
- Authors: Jasin Machkour, Michael Muma, Daniel P. Palomar,
- Abstract summary: We propose a new base selector that significantly reduces computation time while retaining the grouped variable selection property.
The proposed T-Rex+GVS (IEN) exhibits the desired grouping effect, reduces time, and achieves the same TPR as T-Rex+GVS (EN) but with lower FDR.
- Score: 9.6703621796624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern genomics research relies on genome-wide association studies (GWAS) to identify the few genetic variants among potentially millions that are associated with diseases of interest. Only reproducible discoveries of groups of associations improve our understanding of complex polygenic diseases and enable the development of new drugs and personalized medicine. Thus, fast multivariate variable selection methods that have a high true positive rate (TPR) while controlling the false discovery rate (FDR) are crucial. Recently, the T-Rex+GVS selector, a version of the T-Rex selector that uses the elastic net (EN) as a base selector to perform grouped variable election, was proposed. Although it significantly increased the TPR in simulated GWAS compared to the original T-Rex, its comparably high computational cost limits scalability. Therefore, we propose the informed elastic net (IEN), a new base selector that significantly reduces computation time while retaining the grouped variable selection property. We quantify its grouping effect and derive its formulation as a Lasso-type optimization problem, which is solved efficiently within the T-Rex framework by the terminated LARS algorithm. Numerical simulations and a GWAS study demonstrate that the proposed T-Rex+GVS (IEN) exhibits the desired grouping effect, reduces computation time, and achieves the same TPR as T-Rex+GVS (EN) but with lower FDR, which makes it a promising method for large-scale GWAS.
Related papers
- G-computation for increasing performances of clinical trials with individual randomization and binary response [0.43541492802373877]
In a clinical trial, the random allocation aims to balance prognostic factors between arms, preventing true confounders.
Adjusting on prognostic factors is therefore recommended, especially because the related increase of the power.
In this paper, we hypothesized that G-computation associated with machine learning could be a suitable method for randomized clinical trials even with small sample sizes.
arXiv Detail & Related papers (2024-11-15T10:18:38Z) - Solving FDR-Controlled Sparse Regression Problems with Five Million Variables on a Laptop [1.5948860527881505]
T-Rex selector is a new learning framework based on early terminated random experiments with computer-generated dummy variables.
We propose the Big T-Rex, a new implementation of T-Rex that drastically reduces its Random Access Memory (RAM) consumption.
We showcase that the Big T-Rex can efficiently solve FDR-controlled Lasso-type problems with five million variables on a laptop in thirty minutes.
arXiv Detail & Related papers (2024-09-27T18:38:51Z) - Predicting loss-of-function impact of genetic mutations: a machine
learning approach [0.0]
This paper aims to train machine learning models on the attributes of a genetic mutation to predict LoFtool scores.
These attributes included, but were not limited to, the position of a mutation on a chromosome, changes in amino acids, and changes in codons caused by the mutation.
Models were evaluated using five-fold cross-validated averages of r-squared, mean squared error, root mean squared error, mean absolute error, and explained variance.
arXiv Detail & Related papers (2024-01-26T19:27:38Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an varepsilon-optimal randomized hypothesis with a sample complexity on the order of (d+k)/varepsilon2.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization [61.39201891894024]
Group distributionally robust optimization (group DRO) can minimize the worst-case loss over pre-defined groups.
We reformulate the group DRO framework by proposing Q-Diversity.
Characterized by an interactive training mode, Q-Diversity relaxes the group identification from annotation into direct parameterization.
arXiv Detail & Related papers (2023-05-20T07:02:27Z) - VQ-T: RNN Transducers using Vector-Quantized Prediction Network States [52.48566999668521]
We propose to use vector-quantized long short-term memory units in the prediction network of RNN transducers.
By training the discrete representation jointly with the ASR network, hypotheses can be actively merged for lattice generation.
Our experiments on the Switchboard corpus show that the proposed VQ RNN transducers improve ASR performance over transducers with regular prediction networks.
arXiv Detail & Related papers (2022-08-03T02:45:52Z) - Effective Mutation Rate Adaptation through Group Elite Selection [50.88204196504888]
This paper introduces the Group Elite Selection of Mutation Rates (GESMR) algorithm.
GESMR co-evolves a population of solutions and a population of MRs, such that each MR is assigned to a group of solutions.
With the same number of function evaluations and with almost no overhead, GESMR converges faster and to better solutions than previous approaches.
arXiv Detail & Related papers (2022-04-11T01:08:26Z) - The Terminating-Random Experiments Selector: Fast High-Dimensional
Variable Selection with False Discovery Rate Control [10.86851797584794]
T-Rex selector controls a user-defined target false discovery rate (FDR)
Experiments are conducted on a combination of the original predictors and multiple sets of randomly generated dummy predictors.
arXiv Detail & Related papers (2021-10-12T14:52:46Z) - Inception Convolution with Efficient Dilation Search [121.41030859447487]
Dilation convolution is a critical mutant of standard convolution neural network to control effective receptive fields and handle large scale variance of objects.
We propose a new mutant of dilated convolution, namely inception (dilated) convolution where the convolutions have independent dilation among different axes, channels and layers.
We explore a practical method for fitting the complex inception convolution to the data, a simple while effective dilation search algorithm(EDO) based on statistical optimization is developed.
arXiv Detail & Related papers (2020-12-25T14:58:35Z) - Statistical control for spatio-temporal MEG/EEG source imaging with
desparsified multi-task Lasso [102.84915019938413]
Non-invasive techniques like magnetoencephalography (MEG) or electroencephalography (EEG) offer promise of non-invasive techniques.
The problem of source localization, or source imaging, poses however a high-dimensional statistical inference challenge.
We propose an ensemble of desparsified multi-task Lasso (ecd-MTLasso) to deal with this problem.
arXiv Detail & Related papers (2020-09-29T21:17:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.