Knoop: Practical Enhancement of Knockoff with Over-Parameterization for Variable Selection
- URL: http://arxiv.org/abs/2501.17889v1
- Date: Tue, 28 Jan 2025 09:27:04 GMT
- Title: Knoop: Practical Enhancement of Knockoff with Over-Parameterization for Variable Selection
- Authors: Xiaochen Zhang, Yunfeng Cai, Haoyi Xiong
- Abstract summary: This work introduces a novel approach, Knockoff with over-parameterization (Knoop), to enhance variable selection. Knoop generates multiple knockoff variables for each original variable and integrates them with the original variables into a Ridgeless regression model. Experiments demonstrate superior performance compared to existing methods on both simulated and real-world datasets.
- Score: 27.563529091471935
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Variable selection plays a crucial role in enhancing modeling effectiveness across diverse fields, addressing the challenges posed by high-dimensional datasets of correlated variables. This work introduces a novel approach, Knockoff with over-parameterization (Knoop), to enhance Knockoff filters for variable selection. Specifically, Knoop first generates multiple knockoff variables for each original variable and integrates them with the original variables into an over-parameterized Ridgeless regression model. For each original variable, Knoop evaluates the coefficient distribution of its knockoffs and compares it with the original coefficient to conduct an anomaly-based significance test, ensuring robust variable selection. Extensive experiments demonstrate superior performance compared to existing methods on both simulated and real-world datasets. Knoop achieves a notably higher Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve for identifying relevant variables against the ground truth in controlled simulations, while showcasing enhanced predictive accuracy across diverse regression and classification tasks. Analytical results further back up these observations.
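The pipeline described in the abstract can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' implementation: the permutation-based knockoffs, the z-score test, and all variable names (`n`, `p`, `k`, `z`) are simplifying assumptions made here; proper knockoff constructions and the paper's actual significance test are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, k = 100, 30, 5                 # samples, variables, knockoff copies
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                       # first 5 variables are truly relevant
y = X @ beta + rng.standard_normal(n)

# Crude knockoff stand-in: independent column permutations. This only
# preserves marginal distributions, not the correlation structure that
# proper knockoff constructions guarantee.
knockoffs = np.column_stack(
    [rng.permutation(X[:, j]) for _ in range(k) for j in range(p)]
)

# Over-parameterized design (p + k*p = 180 > n = 100); pinv returns the
# minimum-norm interpolating solution, i.e. the ridgeless fit.
Z = np.hstack([X, knockoffs])
coef = np.linalg.pinv(Z) @ y

orig = coef[:p]
ko = np.abs(coef[p:]).reshape(k, p)  # k knockoff coefficients per variable

# Anomaly-style score: how far each original coefficient sits outside
# the distribution of its own knockoff coefficients.
z = (np.abs(orig) - ko.mean(axis=0)) / (ko.std(axis=0) + 1e-12)
print(np.argsort(z)[::-1][:5])       # top-scoring candidate variables
```

Because the stacked design has more columns than rows, the minimum-norm solution interpolates the responses exactly; the test then asks whether an original variable's coefficient is anomalous relative to its knockoffs, which by construction carry no true signal.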
Related papers
- Mean-field Variational Bayes for Sparse Probit Regression [0.9023847175654603]
We consider Bayesian variable selection for binary outcomes under a probit link with a spike-and-slab prior on the regression coefficients. Motivated by the computational challenges encountered by Markov chain Monte Carlo samplers in high-dimensional regimes, we develop a mean-field variational Bayes approximation.
arXiv Detail & Related papers (2026-01-29T14:16:31Z) - Efficient Covariance Estimation for Sparsified Functional Data [51.69796254617083]
The proposed Random-knots (Random-knots-Spatial) and B-spline (Bspline-Spatial) estimators of the covariance function are computationally efficient. Asymptotic pointwise properties of the covariance estimators are obtained for sparsified individual trajectories under some regularity conditions.
arXiv Detail & Related papers (2025-11-23T00:50:33Z) - Variational Garrote for Statistical Physics-based Sparse and Robust Variable Selection [8.312621461460148]
We revisit the statistical physics-based Variational Garrote (VG) method, which introduces explicit feature selection spin variables. We evaluate VG on both fully controllable synthetic datasets and complex real-world datasets. VG offers strong potential for sparse modeling across a wide range of applications, including compressed sensing and model pruning in machine learning.
arXiv Detail & Related papers (2025-09-08T07:06:10Z) - Diffusion-Driven High-Dimensional Variable Selection [6.993247097440294]
We propose a resample-aggregate framework that exploits diffusion models' ability to generate high-fidelity synthetic data. We show that the proposed method is selection consistent under mild assumptions. Our method advances variable selection methodology and broadens the toolkit for interpretable, statistically rigorous analysis.
arXiv Detail & Related papers (2025-08-19T14:54:20Z) - Model-independent variable selection via the rule-based variable priority [1.2771542695459488]
We introduce a new model-independent approach, Variable Priority (VarPro).
VarPro works by utilizing rules without the need to generate artificial data or evaluate prediction error.
We show that VarPro has a consistent filtering property for noise variables.
arXiv Detail & Related papers (2024-09-13T17:32:05Z) - Optimal Kernel Choice for Score Function-based Causal Discovery [92.65034439889872]
We propose a kernel selection method within the generalized score function that automatically selects the optimal kernel that best fits the data.
We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms existing kernel selection methods.
arXiv Detail & Related papers (2024-07-14T09:32:20Z) - Data-driven path collective variables [0.0]
We propose a new method for the generation, optimization, and comparison of collective variables.
The resulting collective variable is one-dimensional, interpretable, and differentiable.
We demonstrate the validity of the method on two different applications.
arXiv Detail & Related papers (2023-12-21T14:07:47Z) - Effect of hyperparameters on variable selection in random forests [0.0]
We evaluate the effects on the Vita and Boruta variable selection procedures based on two simulation studies utilizing theoretical distributions and empirical gene expression data. For weakly correlated predictor variables, the default value of the number of splitting variables is optimal, but smaller values of the sample fraction result in larger sensitivity.
arXiv Detail & Related papers (2023-09-13T13:26:10Z) - Training Discrete Deep Generative Models via Gapped Straight-Through
Estimator [72.71398034617607]
We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
arXiv Detail & Related papers (2022-06-15T01:46:05Z) - Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z) - Variable selection with missing data in both covariates and outcomes:
Imputation and machine learning [1.0333430439241666]
The missing data issue is ubiquitous in health studies.
Machine learning methods weaken parametric assumptions.
XGBoost and BART have the overall best performance across various settings.
arXiv Detail & Related papers (2021-04-06T20:18:29Z) - Uncertainty Inspired RGB-D Saliency Detection [70.50583438784571]
We propose the first framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process.
Inspired by the saliency data labeling process, we propose a generative architecture to achieve probabilistic RGB-D saliency detection.
Results on six challenging RGB-D benchmark datasets show our approach's superior performance in learning the distribution of saliency maps.
arXiv Detail & Related papers (2020-09-07T13:01:45Z) - Variable selection for Gaussian process regression through a sparse
projection [0.802904964931021]
This paper presents a new variable selection approach integrated with Gaussian process (GP) regression.
The choice of tuning parameters and the accuracy of the estimation are evaluated in simulations against some chosen benchmark approaches.
arXiv Detail & Related papers (2020-08-25T01:06:10Z) - Evaluating Prediction-Time Batch Normalization for Robustness under
Covariate Shift [81.74795324629712]
We propose prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z) - Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction.
We propose a conditional independence test based algorithm to separate causal variables with a seed variable as priori, and adopt them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.