Learning generative models for valid knockoffs using novel multivariate-rank based statistics
- URL: http://arxiv.org/abs/2111.00043v1
- Date: Fri, 29 Oct 2021 18:51:19 GMT
- Title: Learning generative models for valid knockoffs using novel multivariate-rank based statistics
- Authors: Shoaib Bin Masud, Shuchin Aeron
- Abstract summary: Rank energy (RE) is derived using theoretical results characterizing the optimal maps in Monge's Optimal Transport (OT) problem.
We propose a variant of the RE, dubbed the soft rank energy (sRE), and its kernel variant, the soft rank maximum mean discrepancy (sRMMD).
We then use sRMMD to generate deep knockoffs and show via extensive evaluation that it is a novel and effective method for producing valid knockoffs.
- Score: 12.528602250193206
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of generating valid knockoffs for knockoff filtering,
a statistical method that provides provable false discovery rate guarantees for
any model selection procedure. To this end, we are motivated by recent advances
in multivariate distribution-free goodness-of-fit tests, namely the rank energy
(RE), which is derived using theoretical results characterizing the optimal maps
in Monge's Optimal Transport (OT) problem. However, direct use of the RE for
learning generative models is not feasible because of its high computational and
sample complexity, its saturation under large support discrepancy between
distributions, and its non-differentiability in the generative parameters. To
alleviate these issues, we propose a variant of the RE, dubbed the soft rank
energy (sRE), and its kernel variant, the soft rank maximum mean discrepancy
(sRMMD), obtained via entropic regularization of Monge's OT problem. We then use
sRMMD to generate deep knockoffs and show via extensive evaluation that it is a
novel and effective method for producing valid knockoffs, achieving comparable,
and in some cases improved, tradeoffs between detection power and false
discoveries.
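To make the construction concrete, here is a minimal NumPy sketch of the idea as
described above: transport a sample to a fixed uniform reference with
entropy-regularized (Sinkhorn) optimal transport, take the barycentric projection
of the plan as each point's "soft rank", and compare the soft ranks of two samples
with a Gaussian-kernel MMD. Everything here (function names, squared-Euclidean
cost, reference size, eps, sigma) is an illustrative assumption, not the authors'
implementation; in training one would write the same computation with autodiff so
the statistic is differentiable in the generator parameters.

```python
import numpy as np
from scipy.special import logsumexp

def soft_ranks(X, U, eps=0.1, n_iter=300):
    """Soft (entropic) multivariate ranks: barycentric projection of the
    Sinkhorn transport plan from the sample X onto reference points U."""
    n, m = len(X), len(U)
    C = ((X[:, None, :] - U[None, :, :]) ** 2).sum(-1)  # squared-Euclidean cost
    f, g = np.zeros(n), np.zeros(m)
    log_a, log_b = -np.log(n), -np.log(m)               # uniform marginals
    for _ in range(n_iter):                             # log-domain Sinkhorn
        f = eps * (log_a - logsumexp((g[None, :] - C) / eps, axis=1))
        g = eps * (log_b - logsumexp((f[:, None] - C) / eps, axis=0))
    P = np.exp((f[:, None] + g[None, :] - C) / eps)     # entropic OT plan
    return P @ U / P.sum(axis=1, keepdims=True)         # soft rank of each row

def mmd2(A, B, sigma=0.25):
    """Squared MMD with a Gaussian kernel (biased V-statistic form)."""
    k = lambda P, Q: np.exp(-((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
                            / (2 * sigma ** 2))
    return k(A, A).mean() + k(B, B).mean() - 2 * k(A, B).mean()

rng = np.random.default_rng(0)
d, n = 2, 256
U = rng.uniform(size=(n, d))            # reference sample on [0, 1]^d
X = rng.normal(size=(n, d))             # data sample
Y = rng.normal(size=(n, d)) + 1.0       # candidate sample (e.g. generator output)
print(mmd2(soft_ranks(X, U), soft_ranks(Y, U)))  # sRMMD-style statistic
```

Unlike the hard OT map, the entropic plan varies smoothly with the inputs, which
is what removes the non-differentiability and the saturation noted in the
abstract.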
Related papers
- DistPred: A Distribution-Free Probabilistic Inference Method for Regression and Forecasting [14.390842560217743]
We propose a novel approach called DistPred for regression and forecasting tasks.
We transform proper scoring rules that measure the discrepancy between the predicted distribution and the target distribution into a differentiable discrete form.
This allows the model to draw many samples in a single forward pass to estimate the predictive distribution of the response variable.
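For context, one standard proper scoring rule already has a differentiable sample form: the CRPS of an empirical sample. The sketch below is plain NumPy and purely illustrative; it is not claimed to be the exact DistPred objective.

```python
import numpy as np

def sample_crps(samples, y):
    """Sample-based CRPS for a scalar target y: empirical form of
    E|S - y| - 0.5 * E|S - S'|, a strictly proper scoring rule."""
    s = np.asarray(samples, dtype=float)
    return np.mean(np.abs(s - y)) - 0.5 * np.mean(np.abs(s[:, None] - s[None, :]))

draws = np.random.default_rng(1).normal(loc=2.0, size=64)  # samples from one forward pass
print(sample_crps(draws, y=2.1))  # lower is better; built from ops autodiff can handle
```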
arXiv Detail & Related papers (2024-06-17T10:33:00Z)
- DeepDRK: Deep Dependency Regularized Knockoff for Feature Selection [14.840211139848275]
"Deep Dependency Regularized Knockoff (DeepDRK)" is a distribution-free deep learning method that effectively balances FDR and power.
We introduce a novel formulation of the knockoff model as a learning problem under multi-source adversarial attacks.
Our model outperforms existing benchmarks across synthetic, semi-synthetic, and real-world datasets.
arXiv Detail & Related papers (2024-02-27T03:24:54Z)
- Score-based Source Separation with Applications to Digital Communication Signals [72.6570125649502]
We propose a new method for separating superimposed sources using diffusion-based generative models.
Motivated by applications in radio-frequency (RF) systems, we are interested in sources with underlying discrete nature.
Our method can be viewed as a multi-source extension to the recently proposed score distillation sampling scheme.
arXiv Detail & Related papers (2023-06-26T04:12:40Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimization method.
We develop practical bounds that make the total variation distance (TVD) applicable to language generation.
We introduce the TaiLr objective, which balances the tradeoffs in estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Deblurring via Stochastic Refinement [85.42730934561101]
We present an alternative framework for blind deblurring based on conditional diffusion models.
Our method is competitive in terms of distortion metrics such as PSNR.
arXiv Detail & Related papers (2021-12-05T04:36:09Z)
- AdaPT-GMM: Powerful and robust covariate-assisted multiple testing [0.7614628596146599]
We propose a new empirical Bayes method for covariate-assisted multiple testing with false discovery rate (FDR) control.
Our method refines the adaptive p-value thresholding (AdaPT) procedure by generalizing its masking scheme.
We show in extensive simulations and real data examples that our new method, which we call AdaPT-GMM, consistently delivers high power.
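The masking idea can be caricatured with a single scalar threshold: null p-values are roughly symmetric about 1/2, so the mirror count #{p >= 1 - s} estimates the number of false discoveries among {p <= s}. The covariate-free sketch below is a deliberate simplification and omits what AdaPT-GMM actually adds (covariate-adaptive thresholds and a Gaussian-mixture working model).

```python
import numpy as np

def mirror_threshold(p, alpha=0.1):
    """Largest threshold s with estimated FDP <= alpha, where
    FDP_hat(s) = (1 + #{p >= 1 - s}) / max(#{p <= s}, 1)."""
    for s in np.sort(p[p <= 0.5])[::-1]:   # candidate thresholds, largest first
        fdp_hat = (1 + np.sum(p >= 1 - s)) / max(np.sum(p <= s), 1)
        if fdp_hat <= alpha:
            return s
    return 0.0

rng = np.random.default_rng(3)
p = np.concatenate([rng.uniform(size=900), rng.beta(0.1, 10.0, size=100)])  # nulls + signals
s = mirror_threshold(p)
print(f"threshold {s:.4f}, rejections {int(np.sum(p <= s))}")
```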
arXiv Detail & Related papers (2021-06-30T05:06:18Z)
- Reducing the Amortization Gap in Variational Autoencoders: A Bayesian Random Function Approach [38.45568741734893]
Inference in our GP model is done in a single feed-forward pass through the network, significantly faster than semi-amortized methods.
We show that our approach attains higher test-data likelihood than state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-02-05T13:01:12Z)
- DisARM: An Antithetic Gradient Estimator for Binary Latent Variables [35.473848208376886]
We introduce the Augment-REINFORCE-Merge (ARM) estimator for training models with binary latent variables.
We show that ARM can be improved by analytically integrating out the randomness introduced by the augmentation process.
Our estimator, DisARM, is simple to implement and has the same computational cost as ARM.
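As I read the DisARM construction, the two antithetic Bernoulli draws share one uniform variable, which is then integrated out analytically. Below is a NumPy sketch of the resulting estimator, checked against the exact gradient on a toy objective; the sign conventions reflect my reading of the paper, so treat this as a sketch rather than a reference implementation.

```python
import numpy as np

sig = lambda t: 1.0 / (1.0 + np.exp(-t))

def disarm_grad(phi, f, rng):
    """One DisARM sample of d/dphi E_{b ~ Bern(sig(phi))}[f(b)]."""
    u = rng.uniform(size=phi.shape)
    b1 = (u < sig(phi)).astype(float)     # standard draw
    b2 = (u > sig(-phi)).astype(float)    # antithetic draw (shares u)
    # "Merge" step: the shared uniform is integrated out analytically.
    weight = ((-1.0) ** b1) * (b1 != b2) * sig(np.abs(phi))
    return 0.5 * (f(b2) - f(b1)) * weight

# Sanity check on f(b) = sum(b): the exact gradient is sig(phi) * (1 - sig(phi)).
rng = np.random.default_rng(0)
phi = np.array([0.3, -1.2])
est = np.mean([disarm_grad(phi, lambda b: b.sum(), rng) for _ in range(100_000)], axis=0)
print(est, sig(phi) * (1 - sig(phi)))  # the two lines should roughly agree
```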
arXiv Detail & Related papers (2020-06-18T17:09:35Z)
- Lower bounds in multiple testing: A framework based on derandomized proxies [107.69746750639584]
This paper introduces an analysis strategy based on derandomization, illustrated by applications to various concrete models.
We provide numerical simulations of some of these lower bounds, and show a close relation to the actual performance of the Benjamini-Hochberg (BH) algorithm.
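For reference, the Benjamini-Hochberg step-up procedure that these lower bounds are compared against is short enough to state in code; this is the textbook algorithm, not the paper's contribution.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.1):
    """BH step-up: reject the k smallest p-values, where k is the largest
    index with p_(k) <= alpha * k / m. Returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = int(np.max(np.nonzero(below)[0])) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

rng = np.random.default_rng(2)
pvals = np.concatenate([rng.uniform(size=90), 1e-3 * rng.uniform(size=10)])  # 10 signals
print(int(benjamini_hochberg(pvals).sum()), "rejections")
```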
arXiv Detail & Related papers (2020-05-07T19:59:51Z)
- Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)