On the Computational Complexity of Private High-dimensional Model Selection
- URL: http://arxiv.org/abs/2310.07852v5
- Date: Tue, 29 Oct 2024 05:55:20 GMT
- Title: On the Computational Complexity of Private High-dimensional Model Selection
- Authors: Saptarshi Roy, Zehua Wang, Ambuj Tewari,
- Abstract summary: We consider the problem of model selection in a high-dimensional sparse linear regression model under privacy constraints.
We propose an efficient Metropolis-Hastings algorithm and under certain regularity conditions, we establish that it enjoys mixing time to its stationary distribution.
- Score: 18.964255744068122
- License:
- Abstract: We consider the problem of model selection in a high-dimensional sparse linear regression model under privacy constraints. We propose a differentially private (DP) best subset selection method with strong statistical utility properties by adopting the well-known exponential mechanism for selecting the best model. To achieve computational expediency, we propose an efficient Metropolis-Hastings algorithm and under certain regularity conditions, we establish that it enjoys polynomial mixing time to its stationary distribution. As a result, we also establish both approximate differential privacy and statistical utility for the estimates of the mixed Metropolis-Hastings chain. Finally, we perform some illustrative experiments on simulated data showing that our algorithm can quickly identify active features under reasonable privacy budget constraints.
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Globally-Optimal Greedy Experiment Selection for Active Sequential
Estimation [1.1530723302736279]
We study the problem of active sequential estimation, which involves adaptively selecting experiments for sequentially collected data.
The goal is to design experiment selection rules for more accurate model estimation.
We propose a class of greedy experiment selection methods and provide statistical analysis for the maximum likelihood.
arXiv Detail & Related papers (2024-02-13T17:09:29Z) - Distributionally Robust Model-based Reinforcement Learning with Large
State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z) - Best-Subset Selection in Generalized Linear Models: A Fast and
Consistent Algorithm via Splicing Technique [0.6338047104436422]
Best subset section has been widely regarded as the Holy Grail of problems of this type.
We proposed and illustrated an algorithm for best subset recovery in mild conditions.
Our implementation achieves approximately a fourfold speedup compared to popular variable selection toolkits.
arXiv Detail & Related papers (2023-08-01T03:11:31Z) - Regression with Label Differential Privacy [64.21020761920322]
We derive a label DP randomization mechanism that is optimal under a given regression loss function.
We prove that the optimal mechanism takes the form of a "randomized response on bins"
arXiv Detail & Related papers (2022-12-12T17:41:32Z) - Feature Selection via the Intervened Interpolative Decomposition and its
Application in Diversifying Quantitative Strategies [4.913248451323163]
We propose a probabilistic model for computing an interpolative decomposition (ID) in which each column of the observed matrix has its own priority or importance.
We evaluate the proposed models on real-world datasets, including ten Chinese A-share stocks.
arXiv Detail & Related papers (2022-09-29T03:36:56Z) - Fast Feature Selection with Fairness Constraints [49.142308856826396]
We study the fundamental problem of selecting optimal features for model construction.
This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants.
We extend the adaptive query model, recently proposed for the greedy forward selection for submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions.
The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work.
arXiv Detail & Related papers (2022-02-28T12:26:47Z) - High-Dimensional Differentially-Private EM Algorithm: Methods and
Near-Optimal Statistical Guarantees [8.089708900273804]
We develop a general framework to design differentially private expectation-maximization (EM) algorithms in high-dimensional latent variable models.
In each model, we establish the near-optimal rate of convergence with differential privacy constraints.
We propose a near rate-optimal EM algorithm with differential privacy guarantees in this setting.
arXiv Detail & Related papers (2021-04-01T04:08:34Z) - On Statistical Efficiency in Learning [37.08000833961712]
We address the challenge of model selection to strike a balance between model fitting and model complexity.
We propose an online algorithm that sequentially expands the model complexity to enhance selection stability and reduce cost.
Experimental studies show that the proposed method has desirable predictive power and significantly less computational cost than some popular methods.
arXiv Detail & Related papers (2020-12-24T16:08:29Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (instability) when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.