Truthful Generalized Linear Models
- URL: http://arxiv.org/abs/2209.07815v1
- Date: Fri, 16 Sep 2022 09:28:08 GMT
- Title: Truthful Generalized Linear Models
- Authors: Yuan Qiu and Jinyan Liu and Di Wang
- Abstract summary: We study the estimation of Generalized Linear Models (GLMs) in the case where the agents (individuals) are strategic or self-interested.
By using an $\ell_4$-norm shrinkage operator, we propose a private estimator and payment scheme with properties similar to those in the sub-Gaussian case.
- Score: 9.766078137988936
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we study the estimation of Generalized Linear Models (GLMs) in the
case where the agents (individuals) are strategic or self-interested and are
concerned about their privacy when reporting data. Compared with the classical
setting, here we aim to design mechanisms that both incentivize most agents
to truthfully report their data and preserve the privacy of individuals'
reports, while their outputs should also be close to the underlying parameter. In
the first part of the paper, we consider the case where the covariates are
sub-Gaussian and the responses are heavy-tailed, having only finite
fourth moments. First, motivated by the stationarity condition of the maximizer
of the likelihood function, we derive a novel private and closed-form
estimator. Based on this estimator, we propose a mechanism which, through an
appropriate design of the computation and payment scheme for several canonical
models such as linear regression, logistic regression, and Poisson regression,
has the following properties: (1) the mechanism is $o(1)$-jointly
differentially private (with probability at least $1-o(1)$); (2) it is an
$o(\frac{1}{n})$-approximate Bayes Nash equilibrium for a $(1-o(1))$-fraction
of agents to truthfully report their data, where $n$ is the number of agents;
(3) the output achieves an error of $o(1)$ with respect to the underlying parameter;
(4) the mechanism is individually rational for a $(1-o(1))$-fraction of agents;
(5) the payment budget required from the analyst to run the
mechanism is $o(1)$. In the second part, we consider the linear regression
model under a more general setting where both the covariates and the responses are
heavy-tailed and only have finite fourth moments. By using an $\ell_4$-norm
shrinkage operator, we propose a private estimator and payment scheme with
properties similar to those in the sub-Gaussian case.
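To make the estimator concrete, here is a minimal sketch (our own illustration, not the authors' construction) for the linear regression case: the stationarity condition of the likelihood yields a closed-form solution from the sufficient statistics $X^\top X$ and $X^\top y$, which can be privatized by perturbing those statistics, with an $\ell_4$-norm shrinkage step taming heavy-tailed samples. The joint row-wise shrinkage rule, the noise scale, and all function names are assumptions, and the payment scheme is omitted entirely.

```python
# Illustrative sketch only: a noisy-sufficient-statistics estimator for linear
# regression with an l4-norm shrinkage step. Not the paper's exact mechanism;
# noise scales and the row-wise shrinkage rule are assumptions.
import numpy as np

def l4_shrink(v, tau):
    """Scale v so that its l4 norm is at most tau (assumed shrinkage rule)."""
    norm4 = np.sum(np.abs(v) ** 4) ** 0.25
    return v if norm4 <= tau else v * (tau / norm4)

def private_linear_regression(X, y, tau, noise_scale, rng):
    n, d = X.shape
    # Shrink each sample (x_i, y_i) jointly so heavy tails cannot dominate.
    Z = np.apply_along_axis(l4_shrink, 1, np.column_stack([X, y]), tau)
    Xs, ys = Z[:, :d], Z[:, d]
    # Perturb the sufficient statistics (Gaussian noise; the scale is assumed,
    # not calibrated to a formal privacy budget here).
    A = Xs.T @ Xs / n + rng.normal(0.0, noise_scale, size=(d, d))
    A = (A + A.T) / 2  # re-symmetrize after adding noise
    b = Xs.T @ ys / n + rng.normal(0.0, noise_scale, size=d)
    # Closed form from the (noisy) stationarity condition A @ beta = b.
    return np.linalg.solve(A + 1e-6 * np.eye(d), b)

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))  # sub-Gaussian covariates
y = X @ np.arange(1.0, 6.0) + rng.standard_t(df=5, size=2000)  # t(5): finite 4th moment
print(private_linear_regression(X, y, tau=10.0, noise_scale=0.01, rng=rng))
```

As $n$ grows and the noise scale shrinks, the output approaches the underlying parameter; the paper makes this quantitative and couples such an estimator with a payment scheme under which truthful reporting is an approximate Bayes Nash equilibrium.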
Related papers
- Beyond Laplace and Gaussian: Exploring the Generalized Gaussian Mechanism for Private Machine Learning [49.66162382667325]
We investigate the Generalized Gaussian mechanism, which samples the additive noise term $x$ with probability proportional to $e^{-(|x|/\sigma)^{\beta}}$ for some $\beta \geq 1$. We show that privacy accounting for the GG mechanism and its variants is dimension-independent, which substantially improves the computational cost of privacy accounting.
arXiv Detail & Related papers (2025-06-14T15:49:25Z)
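As background on the noise distribution itself: a generalized Gaussian sample can be drawn with the standard gamma transform, since if $G \sim \mathrm{Gamma}(1/\beta, 1)$ and $s$ is a uniform random sign, then $s\,\sigma\,G^{1/\beta}$ has density proportional to $e^{-(|x|/\sigma)^{\beta}}$. A minimal sketch (our illustration; the paper's privacy calibration of $\sigma$ is not reproduced):

```python
# Sampling generalized Gaussian noise with density proportional to
# exp(-(|x|/sigma)^beta) via the gamma transform. Illustration only;
# calibrating sigma to a target privacy level is omitted.
import numpy as np

def generalized_gaussian_noise(size, sigma, beta, rng):
    magnitude = sigma * rng.gamma(1.0 / beta, 1.0, size=size) ** (1.0 / beta)
    return rng.choice([-1.0, 1.0], size=size) * magnitude

rng = np.random.default_rng(0)
print(generalized_gaussian_noise(5, sigma=1.0, beta=1.0, rng=rng))  # beta=1 recovers Laplace
print(generalized_gaussian_noise(5, sigma=1.0, beta=2.0, rng=rng))  # beta=2 is Gaussian-shaped
```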
- Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification [50.717692060500696]
Next-token prediction with the logarithmic loss is a cornerstone of autoregressive sequence modeling.
Next-token prediction can be made robust so as to achieve $C = \tilde{O}(H)$, representing moderate error amplification.
No computationally efficient algorithm can achieve a sub-polynomial approximation factor $C = e^{(\log H)^{1-\Omega(1)}}$.
arXiv Detail & Related papers (2025-02-18T02:52:00Z)
- Collaborative Mean Estimation Among Heterogeneous Strategic Agents: Individual Rationality, Fairness, and Truthful Contribution [11.371461065112422]
We study a collaborative learning problem where $m$ agents aim to estimate a vector $\mu = (\mu_k)_{k \in [d]}$ from samples with per-coordinate variance $\sigma^2$. Instead of working independently, agents can exchange data, collecting cheaper samples and sharing them in return for costly data, thereby reducing both costs and estimation error.
arXiv Detail & Related papers (2024-07-20T17:45:40Z)
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- Online non-parametric likelihood-ratio estimation by Pearson-divergence functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- PRIMO: Private Regression in Multiple Outcomes [4.111899441919164]
We introduce a new differentially private regression setting we call Private Regression in Multiple Outcomes.
In Subsection 4.1 we modify techniques based on sufficient statistics perturbation (SSP) to yield greatly improved dependence on $l$.
In Section 5 we apply our algorithms to the task of private genomic risk prediction for multiple phenotypes using data from the 1000 Genomes project.
arXiv Detail & Related papers (2023-03-07T19:32:13Z)
- On the Identifiability and Estimation of Causal Location-Scale Noise Models [122.65417012597754]
We study the class of location-scale or heteroscedastic noise models (LSNMs).
We show the causal direction is identifiable up to some pathological cases.
We propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks.
arXiv Detail & Related papers (2022-10-13T17:18:59Z)
- $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
- Ising Model Selection Using $\ell_{1}$-Regularized Linear Regression [13.14903445595385]
We show that despite model misspecification, the $\ell_1$-regularized linear regression ($\ell_1$-LinR) estimator can successfully recover the graph structure of the Ising model with $N$ variables.
We also provide a computationally efficient method to accurately predict the non-asymptotic performance of the $\ell_1$-LinR estimator with moderate $M$ and $N$.
arXiv Detail & Related papers (2021-02-08T03:45:10Z)
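For intuition about the node-wise recipe above, a minimal sketch (ours, not the paper's code) of $\ell_1$-LinR graph recovery: Lasso-regress each spin on all the others and keep coefficients above a threshold. The placeholder data are i.i.d. signs rather than true Ising samples, and the regularization and threshold values are arbitrary.

```python
# Sketch of node-wise l1-regularized linear regression (l1-LinR) for graph
# recovery: Lasso-regress each node on the rest, then threshold. Illustration
# only; placeholder i.i.d. spins stand in for genuine Ising samples.
import numpy as np
from sklearn.linear_model import Lasso

def l1_linr_graph(S, lam=0.05, thresh=0.05):
    M, N = S.shape  # M samples of N spins in {-1, +1}
    adj = np.zeros((N, N), dtype=bool)
    for j in range(N):
        others = np.delete(np.arange(N), j)
        coef = Lasso(alpha=lam, fit_intercept=False).fit(S[:, others], S[:, j]).coef_
        adj[j, others] = np.abs(coef) > thresh
    return adj | adj.T  # symmetrize: keep an edge if either regression finds it

rng = np.random.default_rng(0)
S = rng.choice([-1.0, 1.0], size=(1000, 8))  # placeholder spins (not Ising draws)
print(l1_linr_graph(S).astype(int))
```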
- Strongly universally consistent nonparametric regression and classification with privatised data [2.879036956042183]
We revisit the classical problem of nonparametric regression, but impose local differential privacy constraints.
We design a novel estimator of the regression function, which can be viewed as a privatised version of the well-studied partitioning regression estimator.
arXiv Detail & Related papers (2020-10-31T09:00:43Z)
- The Generalized Lasso with Nonlinear Observations and Generative Priors [63.541900026673055]
We make the assumption of sub-Gaussian measurements, which is satisfied by a wide range of measurement models.
We show that our result can be extended to the uniform recovery guarantee under the assumption of a so-called local embedding property.
arXiv Detail & Related papers (2020-06-22T16:43:35Z)
- A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)