What Makes A Good Fisherman? Linear Regression under Self-Selection Bias
- URL: http://arxiv.org/abs/2205.03246v1
- Date: Fri, 6 May 2022 14:03:05 GMT
- Title: What Makes A Good Fisherman? Linear Regression under Self-Selection Bias
- Authors: Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Andrew Ilyas,
Manolis Zampetakis
- Abstract summary: In the classical setting of self-selection, the goal is to learn $k$ models, simultaneously from observations $(x^{(i)}, y^{(i)})$.
In this work, we present the first computationally and statistically efficient estimation algorithms for the most standard setting of this problem where the models are linear.
- Score: 32.6588421908864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the classical setting of self-selection, the goal is to learn $k$ models,
simultaneously from observations $(x^{(i)}, y^{(i)})$ where $y^{(i)}$ is the
output of one of $k$ underlying models on input $x^{(i)}$. In contrast to
mixture models, where we observe the output of a randomly selected model, here
the observed model depends on the outputs themselves, and is determined by some
known selection criterion. For example, we might observe the highest output,
the smallest output, or the median output of the $k$ models. In known-index
self-selection, the identity of the observed model output is observable; in
unknown-index self-selection, it is not. Self-selection has a long history in
Econometrics and applications in various theoretical and applied fields,
including treatment effect estimation, imitation learning, learning from
strategically reported data, and learning from markets at disequilibrium.
In this work, we present the first computationally and statistically
efficient estimation algorithms for the most standard setting of this problem
where the models are linear. In the known-index case, we require
poly$(1/\varepsilon, k, d)$ sample and time complexity to estimate all model
parameters to accuracy $\varepsilon$ in $d$ dimensions, and can accommodate
quite general selection criteria. In the more challenging unknown-index case,
even the identifiability of the linear models (from infinitely many samples)
was not known. We show three results in this case for the commonly studied
$\max$ self-selection criterion: (1) we show that the linear models are indeed
identifiable, (2) for general $k$ we provide an algorithm with poly$(d)
\exp(\text{poly}(k))$ sample and time complexity to estimate the regression
parameters up to error $1/\text{poly}(k)$, and (3) for $k = 2$ we provide an
algorithm for any error $\varepsilon$ and poly$(d, 1/\varepsilon)$ sample and
time complexity.
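To make the setting concrete, the sketch below simulates the data-generating process under the $\max$ self-selection criterion: each response is the largest of the $k$ noisy linear model outputs, and the maximizing index is revealed only in the known-index variant. This is a minimal illustration, not the paper's estimation algorithm; the function name `simulate_max_self_selection` and its parameters are hypothetical.

```python
import numpy as np

def simulate_max_self_selection(n, d, k, noise_std=0.1, known_index=True, seed=0):
    """Simulate linear regression data under the max self-selection criterion.

    Each of the k underlying models is linear: y_j = <w_j, x> + noise.
    Only the largest of the k noisy outputs is observed; in the known-index
    variant the identity of the maximizing model is revealed as well.
    (Illustrative sketch; names and defaults are not from the paper.)
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(k, d))          # ground-truth regression vectors w_1, ..., w_k
    X = rng.normal(size=(n, d))          # covariates x^(i)
    outputs = X @ W.T + noise_std * rng.normal(size=(n, k))  # all k noisy model outputs
    idx = outputs.argmax(axis=1)         # index of the model achieving the maximum
    y = outputs[np.arange(n), idx]       # observed response: the max output
    if known_index:
        return X, y, idx, W              # known-index: the selected index is observable
    return X, y, W                       # unknown-index: only (x, y) is observed

# Example: draw a small known-index dataset.
X, y, idx, W = simulate_max_self_selection(n=1000, d=5, k=3)
print(X.shape, y.shape, idx[:10])
```

Even in the known-index case, restricting to the samples attributed to model $j$ conditions on that model's output being the largest, which biases the observed noise upward; this self-selection bias is what rules out naively running ordinary least squares separately per index.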
Related papers
- Agnostic Active Learning of Single Index Models with Linear Sample Complexity [27.065175036001246]
We study active learning methods for single index models of the form $F(\mathbf{x}) = f(\langle \mathbf{w}, \mathbf{x}\rangle)$.
arXiv Detail & Related papers (2024-05-15T13:11:28Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - $p$-Generalized Probit Regression and Scalable Maximum Likelihood
Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z) - On Model Selection Consistency of Lasso for High-Dimensional Ising
Models on Tree-like Graphs [13.14903445595385]
We consider the problem of high-dimensional Ising model selection using neighborhood-based least absolute shrinkage and selection operator (Lasso).
It is rigorously proved that consistent model selection can be achieved with sample sizes $n=\Omega(d^3\log p)$ for any tree-like graph in the paramagnetic phase.
Given the popularity and efficiency of Lasso, our rigorous analysis provides a theoretical backing for its practical use in Ising model selection.
arXiv Detail & Related papers (2021-10-16T07:23:02Z) - Ising Model Selection Using $\ell_{1}$-Regularized Linear Regression [13.14903445595385]
We show that despite model misspecification, the $\ell_1$-regularized linear regression ($\ell_1$-LinR) estimator can successfully recover the graph structure of the Ising model with $N$ variables.
We also provide a computationally efficient method to accurately predict the non-asymptotic performance of the $\ell_1$-LinR estimator with moderate $M$ and $N$.
arXiv Detail & Related papers (2021-02-08T03:45:10Z) - Estimating Stochastic Linear Combination of Non-linear Regressions
Efficiently and Scalably [23.372021234032363]
We show that when the sub-sample sizes are large, the estimation errors will not be sacrificed by too much.
To the best of our knowledge, this is the first work that provides guarantees for the stochastic linear combination of non-linear regressions model.
arXiv Detail & Related papers (2020-10-19T07:15:38Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z) - FANOK: Knockoffs in Linear Time [73.5154025911318]
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems.
We test our methods on problems with $p$ as large as $500,000$.
arXiv Detail & Related papers (2020-06-15T21:55:34Z) - Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.