What Makes A Good Fisherman? Linear Regression under Self-Selection Bias
- URL: http://arxiv.org/abs/2205.03246v1
- Date: Fri, 6 May 2022 14:03:05 GMT
- Title: What Makes A Good Fisherman? Linear Regression under Self-Selection Bias
- Authors: Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Andrew Ilyas,
Manolis Zampetakis
- Abstract summary: In the classical setting of self-selection, the goal is to learn $k$ models, simultaneously from observations $(x^{(i)}, y^{(i)})$.
In this work, we present the first computationally and statistically efficient estimation algorithms for the most standard setting of this problem where the models are linear.
- Score: 32.6588421908864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the classical setting of self-selection, the goal is to learn $k$ models,
simultaneously from observations $(x^{(i)}, y^{(i)})$ where $y^{(i)}$ is the
output of one of $k$ underlying models on input $x^{(i)}$. In contrast to
mixture models, where we observe the output of a randomly selected model, here
the observed model depends on the outputs themselves, and is determined by some
known selection criterion. For example, we might observe the highest output,
the smallest output, or the median output of the $k$ models. In known-index
self-selection, the identity of the observed model output is observable; in
unknown-index self-selection, it is not. Self-selection has a long history in
Econometrics and applications in various theoretical and applied fields,
including treatment effect estimation, imitation learning, learning from
strategically reported data, and learning from markets at disequilibrium.
In this work, we present the first computationally and statistically
efficient estimation algorithms for the most standard setting of this problem
where the models are linear. In the known-index case, we require
poly$(1/\varepsilon, k, d)$ sample and time complexity to estimate all model
parameters to accuracy $\varepsilon$ in $d$ dimensions, and can accommodate
quite general selection criteria. In the more challenging unknown-index case,
even the identifiability of the linear models (from infinitely many samples)
was not known. We show three results in this case for the commonly studied
$\max$ self-selection criterion: (1) we show that the linear models are indeed
identifiable, (2) for general $k$ we provide an algorithm with poly$(d)
\exp(\text{poly}(k))$ sample and time complexity to estimate the regression
parameters up to error $1/\text{poly}(k)$, and (3) for $k = 2$ we provide an
algorithm for any error $\varepsilon$ and poly$(d, 1/\varepsilon)$ sample and
time complexity.
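To make the setting concrete, the sketch below simulates the data-generating process under the $\max$ self-selection criterion: each response is the largest of the $k$ noisy linear model outputs, and the maximizing index is revealed only in the known-index variant. This is a minimal illustration, not the paper's estimation algorithm; the function name `simulate_max_self_selection` and its parameters are hypothetical.

```python
import numpy as np

def simulate_max_self_selection(n, d, k, noise_std=0.1, known_index=True, seed=0):
    """Simulate linear regression data under the max self-selection criterion.

    Each of the k underlying models is linear: y_j = <w_j, x> + noise.
    Only the largest of the k noisy outputs is observed; in the known-index
    variant the identity of the maximizing model is revealed as well.
    (Illustrative sketch; names and defaults are not from the paper.)
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(k, d))          # ground-truth regression vectors w_1, ..., w_k
    X = rng.normal(size=(n, d))          # covariates x^(i)
    outputs = X @ W.T + noise_std * rng.normal(size=(n, k))  # all k noisy model outputs
    idx = outputs.argmax(axis=1)         # index of the model achieving the maximum
    y = outputs[np.arange(n), idx]       # observed response: the max output
    if known_index:
        return X, y, idx, W              # known-index: the selected index is observable
    return X, y, W                       # unknown-index: only (x, y) is observed

# Example: draw a small known-index dataset.
X, y, idx, W = simulate_max_self_selection(n=1000, d=5, k=3)
print(X.shape, y.shape, idx[:10])
```

Even in the known-index case, restricting to the samples attributed to model $j$ conditions on that model's output being the largest, which biases the observed noise upward; this self-selection bias is what rules out naively running ordinary least squares separately per index.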
Related papers
- Agnostic Active Learning of Single Index Models with Linear Sample Complexity [27.065175036001246]
We study active learning methods for single index models of the form $F(\mathbf{x}) = f(\langle \mathbf{w}, \mathbf{x}\rangle)$.
arXiv Detail & Related papers (2024-05-15T13:11:28Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - $p$-Generalized Probit Regression and Scalable Maximum Likelihood
Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z) - On Model Selection Consistency of Lasso for High-Dimensional Ising
Models on Tree-like Graphs [13.14903445595385]
We consider the problem of high-dimensional Ising model selection using neighborhood-based least absolute shrinkage and selection operator (Lasso).
It is rigorously proved that consistent model selection can be achieved with sample sizes $n=\Omega(d^3\log p)$ for any tree-like graph in the paramagnetic phase.
Given the popularity and efficiency of Lasso, our rigorous analysis provides a theoretical backing for its practical use in Ising model selection.
arXiv Detail & Related papers (2021-10-16T07:23:02Z) - Ising Model Selection Using $\ell_{1}$-Regularized Linear Regression [13.14903445595385]
We show that despite model misspecification, the $\ell_1$-regularized linear regression ($\ell_1$-LinR) estimator can successfully recover the graph structure of the Ising model with $N$ variables.
We also provide a computationally efficient method to accurately predict the non-asymptotic performance of the $\ell_1$-LinR estimator with moderate $M$ and $N$.
arXiv Detail & Related papers (2021-02-08T03:45:10Z) - Estimating Stochastic Linear Combination of Non-linear Regressions
Efficiently and Scalably [23.372021234032363]
We show that when the sub-sample sizes are large, the estimation errors will not be sacrificed by too much.
To the best of our knowledge, this is the first work that provides guarantees for the stochastic linear combination of non-linear regressions model.
arXiv Detail & Related papers (2020-10-19T07:15:38Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z) - FANOK: Knockoffs in Linear Time [73.5154025911318]
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems.
We test our methods on problems with $p$ as large as $500,000$.
arXiv Detail & Related papers (2020-06-15T21:55:34Z) - Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.