Paper summary: What Makes A Good Fisherman? Linear Regression under Self-Selection Bias
- arxiv url: http://arxiv.org/abs/2205.03246v1
- Date: Fri, 6 May 2022 14:03:05 GMT
- Status: Processing complete
- Last updated in system: 2022-05-09 13:13:18.027846
- Title: What Makes A Good Fisherman? Linear Regression under Self-Selection Bias
- Authors: Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Andrew Ilyas,
Manolis Zampetakis
- Abstract summary: In the classical self-selection setting, the goal is to learn $k$ models simultaneously from observations $(x^{(i)}, y^{(i)})$.
- Reference score (site-computed attention score): 32.6588421908864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the classical setting of self-selection, the goal is to learn $k$ models,
simultaneously from observations $(x^{(i)}, y^{(i)})$ where $y^{(i)}$ is the
output of one of $k$ underlying models on input $x^{(i)}$. In contrast to
mixture models, where we observe the output of a randomly selected model, here
the observed model depends on the outputs themselves, and is determined by some
known selection criterion. For example, we might observe the highest output,
the smallest output, or the median output of the $k$ models. In known-index
self-selection, the identity of the observed model output is observable; in
unknown-index self-selection, it is not. Self-selection has a long history in
Econometrics and applications in various theoretical and applied fields,
including treatment effect estimation, imitation learning, learning from
strategically reported data, and learning from markets at disequilibrium.
In this work, we present the first computationally and statistically
efficient estimation algorithms for the most standard setting of this problem
where the models are linear. In the known-index case, we require
poly$(1/\varepsilon, k, d)$ sample and time complexity to estimate all model
parameters to accuracy $\varepsilon$ in $d$ dimensions, and can accommodate
quite general selection criteria. In the more challenging unknown-index case,
even the identifiability of the linear models (from infinitely many samples)
was not known. We show three results in this case for the commonly studied
$\max$ self-selection criterion: (1) we show that the linear models are indeed
identifiable, (2) for general $k$ we provide an algorithm with poly$(d)
\exp(\text{poly}(k))$ sample and time complexity to estimate the regression
parameters up to error $1/\text{poly}(k)$, and (3) for $k = 2$ we provide an
algorithm for any error $\varepsilon$ and poly$(d, 1/\varepsilon)$ sample and
time complexity.
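The abstract describes the self-selection observation model only in words. The following is a minimal Python sketch of that data-generating process, assuming Gaussian covariates and independent standard Gaussian noise on each model's output (both are assumptions, not details taken from the paper). The function names `sample_self_selection` and `naive_ols` are hypothetical. The sketch also fits ordinary least squares separately to the samples attributed to one model in the known-index $\max$ case. This illustrates why the problem is nontrivial: conditioning on a model's output being the largest biases its noise upward, so per-index least squares does not recover the true parameters. It is not the paper's estimator.

```python
import numpy as np


def sample_self_selection(W, n, criterion="max", noise_std=1.0, rng=None):
    """Draw n observations from a k-model linear self-selection setup (sketch).

    W: (k, d) array; row j is the parameter vector of model j.
    criterion: which of the k outputs is revealed ("max", "min", or "median").
    Returns x (n, d), y (n,), idx (n,). In the known-index setting a learner
    sees (x, y, idx); in the unknown-index setting it sees only (x, y).
    """
    rng = np.random.default_rng() if rng is None else rng
    k, d = W.shape
    x = rng.standard_normal((n, d))  # assumed isotropic Gaussian covariates
    # Latent outputs of all k models, with assumed i.i.d. Gaussian noise: (n, k)
    outputs = x @ W.T + noise_std * rng.standard_normal((n, k))
    if criterion == "max":
        idx = outputs.argmax(axis=1)
    elif criterion == "min":
        idx = outputs.argmin(axis=1)
    elif criterion == "median":
        # Position of the middle order statistic (lower median when k is even)
        idx = np.argsort(outputs, axis=1)[:, (k - 1) // 2]
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    y = outputs[np.arange(n), idx]  # only the selected model's output is observed
    return x, y, idx


def naive_ols(x, y):
    """Ordinary least squares with an intercept; returns (intercept, slopes)."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = np.array([[1.0], [-1.0]])  # k = 2 models in d = 1, true intercepts are 0
    x, y, idx = sample_self_selection(W, 20000, "max", rng=rng)
    # Regress only on samples where model 0 was selected (known-index case)
    b0, b = naive_ols(x[idx == 0], y[idx == 0])
    # Under max selection the intercept is pushed above 0 and the slope below 1
    print("intercept:", b0, "slope:", b)
```

In this two-model example, the selected noise term for model 0 has a conditional mean that shrinks as $x$ grows, since model 0 wins more easily for large $x$. That is why the naive fit tilts the slope down and lifts the intercept.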
