Optimal and Safe Estimation for High-Dimensional Semi-Supervised
Learning
- URL: http://arxiv.org/abs/2011.14185v2
- Date: Sat, 18 Mar 2023 08:01:26 GMT
- Title: Optimal and Safe Estimation for High-Dimensional Semi-Supervised
Learning
- Authors: Siyi Deng, Yang Ning, Jiwei Zhao, Heping Zhang
- Abstract summary: We consider the estimation problem in high-dimensional semi-supervised learning.
We first establish the minimax lower bound for parameter estimation in the semi-supervised setting.
We propose an optimal semi-supervised estimator that can attain this lower bound.
- Score: 4.4102422716568235
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We consider the estimation problem in high-dimensional semi-supervised
learning. Our goal is to investigate when and how the unlabeled data can be
exploited to improve the estimation of the regression parameters of a linear
model, given that such linear models may be misspecified in data
analysis. We first establish the minimax lower bound for parameter estimation
in the semi-supervised setting, and show that this lower bound cannot be
achieved by supervised estimators using the labeled data only. We propose an
optimal semi-supervised estimator that can attain this lower bound and
therefore improves the supervised estimators, provided that the conditional
mean function can be consistently estimated with a proper rate. We further
propose a safe semi-supervised estimator. We call it safe because this
estimator is always at least as good as the supervised estimators. We also
extend our idea to the aggregation of multiple semi-supervised estimators
arising from different misspecifications of the conditional mean function.
Extensive numerical simulations and a real data analysis are conducted to
illustrate our theoretical results.
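The abstract's core idea can be illustrated in a toy special case. The sketch below is not the paper's estimator: it shows, for the simpler problem of estimating E[Y], how unlabeled covariates can correct a supervised estimate through a possibly misspecified working model; the simulation setup and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: estimate theta = E[Y] when a working linear model for
# E[Y | X] may be misspecified (the true mean model below is nonlinear).
n, N = 100, 10_000                                    # labeled / unlabeled sizes
x_lab = rng.normal(size=n)
y_lab = x_lab + 0.5 * x_lab**2 + rng.normal(size=n)   # E[Y] = 0.5
x_unlab = rng.normal(size=N)

# Supervised estimator: uses the labeled responses only.
theta_sup = y_lab.mean()

# Fit a (misspecified) linear working model f(x) = a + b*x on labeled data.
b, a = np.polyfit(x_lab, y_lab, deg=1)
f = lambda x: a + b * x

# Semi-supervised estimator: shift the supervised estimate by the difference
# of the working model averaged over unlabeled vs labeled covariates.
theta_ssl = theta_sup + f(x_unlab).mean() - f(x_lab).mean()
```

Even though the working model is wrong, the correction term has mean zero, so the semi-supervised estimator stays consistent while the unlabeled data reduce the variance contributed by the covariates.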
Related papers
- Precise Model Benchmarking with Only a Few Observations [6.092112060364272]
We propose an empirical Bayes (EB) estimator that balances direct and regression estimates for each subgroup separately.
EB consistently provides more precise estimates of LLM performance than the direct and regression approaches.
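The balancing idea can be sketched as generic empirical-Bayes shrinkage between direct and regression estimates; this is a hypothetical illustration, not the paper's exact estimator, and all numbers are made up.

```python
import numpy as np

# Hypothetical per-subgroup estimates: noisy direct accuracy estimates,
# their sampling variances, and predictions from a fitted regression model.
direct = np.array([0.62, 0.55, 0.71, 0.48])
se2 = np.array([2e-4, 3e-4, 1e-4, 4e-4])
regression = np.array([0.60, 0.58, 0.66, 0.52])

# Method-of-moments estimate (floored at zero) of the between-subgroup
# variance of the true effects around the regression fit.
tau2 = max(np.mean((direct - regression) ** 2 - se2), 0.0)

# EB weight: trust the direct estimate more when its sampling variance is
# small relative to the between-subgroup variance.
w = tau2 / (tau2 + se2)
eb = w * direct + (1 - w) * regression
```

Each EB estimate lies between its direct and regression counterparts, with the weight adapting per subgroup to the estimated signal-to-noise ratio.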
arXiv Detail & Related papers (2024-10-07T17:26:31Z)
- Leveraging Variational Autoencoders for Parameterized MMSE Estimation [10.141454378473972]
We propose a variational autoencoder-based framework for parameterizing a conditional linear minimum mean squared error estimator.
The derived estimator is shown to approximate the minimum mean squared error estimator by utilizing the variational autoencoder as a generative prior for the estimation problem.
We conduct a rigorous analysis by bounding the difference between the proposed and the minimum mean squared error estimator.
arXiv Detail & Related papers (2023-07-11T15:41:34Z)
- Semi-Supervised Quantile Estimation: Robust and Efficient Inference in High Dimensional Settings [0.5735035463793009]
We consider quantile estimation in a semi-supervised setting, characterized by two available data sets.
We propose a family of semi-supervised estimators for the response quantile(s) based on the two data sets.
arXiv Detail & Related papers (2022-01-25T10:02:23Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the resulting bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Robust W-GAN-Based Estimation Under Wasserstein Contamination [8.87135311567798]
We study several estimation problems under a Wasserstein contamination model and present computationally tractable estimators motivated by generative adversarial networks (GANs).
Specifically, we analyze properties of Wasserstein GAN-based estimators for adversarial location estimation, covariance matrix estimation, and linear regression.
Our proposed estimators are minimax optimal in many scenarios.
arXiv Detail & Related papers (2021-01-20T05:15:16Z)
- Learning Minimax Estimators via Online Learning [55.92459567732491]
We consider the problem of designing minimax estimators for estimating parameters of a probability distribution.
We construct an algorithm for finding a mixed-strategy Nash equilibrium.
arXiv Detail & Related papers (2020-06-19T22:49:42Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
- Distributional robustness of K-class estimators and the PULSE [4.56877715768796]
We prove that the classical K-class estimator satisfies a distributional robustness property by establishing a connection between K-class estimators and anchor regression.
We show that it can be computed efficiently as a data-driven simulation K-class estimator.
There are several settings including weak instrument settings, where it outperforms other estimators.
arXiv Detail & Related papers (2020-05-07T09:39:07Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
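The cross-fit doubly-robust recipe can be sketched in a toy simulation. This is a generic AIPW (augmented inverse probability weighting) estimator with simple least-squares nuisance models standing in for the machine-learning fits the paper studies; every modeling choice below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: confounder x, treatment a with true propensity sigmoid(0.5x),
# outcome y with a true average causal effect (ACE) of 1.0.
n = 4000
x = rng.normal(size=n)
e_true = 1 / (1 + np.exp(-0.5 * x))
a = rng.binomial(1, e_true)
y = 1.0 * a + x + rng.normal(size=n)

def fit_outcome(x_tr, y_tr):
    # Simple linear outcome model y ~ b*x + c, fitted by least squares.
    b, c = np.polyfit(x_tr, y_tr, deg=1)
    return lambda x_new: b * x_new + c

# Cross-fitting: nuisance models are fitted on one fold and evaluated
# on the other, so each influence-function value uses out-of-fold fits.
folds = np.arange(n) % 2
psi = np.empty(n)
for k in (0, 1):
    tr, ev = folds != k, folds == k
    m1 = fit_outcome(x[tr & (a == 1)], y[tr & (a == 1)])
    m0 = fit_outcome(x[tr & (a == 0)], y[tr & (a == 0)])
    e_hat = np.clip(a[tr].mean(), 0.01, 0.99)  # crude constant propensity model
    psi[ev] = (m1(x[ev]) - m0(x[ev])
               + a[ev] * (y[ev] - m1(x[ev])) / e_hat
               - (1 - a[ev]) * (y[ev] - m0(x[ev])) / (1 - e_hat))

ace_hat = psi.mean()
```

The constant propensity model is deliberately misspecified here: because the outcome models are correct, the doubly-robust estimator still recovers the ACE, which is the property the snippet above highlights.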
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
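The randomized-truncation device behind such unbiased series estimators can be sketched on a simple geometric series. This is not the SUMO estimator itself; the target series and the truncation distribution are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Russian-roulette estimation of the infinite series sum_{k>=1} 0.5**k = 1.0:
# truncate at a random level K and reweight each kept term by 1 / P(K >= k),
# which makes the truncated sum an unbiased estimate of the full series.
def delta(k):
    return 0.5 ** k

def roulette_estimate(rng, p=0.5):
    # K ~ Geometric(p) on {1, 2, ...}, so P(K >= k) = (1 - p) ** (k - 1).
    k_max = rng.geometric(p)
    return sum(delta(k) / (1 - p) ** (k - 1) for k in range(1, k_max + 1))

est = np.mean([roulette_estimate(rng) for _ in range(200_000)])
```

Averaging many such randomly truncated sums concentrates around the true value 1.0, even though each individual estimate uses only finitely many terms.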
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.