Optimal and Safe Estimation for High-Dimensional Semi-Supervised
Learning
- URL: http://arxiv.org/abs/2011.14185v2
- Date: Sat, 18 Mar 2023 08:01:26 GMT
- Title: Optimal and Safe Estimation for High-Dimensional Semi-Supervised
Learning
- Authors: Siyi Deng, Yang Ning, Jiwei Zhao, Heping Zhang
- Abstract summary: We consider the estimation problem in high-dimensional semi-supervised learning.
We first establish the minimax lower bound for parameter estimation in the semi-supervised setting.
We propose an optimal semi-supervised estimator that can attain this lower bound.
- Score: 4.4102422716568235
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We consider the estimation problem in high-dimensional semi-supervised
learning. Our goal is to investigate when and how the unlabeled data can be
exploited to improve the estimation of the regression parameters of a linear
model, given that such linear models may be misspecified in data
analysis. We first establish the minimax lower bound for parameter estimation
in the semi-supervised setting, and show that this lower bound cannot be
achieved by supervised estimators using the labeled data only. We propose an
optimal semi-supervised estimator that can attain this lower bound and
therefore improves the supervised estimators, provided that the conditional
mean function can be consistently estimated with a proper rate. We further
propose a safe semi-supervised estimator. We call it safe because this
estimator is always at least as good as the supervised estimators. We also
extend our idea to the aggregation of multiple semi-supervised estimators
arising from different misspecifications of the conditional mean function.
Extensive numerical simulations and a real data analysis are conducted to
illustrate our theoretical results.
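The abstract's core idea can be illustrated in a toy special case. The sketch below is not the paper's estimator: it shows, for the simpler problem of estimating E[Y], how unlabeled covariates can correct a supervised estimate through a possibly misspecified working model; the simulation setup and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: estimate theta = E[Y] when a working linear model for
# E[Y | X] may be misspecified (the true mean model below is nonlinear).
n, N = 100, 10_000                                    # labeled / unlabeled sizes
x_lab = rng.normal(size=n)
y_lab = x_lab + 0.5 * x_lab**2 + rng.normal(size=n)   # E[Y] = 0.5
x_unlab = rng.normal(size=N)

# Supervised estimator: uses the labeled responses only.
theta_sup = y_lab.mean()

# Fit a (misspecified) linear working model f(x) = a + b*x on labeled data.
b, a = np.polyfit(x_lab, y_lab, deg=1)
f = lambda x: a + b * x

# Semi-supervised estimator: shift the supervised estimate by the difference
# of the working model averaged over unlabeled vs labeled covariates.
theta_ssl = theta_sup + f(x_unlab).mean() - f(x_lab).mean()
```

Even though the working model is wrong, the correction term has mean zero, so the semi-supervised estimator stays consistent while the unlabeled data reduce the variance contributed by the covariates.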
Related papers
- Precise Model Benchmarking with Only a Few Observations [6.092112060364272]
We propose an empirical Bayes (EB) estimator that balances direct and regression estimates for each subgroup separately.
EB consistently provides more precise estimates of LLM performance than the direct and regression approaches.
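The balancing idea can be sketched as generic empirical-Bayes shrinkage between direct and regression estimates; this is a hypothetical illustration, not the paper's exact estimator, and all numbers are made up.

```python
import numpy as np

# Hypothetical per-subgroup estimates: noisy direct accuracy estimates,
# their sampling variances, and predictions from a fitted regression model.
direct = np.array([0.62, 0.55, 0.71, 0.48])
se2 = np.array([2e-4, 3e-4, 1e-4, 4e-4])
regression = np.array([0.60, 0.58, 0.66, 0.52])

# Method-of-moments estimate (floored at zero) of the between-subgroup
# variance of the true effects around the regression fit.
tau2 = max(np.mean((direct - regression) ** 2 - se2), 0.0)

# EB weight: trust the direct estimate more when its sampling variance is
# small relative to the between-subgroup variance.
w = tau2 / (tau2 + se2)
eb = w * direct + (1 - w) * regression
```

Each EB estimate lies between its direct and regression counterparts, with the weight adapting per subgroup to the estimated signal-to-noise ratio.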
arXiv Detail & Related papers (2024-10-07T17:26:31Z)
- Leveraging Variational Autoencoders for Parameterized MMSE Estimation [10.141454378473972]
We propose a variational autoencoder-based framework for parameterizing a conditional linear minimum mean squared error estimator.
The derived estimator is shown to approximate the minimum mean squared error estimator by utilizing the variational autoencoder as a generative prior for the estimation problem.
We conduct a rigorous analysis by bounding the difference between the proposed and the minimum mean squared error estimator.
arXiv Detail & Related papers (2023-07-11T15:41:34Z)
- Semi-Supervised Quantile Estimation: Robust and Efficient Inference in High Dimensional Settings [0.5735035463793009]
We consider quantile estimation in a semi-supervised setting, characterized by two available data sets.
We propose a family of semi-supervised estimators for the response quantile(s) based on the two data sets.
arXiv Detail & Related papers (2022-01-25T10:02:23Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the resulting bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Robust W-GAN-Based Estimation Under Wasserstein Contamination [8.87135311567798]
We study several estimation problems under a Wasserstein contamination model and present computationally tractable estimators motivated by generative adversarial networks (GANs).
Specifically, we analyze properties of Wasserstein GAN-based estimators for adversarial location estimation, covariance matrix estimation, and linear regression.
Our proposed estimators are minimax optimal in many scenarios.
arXiv Detail & Related papers (2021-01-20T05:15:16Z)
- Learning Minimax Estimators via Online Learning [55.92459567732491]
We consider the problem of designing minimax estimators for estimating parameters of a probability distribution.
We construct an algorithm for finding a mixed-strategy Nash equilibrium.
arXiv Detail & Related papers (2020-06-19T22:49:42Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
- Distributional robustness of K-class estimators and the PULSE [4.56877715768796]
We prove that the classical K-class estimator satisfies a distributional robustness property by establishing a connection between K-class estimators and anchor regression.
We show that it can be computed efficiently as a data-driven simulation K-class estimator.
There are several settings including weak instrument settings, where it outperforms other estimators.
arXiv Detail & Related papers (2020-05-07T09:39:07Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
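The cross-fit doubly-robust recipe can be sketched in a toy simulation. This is a generic AIPW (augmented inverse probability weighting) estimator with simple least-squares nuisance models standing in for the machine-learning fits the paper studies; every modeling choice below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: confounder x, treatment a with true propensity sigmoid(0.5x),
# outcome y with a true average causal effect (ACE) of 1.0.
n = 4000
x = rng.normal(size=n)
e_true = 1 / (1 + np.exp(-0.5 * x))
a = rng.binomial(1, e_true)
y = 1.0 * a + x + rng.normal(size=n)

def fit_outcome(x_tr, y_tr):
    # Simple linear outcome model y ~ b*x + c, fitted by least squares.
    b, c = np.polyfit(x_tr, y_tr, deg=1)
    return lambda x_new: b * x_new + c

# Cross-fitting: nuisance models are fitted on one fold and evaluated
# on the other, so each influence-function value uses out-of-fold fits.
folds = np.arange(n) % 2
psi = np.empty(n)
for k in (0, 1):
    tr, ev = folds != k, folds == k
    m1 = fit_outcome(x[tr & (a == 1)], y[tr & (a == 1)])
    m0 = fit_outcome(x[tr & (a == 0)], y[tr & (a == 0)])
    e_hat = np.clip(a[tr].mean(), 0.01, 0.99)  # crude constant propensity model
    psi[ev] = (m1(x[ev]) - m0(x[ev])
               + a[ev] * (y[ev] - m1(x[ev])) / e_hat
               - (1 - a[ev]) * (y[ev] - m0(x[ev])) / (1 - e_hat))

ace_hat = psi.mean()
```

The constant propensity model is deliberately misspecified here: because the outcome models are correct, the doubly-robust estimator still recovers the ACE, which is the property the snippet above highlights.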
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
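The randomized-truncation device behind such unbiased series estimators can be sketched on a simple geometric series. This is not the SUMO estimator itself; the target series and the truncation distribution are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Russian-roulette estimation of the infinite series sum_{k>=1} 0.5**k = 1.0:
# truncate at a random level K and reweight each kept term by 1 / P(K >= k),
# which makes the truncated sum an unbiased estimate of the full series.
def delta(k):
    return 0.5 ** k

def roulette_estimate(rng, p=0.5):
    # K ~ Geometric(p) on {1, 2, ...}, so P(K >= k) = (1 - p) ** (k - 1).
    k_max = rng.geometric(p)
    return sum(delta(k) / (1 - p) ** (k - 1) for k in range(1, k_max + 1))

est = np.mean([roulette_estimate(rng) for _ in range(200_000)])
```

Averaging many such randomly truncated sums concentrates around the true value 1.0, even though each individual estimate uses only finitely many terms.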
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.