On semi-supervised estimation using exponential tilt mixture models
- URL: http://arxiv.org/abs/2311.08504v1
- Date: Tue, 14 Nov 2023 19:53:26 GMT
- Title: On semi-supervised estimation using exponential tilt mixture models
- Authors: Ye Tian, Xinwei Zhang and Zhiqiang Tan
- Abstract summary: Consider a semi-supervised setting with a labeled dataset of binary responses and predictors and an unlabeled dataset with only predictors.
For semi-supervised estimation, we develop further analysis and understanding of a statistical approach using exponential tilt mixture (ETM) models.
- Score: 12.347498345854715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider a semi-supervised setting with a labeled dataset of binary responses
and predictors and an unlabeled dataset with only the predictors. Logistic
regression is equivalent to an exponential tilt model in the labeled
population. For semi-supervised estimation, we develop further analysis and
understanding of a statistical approach using exponential tilt mixture (ETM)
models and maximum nonparametric likelihood estimation, while allowing that the
class proportions may differ between the unlabeled and labeled data. We derive
asymptotic properties of ETM-based estimation and demonstrate improved
efficiency over supervised logistic regression in a random sampling setup and
an outcome-stratified sampling setup previously used. Moreover, we reconcile
such efficiency improvement with the existing semiparametric efficiency theory
when the class proportions in the unlabeled and labeled data are restricted to
be the same. We also provide a simulation study to numerically illustrate our
theoretical findings.
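As a sketch of the equivalence stated in the abstract (the notation below is an assumption for illustration, not taken from the paper): let $g_0$ and $g_1$ be the densities of the predictors $x$ given $y=0$ and $y=1$, and let $\pi = P(y=1)$ in the labeled population. Bayes' rule turns the logistic model into a log-linear density ratio, which is the exponential tilt form. If the unlabeled data share the same $g_0$ and $g_1$ but have a possibly different class proportion $\rho$, their marginal density is a two-component mixture of $g_0$ and its tilted version.

```latex
% Logistic regression in the labeled population (assumed notation):
%   P(y=1 | x) = exp(b0 + b1'x) / {1 + exp(b0 + b1'x)}
\begin{align*}
\frac{P(y=1\mid x)}{P(y=0\mid x)}
  &= \frac{\pi\, g_1(x)}{(1-\pi)\, g_0(x)}
   = \exp(\beta_0 + \beta_1^{\top} x) \\
\Longleftrightarrow\quad
\frac{g_1(x)}{g_0(x)}
  &= \exp(\beta_0^{c} + \beta_1^{\top} x),
  \qquad \beta_0^{c} = \beta_0 - \log\frac{\pi}{1-\pi} \\
% Unlabeled data with class proportion rho (allowed to differ from pi):
p_{\text{unlabeled}}(x)
  &= (1-\rho)\, g_0(x) + \rho\, g_1(x)
   = g_0(x)\,\bigl\{ 1 - \rho + \rho \exp(\beta_0^{c} + \beta_1^{\top} x) \bigr\}
\end{align*}
```

Under these assumptions the slope $\beta_1$ is the same in the logistic model and in the tilt, while the intercept shifts by the log odds of the labeled class proportion.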
Related papers
- Calibrating doubly-robust estimators with unbalanced treatment assignment [0.0]
We propose a simple extension of the DML estimator which undersamples data for propensity score modeling.
The paper provides theoretical results showing that the proposed estimator retains the properties of the DML estimator and calibrates scores to match the original distribution.
arXiv Detail & Related papers (2024-03-03T18:40:11Z)
- A Provably Accurate Randomized Sampling Algorithm for Logistic Regression [2.7930955543692817]
We present a simple, randomized sampling-based algorithm for the logistic regression problem.
We prove that accurate approximations can be achieved with a sample whose size is much smaller than the total number of observations.
Overall, our work sheds light on the potential of using randomized sampling approaches to efficiently approximate the estimated probabilities in logistic regression.
arXiv Detail & Related papers (2024-02-26T06:20:28Z)
- Efficient semi-supervised inference for logistic regression under case-control studies [3.5485531932219243]
We consider an inference problem in semi-supervised settings where the outcome in the labeled data is binary.
Case-control sampling is an effective sampling scheme for alleviating the imbalance structure in binary data.
We find that, when unlabeled data are available, the intercept parameter can be identified in the semi-supervised learning setting.
arXiv Detail & Related papers (2024-02-23T14:55:58Z)
- Causal Effect Estimation from Observational and Interventional Data Through Matrix Weighted Linear Estimators [11.384045395629123]
We study causal effect estimation from a mixture of observational and interventional data.
We show that the statistical efficiency in terms of expected squared error can be improved by combining estimators.
arXiv Detail & Related papers (2023-06-09T16:16:53Z)
- Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the finite sample regime and in the asymptotic regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Statistical Inference for High-Dimensional Linear Regression with Blockwise Missing Data [13.48481978963297]
Blockwise missing data occurs when we integrate multisource or multimodality data where different sources or modalities contain complementary information.
We propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations.
Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.
arXiv Detail & Related papers (2021-06-07T05:12:42Z)
- MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice.
One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio.
We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
arXiv Detail & Related papers (2021-06-03T12:59:16Z)
- Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z)
- Nonparametric Score Estimators [49.42469547970041]
Estimating the score from a set of samples generated by an unknown distribution is a fundamental task in inference and learning of probabilistic models.
We provide a unifying view of these estimators under the framework of regularized nonparametric regression.
We propose score estimators based on iterative regularization that enjoy computational benefits from curl-free kernels and fast convergence.
arXiv Detail & Related papers (2020-05-20T15:01:03Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.