Mixed Matrix Completion in Complex Survey Sampling under Heterogeneous
Missingness
- URL: http://arxiv.org/abs/2402.03954v1
- Date: Tue, 6 Feb 2024 12:26:58 GMT
- Authors: Xiaojun Mao, Hengfang Wang, Zhonglei Wang and Shu Yang
- Abstract summary: We propose a fast and scalable estimation algorithm that achieves sublinear convergence.
The proposed method is applied to analyze the National Health and Nutrition Examination Survey data.
- Score: 6.278498348219109
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern surveys with large sample sizes and growing mixed-type questionnaires
require robust and scalable analysis methods. In this work, we consider
recovering a mixed dataframe matrix, obtained by complex survey sampling, with
entries following different canonical exponential distributions and subject to
heterogeneous missingness. To tackle this challenging task, we propose a
two-stage procedure: in the first stage, we model the entry-wise missing
mechanism by logistic regression, and in the second stage, we complete the
target parameter matrix by maximizing a weighted log-likelihood with a low-rank
constraint. We propose a fast and scalable estimation algorithm that achieves
sublinear convergence, and the upper bound for the estimation error of the
proposed method is rigorously derived. Experimental results support our
theoretical claims, and the proposed estimator shows its merits compared to
other existing methods. The proposed method is applied to analyze the National
Health and Nutrition Examination Survey data.
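The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the hard rank projection by truncated SVD, the step sizes, the equal design weights `d`, the synthetic data, and the helper names `fit_logistic`, `project_rank`, and `complete_mixed` are all assumptions made for illustration. Stage 1 fits a logistic regression for the response indicator of each column; stage 2 runs projected gradient ascent on a log-likelihood weighted by design weight over estimated response probability, with Gaussian and Bernoulli columns.

```python
import numpy as np

def fit_logistic(X, r, steps=500, lr=0.5):
    """Stage 1 (sketch): logistic regression of response indicator r on X."""
    Xb = np.column_stack([np.ones(len(X)), X])      # add an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(steps):                          # plain gradient ascent
        prob = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta += lr * Xb.T @ (r - prob) / len(r)
    return 1.0 / (1.0 + np.exp(-Xb @ beta))         # fitted response probabilities

def project_rank(M, rank):
    """Best rank-`rank` approximation via truncated SVD (assumed constraint)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def complete_mixed(Y, R, W, is_binary, rank=2, steps=300, lr=1.0):
    """Stage 2 (sketch): projected gradient ascent on a weighted log-likelihood.

    Y: data matrix, R: 1 where observed, W: entry weights scaled to at most 1,
    is_binary: boolean flag per column (Bernoulli if True, Gaussian otherwise).
    """
    Theta = np.zeros(Y.shape)
    Y_obs = np.where(R == 1, Y, 0.0)                # neutralise missing entries
    for _ in range(steps):
        # Mean function: identity for Gaussian columns, sigmoid for Bernoulli.
        mean = np.where(is_binary, 1.0 / (1.0 + np.exp(-Theta)), Theta)
        grad = R * W * (Y_obs - mean)               # weighted score, observed cells only
        Theta = project_rank(Theta + lr * grad, rank)
    return Theta

# Synthetic example (all values hypothetical).
rng = np.random.default_rng(0)
n, p, rank = 200, 6, 2
x = rng.normal(size=n)                              # fully observed covariate
Theta_true = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, p))
is_binary = np.array([False] * 3 + [True] * 3)
Y = np.where(is_binary,
             rng.random((n, p)) < 1.0 / (1.0 + np.exp(-Theta_true)),
             Theta_true + 0.1 * rng.normal(size=(n, p))).astype(float)

# Heterogeneous missingness: each column has its own logistic response model.
intercepts = np.linspace(0.5, 1.5, p)
true_prob = 1.0 / (1.0 + np.exp(-(intercepts[None, :] + 0.5 * x[:, None])))
R = (rng.random((n, p)) < true_prob).astype(float)

pi_hat = np.column_stack([fit_logistic(x[:, None], R[:, j]) for j in range(p)])
d = np.ones(n)                                      # placeholder survey design weights
W = d[:, None] / pi_hat
W /= W.max()                                        # keeps a unit step size stable

Theta_hat = complete_mixed(Y, R, W, is_binary, rank=rank)
err = np.abs(Theta_hat - Theta_true)[:, ~is_binary].mean()
print("mean abs error on Gaussian columns:", err)
```

Rescaling the weights so that the largest equals one changes only the scale of the objective, not its maximizer, and lets a unit step size be used for both column types.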
Related papers
- Total Uncertainty Quantification in Inverse PDE Solutions Obtained with Reduced-Order Deep Learning Surrogate Models [50.90868087591973]
We propose an approximate Bayesian method for quantifying the total uncertainty in inverse PDE solutions obtained with machine learning surrogate models.
We test the proposed framework by comparing it with the iterative ensemble smoother and deep ensembling methods for a non-linear diffusion equation.
arXiv Detail & Related papers (2024-08-20T19:06:02Z)
- Estimation of multiple mean vectors in high dimension [4.2466572124753]
We endeavour to estimate numerous multi-dimensional means of various probability distributions on a common space based on independent samples.
Our approach involves forming estimators through convex combinations of empirical means derived from these samples.
arXiv Detail & Related papers (2024-03-22T08:42:41Z)
- Doubly Robust Inference in Causal Latent Factor Models [12.116813197164047]
This article introduces a new estimator of average treatment effects under unobserved confounding in modern data-rich environments featuring large numbers of units and outcomes.
We derive finite-sample and asymptotic guarantees, and show that the error of the new estimator converges to a mean-zero Gaussian distribution at a parametric rate.
arXiv Detail & Related papers (2024-02-18T17:13:46Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- A Generalized Latent Factor Model Approach to Mixed-data Matrix Completion with Entrywise Consistency [3.299672391663527]
Matrix completion is a class of machine learning methods that concerns the prediction of missing entries in a partially observed matrix.
We formulate it as a low-rank matrix estimation problem under a general family of non-linear factor models.
We propose entrywise consistent estimators for estimating the low-rank matrix.
arXiv Detail & Related papers (2022-11-17T00:24:47Z)
- Vector-Valued Least-Squares Regression under Output Regularity Assumptions [73.99064151691597]
We propose and analyse a reduced-rank method for solving least-squares regression problems with infinite dimensional output.
We derive learning bounds for our method, and study under which setting statistical performance is improved in comparison to full-rank method.
arXiv Detail & Related papers (2022-11-16T15:07:00Z)
- RIGID: Robust Linear Regression with Missing Data [7.638042073679073]
We present a robust framework to perform linear regression with missing entries in the features.
We show that the proposed formulation, which naturally takes into account the dependency between different variables, reduces to a convex program.
In addition to a detailed analysis, we also analyze the behavior of the proposed framework, and present technical discussions.
arXiv Detail & Related papers (2022-05-26T21:10:17Z)
- Learning Mixtures of Low-Rank Models [89.39877968115833]
We study the problem of learning mixtures of low-rank models.
We develop an algorithm that is guaranteed to recover the unknown matrices with near-optimal sample and computational complexities.
In addition, the proposed algorithm is provably stable against random noise.
arXiv Detail & Related papers (2020-09-23T17:53:48Z)
- Robust Matrix Completion with Mixed Data Types [0.0]
We consider the problem of recovering a structured low rank matrix with partially observed entries with mixed data types.
Most approaches assume that there is only one underlying distribution and the low rank constraint is regularized by the matrix Schatten Norm.
We propose a computationally feasible statistical approach with strong recovery guarantees along with an algorithmic framework suited for parallelization to recover a low rank matrix with partially observed entries for mixed data types in one step.
arXiv Detail & Related papers (2020-05-25T21:35:10Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.