Related papers: RIGID: Robust Linear Regression with Missing Data

URL: http://arxiv.org/abs/2205.13635v1
Date: Thu, 26 May 2022 21:10:17 GMT
Title: RIGID: Robust Linear Regression with Missing Data
Authors: Alireza Aghasi, MohammadJavad Feizollahi, Saeed Ghadimi
Abstract summary: We present a robust framework to perform linear regression with missing entries in the features. We show that the proposed formulation, which naturally takes into account the dependency between different variables, reduces to a convex program. Beyond a detailed analysis of the solver, we also analyze the asymptotic behavior of the proposed framework and discuss how to estimate the required input parameters.
Score: 7.638042073679073
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a robust framework to perform linear regression with missing entries in the features. By considering an elliptical data distribution, and specifically a multivariate normal model, we are able to conditionally formulate a distribution for the missing entries and present a robust framework, which minimizes the worst-case error caused by the uncertainty about the missing data. We show that the proposed formulation, which naturally takes into account the dependency between different variables, ultimately reduces to a convex program, for which a customized and scalable solver can be delivered. In addition to a detailed analysis to deliver such a solver, we also asymptotically analyze the behavior of the proposed framework, and present technical discussions to estimate the required input parameters. We complement our analysis with experiments performed on synthetic, semi-synthetic, and real data, and show how the proposed formulation improves the prediction accuracy and robustness, and outperforms the competing techniques.
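
The conditioning step mentioned in the abstract can be illustrated with the standard formulas for a multivariate normal: given the observed entries, the missing entries are again normal, with a shifted mean and a reduced covariance. The sketch below shows only that textbook step, not the paper's robust formulation or its solver. The function name `conditional_gaussian`, the use of NaN to mark missing entries, and the assumption that the mean and covariance are already known are all illustrative choices, not taken from the paper.

```python
import numpy as np


def conditional_gaussian(x, mu, sigma):
    """Distribution of the missing entries of x given the observed ones.

    Assumes x ~ N(mu, sigma) with mu and sigma known (an assumption made
    for this sketch), and that missing entries of x are marked with NaN.
    Returns the conditional mean and covariance of the missing block.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    miss = np.isnan(x)
    obs = ~miss

    # Nothing observed: the conditional equals the marginal of the missing block.
    if not obs.any():
        return mu[miss], sigma[np.ix_(miss, miss)]

    s_oo = sigma[np.ix_(obs, obs)]    # covariance of observed entries
    s_mo = sigma[np.ix_(miss, obs)]   # cross-covariance missing/observed
    s_mm = sigma[np.ix_(miss, miss)]  # covariance of missing entries

    # gain = s_mo @ inv(s_oo), computed with a linear solve (s_oo is symmetric).
    gain = np.linalg.solve(s_oo, s_mo.T).T

    cond_mean = mu[miss] + gain @ (x[obs] - mu[obs])
    cond_cov = s_mm - gain @ s_mo.T
    return cond_mean, cond_cov


# Two correlated features; the second one is missing.
mean, cov = conditional_gaussian(
    np.array([1.0, np.nan]),
    np.zeros(2),
    np.array([[1.0, 0.5], [0.5, 2.0]]),
)
print(mean, cov)
```

The conditional covariance is what quantifies the remaining uncertainty about the missing entries; a robust method of the kind the abstract describes would guard against the worst case within that uncertainty, whereas plain imputation would use only the conditional mean.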

Related papers

Partial Transportability for Domain Generalization [56.37032680901525]
Building on the theory of partial identification and transportability, this paper introduces new results for bounding the value of a functional of the target distribution. Our contribution is to provide the first general estimation technique for transportability problems. We propose a gradient-based optimization scheme for making scalable inferences in practice.
arXiv Detail & Related papers (2025-03-30T22:06:37Z)
Probabilistic Iterative Hard Thresholding for Sparse Learning [2.5782973781085383]
We present an approach towards solving expectation objective optimization problems with cardinality constraints. We prove convergence of the underlying process, and demonstrate the performance on two Machine Learning problems.
arXiv Detail & Related papers (2024-09-02T18:14:45Z)
Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems. We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z)
An Inexact Halpern Iteration with Application to Distributionally Robust Optimization [9.529117276663431]
We investigate the inexact variants of the scheme in both deterministic and stochastic settings. We show that by choosing the inexactness appropriately, the inexact schemes admit an $O(k^{-1})$ convergence rate in terms of the (expected) residue norm.
arXiv Detail & Related papers (2024-02-08T20:12:47Z)
Robust Regression over Averaged Uncertainty [7.4489490661717355]
We show that this formulation recovers ridge regression exactly and establishes the missing link between robust optimization and the mean squared error approaches for existing regression problems. We provide exact, closed-form, in some cases, analytical solutions to the equivalent regularization strength under uncertainty sets induced by $\ell_p$ norm, Schatten $p$-norm, and general polytopes.
arXiv Detail & Related papers (2023-11-12T20:57:30Z)
Data-Driven Sample Average Approximation with Covariate Information [0.0]
We study optimization for data-driven decision-making when we have observations of the uncertain parameters within the optimization model together with concurrent observations of covariates. We investigate three data-driven frameworks that integrate a machine learning prediction model within a stochastic programming sample average approximation (SAA) for approximating the solution to this problem.
arXiv Detail & Related papers (2022-07-27T14:45:04Z)
Extension of Dynamic Mode Decomposition for dynamic systems with incomplete information based on t-model of optimal prediction [69.81996031777717]
The Dynamic Mode Decomposition has proved to be a very efficient technique to study dynamic data. The application of this approach becomes problematic if the available data is incomplete because some dimensions of smaller scale are either missing or unmeasured. We consider a first-order approximation of the Mori-Zwanzig decomposition, state the corresponding optimization problem and solve it with the gradient-based optimization method.
arXiv Detail & Related papers (2022-02-23T11:23:59Z)
Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples. We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
Robust Bayesian Inference for Discrete Outcomes with the Total Variation Distance [5.139874302398955]
Models of discrete-valued outcomes are easily misspecified if the data exhibit zero-inflation, overdispersion or contamination. Here, we introduce a robust discrepancy-based Bayesian approach using the Total Variation Distance (TVD). We empirically demonstrate that our approach is robust and significantly improves predictive performance on a range of simulated and real world data.
arXiv Detail & Related papers (2020-10-26T09:53:06Z)
Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets. Part of the challenge of learning robust models lies in the influence of unobserved confounders. We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets. We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator. We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.