Correcting for Selection Bias and Missing Response in Regression using Privileged Information
- URL: http://arxiv.org/abs/2303.16800v2
- Date: Mon, 12 Jun 2023 08:34:05 GMT
- Title: Correcting for Selection Bias and Missing Response in Regression using Privileged Information
- Authors: Philip Boeken, Noud de Kroon, Mathijs de Jong, Joris M. Mooij, Onno Zoeter
- Abstract summary: We propose a novel imputation-based regression method, named repeated regression, that is suitable for the Privilegedly Missing at Random (PMAR) setting.
We empirically assess the performance of the proposed methods with extensive simulated experiments and on a synthetically augmented real-world dataset.
- Score: 1.5049442691806052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When estimating a regression model, we might have data where some labels are
missing, or our data might be biased by a selection mechanism. When the
response or selection mechanism is ignorable (i.e., independent of the response
variable given the features) one can use off-the-shelf regression methods; in
the nonignorable case one typically has to adjust for bias. We observe that
privileged information (i.e., information that is only available during
training) might render a nonignorable selection mechanism ignorable, and we
refer to this scenario as Privilegedly Missing at Random (PMAR). We propose a
novel imputation-based regression method, named repeated regression, that is
suitable for PMAR. We also consider an importance weighted regression method,
and a doubly robust combination of the two. The proposed methods are easy to
implement with most popular out-of-the-box regression algorithms. We
empirically assess the performance of the proposed methods with extensive
simulated experiments and on a synthetically augmented real-world dataset. We
conclude that repeated regression can appropriately correct for bias, and can
have a considerable advantage over weighted regression, especially when
extrapolating to regions of the feature space where the response is never observed.
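The three estimators above are straightforward to assemble from off-the-shelf components. Below is a minimal sketch, assuming always-observed features X, privileged features Z (available only during training), a 0/1 selection indicator s, and a response y observed where s == 1; gradient boosting for the outcome models and logistic regression for the selection propensity are illustrative choices, not details taken from the paper.

```python
# Minimal sketch of repeated regression, inverse-propensity weighting, and a
# doubly robust combination for the PMAR setting, using scikit-learn.
# X: always-observed features, Z: privileged features (training time only),
# s: 0/1 selection indicator, y: response (entries with s == 0 are never read).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression


def repeated_regression(X, Z, y, s):
    """Stage 1: impute y from (X, Z) on selected rows; stage 2: regress on X."""
    XZ = np.hstack([X, Z])
    stage1 = GradientBoostingRegressor().fit(XZ[s == 1], y[s == 1])
    y_imputed = stage1.predict(XZ)  # under PMAR, E[Y | X, Z] is identified
    return GradientBoostingRegressor().fit(X, y_imputed)  # marginalise Z out


def weighted_regression(X, Z, y, s):
    """Inverse-propensity-weighted regression of y on X over selected rows."""
    XZ = np.hstack([X, Z])
    pi = LogisticRegression().fit(XZ, s).predict_proba(XZ[s == 1])[:, 1]
    weights = 1.0 / pi.clip(1e-3)  # clip to keep weights bounded
    return GradientBoostingRegressor().fit(X[s == 1], y[s == 1],
                                           sample_weight=weights)


def doubly_robust_regression(X, Z, y, s):
    """Regress a doubly robust pseudo-outcome on X."""
    XZ = np.hstack([X, Z])
    m_hat = GradientBoostingRegressor().fit(XZ[s == 1], y[s == 1]).predict(XZ)
    pi = LogisticRegression().fit(XZ, s).predict_proba(XZ)[:, 1].clip(1e-3)
    pseudo = m_hat.copy()  # imputation everywhere ...
    pseudo[s == 1] += (y[s == 1] - m_hat[s == 1]) / pi[s == 1]  # ... plus IPW correction
    return GradientBoostingRegressor().fit(X, pseudo)
```

The doubly robust pseudo-outcome equals the imputation wherever the response is missing and adds an inverse-propensity correction on the selected rows, so the final regression remains consistent if either the outcome model or the propensity model is well specified.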
Related papers
- Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization [60.176008034221404]
Direct Preference Optimization (DPO) and its variants are increasingly used for aligning language models with human preferences.
Prior work has observed that the likelihood of preferred responses often decreases during training.
We demonstrate that likelihood displacement can be catastrophic, shifting probability mass from preferred responses to responses with an opposite meaning.
arXiv Detail & Related papers (2024-10-11T14:22:44Z)
- Generalized Regression with Conditional GANs [2.4171019220503402]
We propose to learn a prediction function whose outputs, when paired with the corresponding inputs, are indistinguishable from feature-label pairs in the training dataset.
We show that this approach to regression makes fewer assumptions about the distribution of the data being fitted and therefore has better representation capabilities.
arXiv Detail & Related papers (2024-04-21T01:27:47Z)
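As a concrete reading of the conditional-GAN entry above, the sketch below trains a generator g(x, eps) so that pairs (x, g(x, eps)) are indistinguishable from real pairs (x, y) to a discriminator. Network sizes, the noise dimension, and the vanilla BCE objective are assumptions for illustration, not the paper's actual setup.

```python
# Illustrative conditional-GAN regression: the generator produces labels, the
# discriminator tells real feature-label pairs from generated ones.
import torch
import torch.nn as nn

X_DIM, NOISE_DIM = 4, 8
gen = nn.Sequential(nn.Linear(X_DIM + NOISE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
disc = nn.Sequential(nn.Linear(X_DIM + 1, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()


def train_step(x, y):  # x: (batch, X_DIM), y: (batch, 1)
    eps = torch.randn(x.size(0), NOISE_DIM)
    y_fake = gen(torch.cat([x, eps], dim=1))

    # Discriminator: separate real (x, y) pairs from generated (x, y_fake).
    d_real = disc(torch.cat([x, y], dim=1))
    d_fake = disc(torch.cat([x, y_fake.detach()], dim=1))
    loss_d = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator: make (x, y_fake) pairs look real to the discriminator.
    loss_g = bce(disc(torch.cat([x, y_fake], dim=1)), torch.ones_like(d_fake))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```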
- Model Agnostic Explainable Selective Regression via Uncertainty Estimation [15.331332191290727]
This paper presents a novel approach to selective regression that utilizes model-agnostic non-parametric uncertainty estimation.
Our proposed framework showcases superior performance compared to state-of-the-art selective regressors.
We implement our selective regression method in the open-source Python package doubt and release the code used to reproduce our experiments.
arXiv Detail & Related papers (2023-11-15T17:40:48Z)
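The selective-regression recipe above lends itself to a compact, model-agnostic implementation: estimate predictive uncertainty non-parametrically (here with a bootstrap ensemble) and abstain on the least certain inputs. This generic sketch does not reproduce the actual API of the `doubt` package; the class name and the coverage-based threshold are assumptions.

```python
# Model-agnostic selective regression: a bootstrap ensemble supplies
# non-parametric uncertainty estimates; the model abstains where they are high.
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor


class SelectiveRegressor:
    def __init__(self, base=None, n_boot=20, coverage=0.9):
        self.base = RandomForestRegressor(n_estimators=50) if base is None else base
        self.n_boot = n_boot
        self.coverage = coverage  # fraction of inputs we commit to predicting

    def fit(self, X, y):
        rng = np.random.default_rng(0)
        self.models_ = []
        for _ in range(self.n_boot):
            idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample
            self.models_.append(clone(self.base).fit(X[idx], y[idx]))
        # Calibrate the abstention threshold on in-sample uncertainties.
        self.threshold_ = np.quantile(self._std(X), self.coverage)
        return self

    def _std(self, X):
        return np.stack([m.predict(X) for m in self.models_]).std(axis=0)

    def predict(self, X):
        preds = np.stack([m.predict(X) for m in self.models_])
        mean, std = preds.mean(axis=0), preds.std(axis=0)
        return np.where(std <= self.threshold_, mean, np.nan)  # NaN = abstain
```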
- Engression: Extrapolation through the Lens of Distributional Regression [2.519266955671697]
We propose a neural network-based distributional regression methodology called 'engression'.
An engression model is generative in the sense that we can sample from the fitted conditional distribution and is also suitable for high-dimensional outcomes.
We show that engression can successfully perform extrapolation under some assumptions such as monotonicity, whereas traditional regression approaches such as least-squares or quantile regression fall short under the same assumptions.
arXiv Detail & Related papers (2023-07-03T08:19:00Z)
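A minimal reading of the engression entry: fit a generative network y ≈ g(x, eps) by minimising a two-sample estimate of the energy score, after which repeatedly sampling g(x, eps) draws from the fitted conditional distribution. The network and loss below are a sketch of the general engression recipe, not the authors' reference implementation.

```python
# Engression-style distributional regression via an energy-score objective.
import torch
import torch.nn as nn

X_DIM, NOISE_DIM = 1, 4
g = nn.Sequential(nn.Linear(X_DIM + NOISE_DIM, 100), nn.ReLU(), nn.Linear(100, 1))
opt = torch.optim.Adam(g.parameters(), lr=1e-3)


def energy_loss(x, y):  # x: (batch, X_DIM), y: (batch, 1)
    # Two independent noise draws per data point.
    s1 = g(torch.cat([x, torch.randn(x.size(0), NOISE_DIM)], dim=1))
    s2 = g(torch.cat([x, torch.randn(x.size(0), NOISE_DIM)], dim=1))
    fit = 0.5 * ((y - s1).abs() + (y - s2).abs())  # match the observations
    spread = (s1 - s2).abs()                       # reward predictive diversity
    return (fit - 0.5 * spread).mean()


def train(x, y, steps=1000):
    for _ in range(steps):
        opt.zero_grad()
        loss = energy_loss(x, y)
        loss.backward()
        opt.step()
```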
- Deep Regression Unlearning [6.884272840652062]
We introduce deep regression unlearning methods that generalize well and are robust to privacy attacks.
We conduct regression unlearning experiments for computer vision, natural language processing and forecasting applications.
arXiv Detail & Related papers (2022-10-15T05:00:20Z)
- High-dimensional regression with potential prior information on variable importance [0.0]
We propose a simple scheme involving fitting a sequence of models indicated by the ordering.
We show that the computational cost for fitting all models when ridge regression is used is no more than for a single fit of ridge regression.
We describe a strategy for Lasso regression that makes use of previous fits to greatly speed up fitting the entire sequence of models.
arXiv Detail & Related papers (2021-09-23T10:34:37Z)
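In the spirit of the scheme above, a naive version simply fits one ridge model per prefix of the prior ordering and keeps the best by validation error. The paper's computational shortcut (obtaining the whole ridge sequence at the cost of a single fit) is not reproduced here; this hypothetical sketch refits each model.

```python
# Fit a sequence of ridge models indicated by a prior ordering of variables,
# selecting the prefix length by validation error.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split


def fit_along_ordering(X, y, ordering, alpha=1.0):
    """ordering: variable indices, most important first (the prior information)."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
    best_err, best_k, best_model = np.inf, None, None
    for k in range(1, len(ordering) + 1):
        cols = list(ordering[:k])  # first k variables by prior importance
        model = Ridge(alpha=alpha).fit(X_tr[:, cols], y_tr)
        err = np.mean((model.predict(X_val[:, cols]) - y_val) ** 2)
        if err < best_err:
            best_err, best_k, best_model = err, k, model
    return best_k, best_model  # chosen prefix length and its fitted model
```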
- Human Pose Regression with Residual Log-likelihood Estimation [48.30425850653223]
We propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution.
RLE learns the change of the distribution instead of the unreferenced underlying distribution to facilitate the training process.
Compared to the conventional regression paradigm, regression with RLE brings a 12.4 mAP improvement on MSCOCO without any test-time overhead.
arXiv Detail & Related papers (2021-07-23T15:06:31Z)
- Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in the NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
- Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression [91.3373131262391]
Uncertainty is the only certainty there is.
Traditionally, the direct regression formulation is considered and the uncertainty is modeled by modifying the output space to a certain family of probabilistic distributions.
How to model the uncertainty within the present-day technologies for regression remains an open issue.
arXiv Detail & Related papers (2021-03-25T06:56:09Z)
- A Hypergradient Approach to Robust Regression without Correspondence [85.49775273716503]
We consider a variant of the regression problem, where the correspondence between input and output data is not available.
Most existing methods are only applicable when the sample size is small.
We propose a new computational framework -- ROBOT -- for the shuffled regression problem.
arXiv Detail & Related papers (2020-11-30T21:47:38Z)
- Censored Quantile Regression Forest [81.9098291337097]
We develop a new estimating equation that adapts to censoring and leads to the quantile score whenever the data do not exhibit censoring.
The proposed procedure, named censored quantile regression forest, allows us to estimate quantiles of time-to-event data without any parametric modeling assumption.
arXiv Detail & Related papers (2020-01-08T23:20:23Z)