The Lifecycle of a Statistical Model: Model Failure Detection,
Identification, and Refitting
- URL: http://arxiv.org/abs/2202.04166v1
- Date: Tue, 8 Feb 2022 22:02:31 GMT
- Authors: Alnur Ali, Maxime Cauchois, John C. Duchi
- Abstract summary: We develop tools and theory for detecting and identifying regions of the covariate space (subpopulations) where model performance has begun to degrade.
We present empirical results with three real-world data sets.
We complement these empirical results with theory proving that our methodology is minimax optimal for recovering anomalous subpopulations.
- Score: 26.351782287953267
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The statistical machine learning community has demonstrated considerable
resourcefulness over the years in developing highly expressive tools for
estimation, prediction, and inference. The bedrock assumptions underlying these
developments are that the data comes from a fixed population and displays
little heterogeneity. But reality is significantly more complex: statistical
models now routinely fail when released into real-world systems and scientific
applications, where such assumptions rarely hold. Consequently, we pursue a
different path in this paper vis-a-vis the well-worn trail of developing new
methodology for estimation and prediction. In this paper, we develop tools and
theory for detecting and identifying regions of the covariate space
(subpopulations) where model performance has begun to degrade, and study
intervening to fix these failures through refitting. We present empirical
results with three real-world data sets -- including a time series involving
forecasting the incidence of COVID-19 -- showing that our methodology generates
interpretable results, is useful for tracking model performance, and can boost
model performance through refitting. We complement these empirical results with
theory proving that our methodology is minimax optimal for recovering anomalous
subpopulations as well as refitting to improve accuracy in a structured normal
means setting.
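The abstract does not spell out the detection procedure, so the following is only a minimal sketch of the general idea of locating a covariate region where loss has risen, not the authors' method. It assumes per-sample losses are available for a reference period and a recent period, bins a single covariate by pooled quantiles, and flags bins whose mean loss increased according to a Welch-style z-statistic. The function name `flag_degraded_bins`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def flag_degraded_bins(x_ref, loss_ref, x_new, loss_new, n_bins=4, z_threshold=4.0):
    """Return (low, high, z) for covariate bins whose mean loss rose markedly.

    Illustrative only: a one-dimensional scan over quantile bins, not the
    procedure from the paper.
    """
    # Bin edges come from pooled quantiles so both periods share the same bins.
    pooled = np.concatenate([x_ref, x_new])
    edges = np.quantile(pooled, np.linspace(0.0, 1.0, n_bins + 1))

    # Digitizing against the interior edges yields bin indices 0..n_bins-1.
    ref_bin = np.digitize(x_ref, edges[1:-1])
    new_bin = np.digitize(x_new, edges[1:-1])

    flagged = []
    for b in range(n_bins):
        old = loss_ref[ref_bin == b]
        new = loss_new[new_bin == b]
        if len(old) < 2 or len(new) < 2:
            continue  # too few samples to estimate a variance
        # Welch-style standard error of the difference in mean loss.
        se = np.sqrt(old.var(ddof=1) / len(old) + new.var(ddof=1) / len(new))
        if se == 0.0:
            continue
        z = (new.mean() - old.mean()) / se
        if z > z_threshold:
            flagged.append((float(edges[b]), float(edges[b + 1]), float(z)))
    return flagged


# Synthetic example (assumed data): loss degrades only where x > 0.75.
rng = np.random.default_rng(0)
x_ref = rng.uniform(0.0, 1.0, size=2000)
x_new = rng.uniform(0.0, 1.0, size=2000)
loss_ref = rng.exponential(1.0, size=2000)
loss_new = rng.exponential(1.0, size=2000) + 2.0 * (x_new > 0.75)

flagged = flag_degraded_bins(x_ref, loss_ref, x_new, loss_new)
print(flagged)
```

A scan over fixed bins of one covariate is the simplest possible choice here; searching over richer families of regions, and correcting for the multiple comparisons that such a search implies, would need more care than this sketch takes.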
Related papers
- On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations.
We propose an autoregressive sampling approach that significantly improves performance in forecasting.
We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z)
- Zero-Shot Uncertainty Quantification using Diffusion Probabilistic Models [7.136205674624813]
We conduct a study to evaluate the effectiveness of ensemble methods on solving different regression problems using diffusion models.
We demonstrate that ensemble methods consistently improve model prediction accuracy across various regression tasks.
Our study provides a comprehensive view of the utility of diffusion ensembles, serving as a useful reference for practitioners employing diffusion models in regression problem-solving.
arXiv Detail & Related papers (2024-08-08T18:34:52Z)
- Quantifying Distribution Shifts and Uncertainties for Enhanced Model Robustness in Machine Learning Applications [0.0]
This study explores model adaptation and generalization by utilizing synthetic data.
We employ quantitative measures such as Kullback-Leibler divergence, Jensen-Shannon distance, and Mahalanobis distance to assess data similarity.
Our findings suggest that utilizing statistical measures, such as the Mahalanobis distance, to determine whether model predictions fall within the low-error "interpolation regime" or the high-error "extrapolation regime" provides a complementary method for assessing distribution shift and model uncertainty.
arXiv Detail & Related papers (2024-05-03T10:05:31Z)
- Towards Learning Stochastic Population Models by Gradient Descent [0.0]
We show that simultaneous estimation of parameters and structure poses major challenges for optimization procedures.
We demonstrate accurate estimation of models but find that enforcing the inference of parsimonious, interpretable models drastically increases the difficulty.
arXiv Detail & Related papers (2024-04-10T14:38:58Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Causality and Generalizability: Identifiability and Learning Methods [0.0]
This thesis contributes to the research areas concerning the estimation of causal effects, causal structure learning, and distributionally robust prediction methods.
We present novel and consistent linear and non-linear causal effects estimators in instrumental variable settings that employ data-dependent mean squared prediction error regularization.
We propose a general framework for distributional robustness with respect to intervention-induced distributions.
arXiv Detail & Related papers (2021-10-04T13:12:11Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and gradients thereof, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
- A comprehensive study on the prediction reliability of graph neural networks for virtual screening [0.0]
We investigate the effects of model architectures, regularization methods, and loss functions on the prediction performance and reliability of classification results.
Our results highlight that the correct choice of regularization and inference methods is important for achieving a high success rate.
arXiv Detail & Related papers (2020-03-17T10:13:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.