Related papers: Minimax Optimal Estimation of Stability Under Distribution Shift

URL: http://arxiv.org/abs/2212.06338v2
Date: Tue, 25 Jun 2024 02:21:54 GMT
Title: Minimax Optimal Estimation of Stability Under Distribution Shift
Authors: Hongseok Namkoong, Yuanzhe Ma, Peter W. Glynn
Abstract summary: We analyze the stability of a system under distribution shift. The stability measure is defined in terms of a more intuitive quantity: the level of acceptable performance degradation. Our characterization of the minimax convergence rate shows that evaluating stability against large performance degradation incurs a statistical cost.
Score: 8.893526921869137
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The performance of decision policies and prediction models often deteriorates when applied to environments different from the ones seen during training. To ensure reliable operation, we analyze the stability of a system under distribution shift, which is defined as the smallest change in the underlying environment that causes the system's performance to deteriorate beyond a permissible threshold. In contrast to standard tail risk measures and distributionally robust losses that require the specification of a plausible magnitude of distribution shift, the stability measure is defined in terms of a more intuitive quantity: the level of acceptable performance degradation. We develop a minimax optimal estimator of stability and analyze its convergence rate, which exhibits a fundamental phase shift behavior. Our characterization of the minimax convergence rate shows that evaluating stability against large performance degradation incurs a statistical cost. Empirically, we demonstrate the practical utility of our stability framework by using it to compare system designs on problems where robustness to distribution shift is critical.
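The stability definition in the abstract (the smallest change in the environment that pushes performance past a permissible threshold) can be illustrated with a small numerical sketch. The abstract does not say how the size of a shift is measured, so this sketch assumes it is the KL divergence between the shifted and original distributions. Under that assumption, the smallest shift that raises the expected loss to a threshold y has the standard convex dual form sup over lam >= 0 of (lam * y - log E[exp(lam * R)]). The code below is a naive plug-in on an empirical sample, not the minimax optimal estimator developed in the paper, and the names `plugin_stability`, `tilted_mean`, and `log_mgf` are hypothetical.

```python
import math


def tilted_mean(losses, lam):
    """Mean loss under the empirical distribution reweighted by exp(lam * loss)."""
    top = max(losses)
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = [math.exp(lam * (r - top)) for r in losses]
    return sum(w * r for w, r in zip(weights, losses)) / sum(weights)


def log_mgf(losses, lam):
    """Log of the empirical moment generating function, log mean(exp(lam * loss))."""
    top = max(losses)
    shifted = sum(math.exp(lam * (r - top)) for r in losses) / len(losses)
    return lam * top + math.log(shifted)


def plugin_stability(losses, threshold, iters=100):
    """Plug-in estimate of the smallest KL shift (an assumed divergence choice)
    that raises the expected loss to at least `threshold`."""
    n = len(losses)
    top = max(losses)
    if threshold <= sum(losses) / n:
        return 0.0  # the unshifted sample already reaches the threshold
    if threshold > top:
        return math.inf  # no reweighting of the sample can reach the threshold
    if threshold == top:
        # All mass must move onto the worst observed losses.
        return -math.log(sum(1 for r in losses if r == top) / n)
    # The dual objective is concave in lam, with derivative
    # threshold - tilted_mean(lam), so bisect for the root of the derivative.
    lo, hi = 0.0, 1.0
    while tilted_mean(losses, hi) < threshold:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mean(losses, mid) < threshold:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam * threshold - log_mgf(losses, lam)
```

For two equally likely losses 0 and 1 and a threshold of 0.75, the result agrees with the KL divergence between Bernoulli(0.75) and Bernoulli(0.5), roughly 0.1308. A larger value means a bigger shift is needed before performance degrades to the threshold, so the system is more stable in this sense.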

Related papers

Aurora: Are Android Malware Classifiers Reliable and Stable under Distribution Shift? [51.12297424766236]
AURORA is a framework to evaluate malware classifiers based on their confidence quality and operational resilience. AURORA is complemented by a set of metrics designed to go beyond point-in-time performance. The fragility in SOTA frameworks across datasets of varying drift suggests the need for a return to the whiteboard.
arXiv Detail & Related papers (2025-05-28T20:22:43Z)
Incremental Uncertainty-aware Performance Monitoring with Active Labeling Intervention [64.12447263206381]
We propose Incremental Uncertainty-aware Performance Monitoring (IUPM), a label-free method that estimates performance changes by modeling gradual shifts using optimal transport. IUPM quantifies the uncertainty in the performance prediction and introduces an active labeling procedure to restore a reliable estimate under a limited labeling budget. Our experiments show that IUPM outperforms existing performance estimation baselines in various gradual shift scenarios.
arXiv Detail & Related papers (2025-05-11T15:35:55Z)
Probabilistic Stability Guarantees for Feature Attributions [20.58023369482214]
We propose a model-agnostic, sample-efficient stability certification algorithm (SCA) that yields non-trivial and interpretable guarantees for attribution methods. We show that mild smoothing achieves a more favorable trade-off between accuracy and stability, avoiding the aggressive compromises made in prior certification methods.
arXiv Detail & Related papers (2025-04-18T16:39:08Z)
On the Selection Stability of Stability Selection and Its Applications [2.263635133348731]
This paper seeks to broaden the use of an established stability estimator to evaluate the overall stability of the stability selection framework. We suggest that the stability estimator offers two advantages: it can serve as a reference to reflect the robustness of the outcomes obtained and help identify an optimal regularization value to improve stability.
arXiv Detail & Related papers (2024-11-14T00:02:54Z)
Stability Evaluation via Distributional Perturbation Analysis [28.379994938809133]
We propose a stability evaluation criterion based on distributional perturbations. Our stability evaluation criterion can address both data corruptions and sub-population shifts. Empirically, we validate the practical utility of our stability evaluation criterion across a host of real-world applications.
arXiv Detail & Related papers (2024-05-06T06:47:14Z)
Stable Update of Regression Trees [0.0]
We focus on the stability of an inherently explainable machine learning method, namely regression trees. We propose a regularization method, where data points are weighted based on the uncertainty in the initial model. Results show that the proposed update method improves stability while achieving similar or better predictive performance.
arXiv Detail & Related papers (2024-02-21T09:41:56Z)
Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees [57.67528738886731]
We study the numerical stability of scalable sparse approximations based on inducing points. For low-dimensional tasks such as geospatial modeling, we propose an automated method for computing inducing points satisfying these conditions.
arXiv Detail & Related papers (2022-10-14T15:20:17Z)
Continual evaluation for lifelong learning: Identifying the stability gap [35.99653845083381]
We show that a set of common state-of-the-art methods still suffers from substantial forgetting upon starting to learn new tasks. We refer to this intriguing but potentially problematic phenomenon as the stability gap. We establish a framework for continual evaluation that uses per-iteration evaluation and we define a new set of metrics to quantify worst-case performance.
arXiv Detail & Related papers (2022-05-26T15:56:08Z)
Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature. We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance. By definition, SCORE facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
Probabilistic robust linear quadratic regulators with Gaussian processes [73.0364959221845]
Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design. We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin.
arXiv Detail & Related papers (2021-05-17T08:36:18Z)
Versatile and Robust Transient Stability Assessment via Instance Transfer Learning [6.760999627905228]
This paper introduces a new data collection method in a data-driven algorithm incorporating the knowledge of power system dynamics. We introduce a new concept called Fault-Affected Area, which provides crucial information regarding the unstable region of operation. The test results on the IEEE 39-bus system verify that this model can accurately predict the stability of previously unseen operational scenarios.
arXiv Detail & Related papers (2021-02-20T09:10:29Z)
Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
The empowerment principle enables unsupervised stabilization of dynamical systems at upright positions. We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel. We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
arXiv Detail & Related papers (2020-07-14T21:10:16Z)
Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates. This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting. To the best of our knowledge, this gives the first-ever-known stability and generalization bounds for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)
GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation can still be achieved in important applications. Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions. The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.