Handling Out-of-Distribution Data: A Survey
- URL: http://arxiv.org/abs/2507.21160v1
- Date: Fri, 25 Jul 2025 04:29:49 GMT
- Title: Handling Out-of-Distribution Data: A Survey
- Authors: Lakpa Tamang, Mohamed Reda Bouadjenek, Richard Dazeley, Sunil Aryal,
- Abstract summary: This paper outlines different mechanisms for handling two main types of distribution shifts.<n>We provide a retrospective synopsis of the literature in the distribution shift, focusing on OOD data that had been overlooked in the existing surveys.
- Score: 2.373572816573706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of Machine Learning (ML) and data-driven applications, one of the significant challenge is the change in data distribution between the training and deployment stages, commonly known as distribution shift. This paper outlines different mechanisms for handling two main types of distribution shifts: (i) Covariate shift: where the value of features or covariates change between train and test data, and (ii) Concept/Semantic-shift: where model experiences shift in the concept learned during training due to emergence of novel classes in the test phase. We sum up our contributions in three folds. First, we formalize distribution shifts, recite on how the conventional method fails to handle them adequately and urge for a model that can simultaneously perform better in all types of distribution shifts. Second, we discuss why handling distribution shifts is important and provide an extensive review of the methods and techniques that have been developed to detect, measure, and mitigate the effects of these shifts. Third, we discuss the current state of distribution shift handling mechanisms and propose future research directions in this area. Overall, we provide a retrospective synopsis of the literature in the distribution shift, focusing on OOD data that had been overlooked in the existing surveys.
Related papers
- Out-of-Distribution Detection on Graphs: A Survey [58.47395497985277]
Graph out-of-distribution (GOOD) detection focuses on identifying graph data that deviates from the distribution seen during training.<n>We categorize existing methods into four types: enhancement-based, reconstruction-based, information propagation-based, and classification-based approaches.<n>We discuss practical applications and theoretical foundations, highlighting the unique challenges posed by graph data.
arXiv Detail & Related papers (2025-02-12T04:07:12Z) - Transfer Learning for High-dimensional Quantile Regression with Distribution Shift [0.28927500190704564]
This paper focuses on the high-dimensional quantile regression with knowledge transfer under three types of distribution shift.<n>We propose a novel transferable set and a new transfer framework to address the above three discrepancies.<n>Non-asymptotic estimation error bounds and source detection consistency are established to validate the availability and superiority of our method.
arXiv Detail & Related papers (2024-11-29T18:49:55Z) - A Survey of Deep Graph Learning under Distribution Shifts: from Graph Out-of-Distribution Generalization to Adaptation [59.14165404728197]
We provide an up-to-date and forward-looking review of deep graph learning under distribution shifts.<n>Specifically, we cover three primary scenarios: graph OOD generalization, training-time graph OOD adaptation, and test-time graph OOD adaptation.<n>To provide a better understanding of the literature, we introduce a systematic taxonomy that classifies existing methods into model-centric and data-centric approaches.
arXiv Detail & Related papers (2024-10-25T02:39:56Z) - Proxy Methods for Domain Adaptation [78.03254010884783]
proxy variables allow for adaptation to distribution shift without explicitly recovering or modeling latent variables.
We develop a two-stage kernel estimation approach to adapt to complex distribution shifts in both settings.
arXiv Detail & Related papers (2024-03-12T09:32:41Z) - Supervised Algorithmic Fairness in Distribution Shifts: A Survey [17.826312801085052]
In real-world applications, machine learning models are often trained on a specific dataset but deployed in environments where the data distribution may shift.
This shift can lead to unfair predictions, disproportionately affecting certain groups characterized by sensitive attributes, such as race and gender.
arXiv Detail & Related papers (2024-02-02T11:26:18Z) - ShiftKD: Benchmarking Knowledge Distillation under Distribution Shift [7.256448072529497]
Knowledge Distillation (KD) transfers knowledge from large models to small models and has recently achieved remarkable success.<n>However, the reliability of existing KD methods in real-world applications, especially under distribution shift, remains underexplored.<n>We propose a unified and systematic framework textscShiftKD to benchmark KD against two general distributional shifts.
arXiv Detail & Related papers (2023-12-25T10:43:31Z) - Class Distribution Shifts in Zero-Shot Learning: Learning Robust Representations [3.8980564330208662]
We propose and analyze a model that assumes that the attribute responsible for the shift is unknown in advance.<n>We show that our algorithm improves generalization to diverse class distributions in both simulations and experiments on real-world datasets.
arXiv Detail & Related papers (2023-11-30T14:14:31Z) - Distribution Shift Inversion for Out-of-Distribution Prediction [57.22301285120695]
We propose a portable Distribution Shift Inversion algorithm for Out-of-Distribution (OoD) prediction.
We show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
arXiv Detail & Related papers (2023-06-14T08:00:49Z) - An Empirical Study on Distribution Shift Robustness From the Perspective
of Pre-Training and Data Augmentation [91.62129090006745]
This paper studies the distribution shift problem from the perspective of pre-training and data augmentation.
We provide the first comprehensive empirical study focusing on pre-training and data augmentation.
arXiv Detail & Related papers (2022-05-25T13:04:53Z) - Robust Generalization despite Distribution Shift via Minimum
Discriminating Information [46.164498176119665]
We introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution.
We employ the principle of minimum discriminating information to embed the available prior knowledge.
We obtain explicit generalization bounds with respect to the unknown shifted distribution.
arXiv Detail & Related papers (2021-06-08T15:25:35Z) - WILDS: A Benchmark of in-the-Wild Distribution Shifts [157.53410583509924]
Distribution shifts can substantially degrade the accuracy of machine learning systems deployed in the wild.
We present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts.
We show that standard training results in substantially lower out-of-distribution than in-distribution performance.
arXiv Detail & Related papers (2020-12-14T11:14:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.