Adversarial Learning for Feature Shift Detection and Correction
- URL: http://arxiv.org/abs/2312.04546v1
- Date: Thu, 7 Dec 2023 18:58:40 GMT
- Title: Adversarial Learning for Feature Shift Detection and Correction
- Authors: Miriam Barrabes, Daniel Mas Montserrat, Margarita Geleta, Xavier
  Giro-i-Nieto, Alexander G. Ioannidis
- Abstract summary: Feature shifts can occur in many datasets, including in multi-sensor data, where some sensors are malfunctioning, or in structured data, where faulty standardization and data processing pipelines can lead to erroneous features.
In this work, we explore using the principles of adversarial learning, where the information from several discriminators trained to distinguish between two distributions is used to both detect the corrupted features and fix them in order to remove the distribution shift between datasets.
- Score: 45.65548560695731
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data shift is a phenomenon present in many real-world applications, and while
there are multiple methods attempting to detect shifts, the task of localizing
and correcting the features originating such shifts has not been studied in
depth. Feature shifts can occur in many datasets, including in multi-sensor
data, where some sensors are malfunctioning, or in tabular and structured data,
including biomedical, financial, and survey data, where faulty standardization
and data processing pipelines can lead to erroneous features. In this work, we
explore using the principles of adversarial learning, where the information
from several discriminators trained to distinguish between two distributions is
used to both detect the corrupted features and fix them in order to remove the
distribution shift between datasets. We show that mainstream supervised
classifiers, such as random forest or gradient boosting trees, combined with
simple iterative heuristics, can localize and correct feature shifts,
outperforming current statistical and neural network-based techniques. The code
is available at https://github.com/AI-sandbox/DataFix.
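
The localization idea in the abstract (train a discriminator to tell the two datasets apart, then see which features let it succeed) can be illustrated with a small sketch. This is not the DataFix implementation: to stay dependency-free it swaps the random forest or gradient boosting discriminators for one single-threshold classifier (a decision stump) per feature, so it only catches shifts visible in a single feature's marginal distribution and omits the iterative heuristics and the correction step entirely. The function names, the 0.6 accuracy threshold, and the synthetic data are all assumptions made for illustration.

```python
import random


def stump_accuracy(ref_vals, qry_vals):
    """In-sample accuracy of the best single-threshold rule separating two 1-D samples."""
    labeled = sorted([(v, 0) for v in ref_vals] + [(v, 1) for v in qry_vals])
    n, n_qry = len(labeled), len(qry_vals)
    best = max(n_qry, n - n_qry) / n  # baseline: always predict the larger set
    ref_left = qry_left = 0
    for i, (value, is_query) in enumerate(labeled):
        if is_query:
            qry_left += 1
        else:
            ref_left += 1
        # Only place a cut between distinct values.
        if i + 1 < n and labeled[i + 1][0] == value:
            continue
        # Rule "left of cut -> reference, right of cut -> query", and its mirror.
        correct = ref_left + (n_qry - qry_left)
        best = max(best, correct / n, (n - correct) / n)
    return best


def localize_shifted_features(reference, query, threshold=0.6):
    """Flag columns whose per-feature discriminator beats `threshold` (hypothetical cutoff)."""
    n_features = len(reference[0])
    scores = []
    for j in range(n_features):
        ref_col = [row[j] for row in reference]
        qry_col = [row[j] for row in query]
        scores.append(stump_accuracy(ref_col, qry_col))
    flagged = [j for j, s in enumerate(scores) if s > threshold]
    return flagged, scores


# Synthetic example: five standard-normal "sensors", one of which is miscalibrated.
rng = random.Random(0)
n_rows, n_features, corrupted = 2000, 5, 3
reference = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
query = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
for row in query:
    row[corrupted] += 3.0  # simulated offset on feature 3 only

flagged, scores = localize_shifted_features(reference, query)
print("flagged features:", flagged)
print("discriminator accuracy per feature:", [round(s, 3) for s in scores])
```

An accuracy near 0.5 means the discriminator cannot distinguish the reference and query values of that feature, while an accuracy well above 0.5 marks the feature as a shift candidate. A multivariate discriminator, as described in the abstract, would additionally be able to pick up shifts that only appear in the relationships between features.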
 
      
Related papers
        - Feature Shift Localization Network [51.33484517421393]
 We introduce a neural network that can localize feature shifts in large and high-dimensional datasets in a fast and accurate manner.
The network, trained with a large number of datasets, learns to extract the statistical properties of the datasets and can localize feature shifts without the need for re-training.
 arXiv  Detail & Related papers  (2025-06-10T15:27:32Z)
- A Dataset for Semantic Segmentation in the Presence of Unknowns [49.795683850385956]
 Existing datasets allow evaluation of only knowns or unknowns - but not both.
We propose a novel anomaly segmentation dataset, ISSU, that features a diverse set of anomaly inputs from cluttered real-world environments.
The dataset is twice as large as existing anomaly segmentation datasets.
 arXiv  Detail & Related papers  (2025-03-28T10:31:01Z)
- Automatic dataset shift identification to support root cause analysis of AI performance drift [13.996602963045387]
 Shifts in data distribution can substantially harm the performance of clinical AI models.
We propose the first unsupervised dataset shift identification framework.
We report promising results for the proposed framework on five types of real-world dataset shifts.
 arXiv  Detail & Related papers  (2024-11-12T17:09:20Z)
- Binary Quantification and Dataset Shift: An Experimental Investigation [54.14283123210872]
 Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data.
The relationship between quantification and other types of dataset shift remains, by and large, unexplored.
We propose a fine-grained taxonomy of types of dataset shift, by establishing protocols for the generation of datasets affected by these types of shift.
 arXiv  Detail & Related papers  (2023-10-06T20:11:27Z)
- Informative regularization for a multi-layer perceptron RR Lyrae classifier under data shift [3.303002683812084]
 We propose a scalable and easily adaptable approach based on an informative regularization and an ad-hoc training procedure to mitigate the shift problem.
Our method provides a new path to incorporate knowledge from characteristic features into artificial neural networks to manage the underlying data shift problem.
 arXiv  Detail & Related papers  (2023-03-12T02:49:19Z)
- A unified framework for dataset shift diagnostics [2.449909275410288]
 Supervised learning techniques typically assume training data originates from the target population.
Yet, dataset shift frequently arises, which, if not adequately taken into account, may decrease the performance of their predictors.
We propose a novel and flexible framework called DetectShift that quantifies and tests for multiple dataset shifts.
 arXiv  Detail & Related papers  (2022-05-17T13:34:45Z)
- Transfer Learning for Fault Diagnosis of Transmission Lines [55.971052290285485]
 A novel transfer learning framework based on a pre-trained LeNet-5 convolutional neural network is proposed.
It is able to diagnose faults for different transmission line lengths and impedances by transferring the knowledge from a source neural network to predict a dissimilar target dataset.
 arXiv  Detail & Related papers  (2022-01-20T06:36:35Z)
- Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
 Generative Adversarial Imputation Nets (GAIN) and GAN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
 arXiv  Detail & Related papers  (2021-11-03T03:50:48Z)
- Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
 Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that it makes the transformation outcome predictable by an auxiliary network.
 arXiv  Detail & Related papers  (2020-10-10T14:04:44Z)
- Robust Classification under Class-Dependent Domain Shift [29.54336432319199]
 In this paper we explore a special type of dataset shift which we call class-dependent domain shift.
It is characterized by the following features: the input data causally depends on the label, the shift in the data is fully explained by a known variable, the variable which controls the shift can depend on the label, there is no shift in the label distribution.
 arXiv  Detail & Related papers  (2020-07-10T12:26:57Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
 We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
 arXiv  Detail & Related papers  (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     