Adversarial Learning for Feature Shift Detection and Correction
- URL: http://arxiv.org/abs/2312.04546v1
- Date: Thu, 7 Dec 2023 18:58:40 GMT
- Title: Adversarial Learning for Feature Shift Detection and Correction
- Authors: Miriam Barrabes, Daniel Mas Montserrat, Margarita Geleta, Xavier
Giro-i-Nieto, Alexander G. Ioannidis
- Abstract summary: Feature shifts can occur in many datasets, including in multi-sensor data, where some sensors are malfunctioning, or in structured data, where faulty standardization and data processing pipelines can lead to erroneous features.
In this work, we explore using the principles of adversarial learning, where the information from several discriminators trained to distinguish between two distributions is used to both detect the corrupted features and fix them in order to remove the distribution shift between datasets.
- Score: 45.65548560695731
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data shift is a phenomenon present in many real-world applications, and while
there are multiple methods attempting to detect shifts, the task of localizing
and correcting the features originating such shifts has not been studied in
depth. Feature shifts can occur in many datasets, including in multi-sensor
data, where some sensors are malfunctioning, or in tabular and structured data,
including biomedical, financial, and survey data, where faulty standardization
and data processing pipelines can lead to erroneous features. In this work, we
explore using the principles of adversarial learning, where the information
from several discriminators trained to distinguish between two distributions is
used to both detect the corrupted features and fix them in order to remove the
distribution shift between datasets. We show that mainstream supervised
classifiers, such as random forest or gradient boosting trees, combined with
simple iterative heuristics, can localize and correct feature shifts,
outperforming current statistical and neural network-based techniques. The code
is available at https://github.com/AI-sandbox/DataFix.
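As a rough illustration of the discriminator-based idea described in the abstract, the sketch below trains a random forest to distinguish a reference dataset from a query dataset, uses its feature importances to flag the most suspect feature, and resamples that feature from the reference marginal as a naive correction. This is a minimal approximation on assumed synthetic data, not the authors' DataFix implementation or its iterative heuristics; the function name `discriminator_report`, the toy data, and the resampling step are illustrative choices.

```python
# Illustrative sketch only (assumed names and synthetic data), not DataFix itself.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic example: 5 features; feature 2 of the query set carries an injected shift.
n, d = 2000, 5
reference = rng.normal(size=(n, d))
query = rng.normal(size=(n, d))
query[:, 2] += 3.0

def discriminator_report(ref, qry):
    """Train a random-forest discriminator to separate reference rows from query rows.

    Returns held-out accuracy (close to 0.5 means no detectable shift) and
    per-feature importances (a high value points at a shifted feature).
    """
    X = np.vstack([ref, qry])
    y = np.concatenate([np.zeros(len(ref)), np.ones(len(qry))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    return clf.score(X_te, y_te), clf.feature_importances_

# Detection: the feature the discriminator relies on most is the suspect.
acc, importances = discriminator_report(reference, query)
suspect = int(np.argmax(importances))
print(f"discriminator accuracy={acc:.2f}, suspect feature={suspect}")

# Naive correction heuristic: resample the suspect feature from the reference
# marginal and check that the discriminator can no longer separate the two sets.
corrected = query.copy()
corrected[:, suspect] = rng.choice(reference[:, suspect], size=len(query))
acc_after, _ = discriminator_report(reference, corrected)
print(f"discriminator accuracy after correction={acc_after:.2f}")
```

In the paper, detection and correction are instead iterated with mainstream classifiers such as random forests or gradient boosting trees until the distribution shift between the datasets is removed.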
Related papers
- Binary Quantification and Dataset Shift: An Experimental Investigation [54.14283123210872]
Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data.
The relationship between quantification and other types of dataset shift remains, by and large, unexplored.
We propose a fine-grained taxonomy of types of dataset shift, by establishing protocols for the generation of datasets affected by these types of shift.
arXiv Detail & Related papers (2023-10-06T20:11:27Z)
- Informative regularization for a multi-layer perceptron RR Lyrae classifier under data shift [3.303002683812084]
We propose a scalable and easily adaptable approach based on an informative regularization and an ad-hoc training procedure to mitigate the shift problem.
Our method provides a new path to incorporate knowledge from characteristic features into artificial neural networks to manage the underlying data shift problem.
arXiv Detail & Related papers (2023-03-12T02:49:19Z)
- A unified framework for dataset shift diagnostics [2.449909275410288]
Supervised learning techniques typically assume training data originates from the target population.
Yet, dataset shift frequently arises, which, if not adequately taken into account, may decrease the performance of their predictors.
We propose a novel and flexible framework called DetectShift that quantifies and tests for multiple dataset shifts.
arXiv Detail & Related papers (2022-05-17T13:34:45Z)
- Transfer Learning for Fault Diagnosis of Transmission Lines [55.971052290285485]
A novel transfer learning framework based on a pre-trained LeNet-5 convolutional neural network is proposed.
It is able to diagnose faults for different transmission line lengths and impedances by transferring the knowledge from a source neural network to predict a dissimilar target dataset.
arXiv Detail & Related papers (2022-01-20T06:36:35Z)
- Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAIN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
- Graph Neural Network-Based Anomaly Detection in Multivariate Time Series [17.414474298706416]
We develop a new way to detect anomalies in high-dimensional time series data.
Our approach combines a structure learning approach with graph neural networks.
We show that our method detects anomalies more accurately than baseline approaches.
arXiv Detail & Related papers (2021-06-13T09:07:30Z)
- Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the transformation outcome is predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z)
- Robust Classification under Class-Dependent Domain Shift [29.54336432319199]
In this paper we explore a special type of dataset shift which we call class-dependent domain shift.
It is characterized by the following features: the input data causally depends on the label; the shift in the data is fully explained by a known variable; the variable which controls the shift can depend on the label; and there is no shift in the label distribution.
arXiv Detail & Related papers (2020-07-10T12:26:57Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.