Subtle Inverse Crimes: Naïvely training machine learning algorithms
could lead to overly-optimistic results
- URL: http://arxiv.org/abs/2109.08237v1
- Date: Thu, 16 Sep 2021 22:00:15 GMT
- Title: Subtle Inverse Crimes: Naïvely training machine learning algorithms
could lead to overly-optimistic results
- Authors: Efrat Shimron, Jonathan I. Tamir, Ke Wang, Michael Lustig
- Abstract summary: This work aims to highlight that in some cases, this common practice may lead to biased, overly-optimistic results.
We describe two preprocessing pipelines typical of open-access databases and study their effects on three well-established algorithms.
Our results demonstrate that the CS, DictL and DL algorithms yield systematically biased results when naïvely trained on seemingly-appropriate data.
- Score: 5.785136336372809
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While open databases are an important resource in the Deep Learning (DL) era,
they are sometimes used "off-label": data published for one task are used for
training algorithms for a different one. This work aims to highlight that in
some cases, this common practice may lead to biased, overly-optimistic results.
We demonstrate this phenomenon for inverse problem solvers and show how their
biased performance stems from hidden data preprocessing pipelines. We describe
two preprocessing pipelines typical of open-access databases and study their
effects on three well-established algorithms developed for Magnetic Resonance
Imaging (MRI) reconstruction: Compressed Sensing (CS), Dictionary Learning
(DictL), and DL. In this large-scale study we performed extensive computations.
Our results demonstrate that the CS, DictL and DL algorithms yield
systematically biased results when naïvely trained on seemingly-appropriate
data: the Normalized Root Mean Square Error (NRMSE) improves consistently with
the preprocessing extent, showing an artificial increase of 25%-48% in some
cases. Since this phenomenon is generally unknown, biased results are sometimes
published as state-of-the-art; we refer to that as subtle inverse crimes. This
work hence raises a red flag regarding naïve off-label usage of Big Data and
reveals the vulnerability of modern inverse problem solvers to the resulting
bias.
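To make the mechanism concrete, below is a minimal, self-contained toy sketch (1-D signals, NumPy, zero-filled reconstruction) of how a hidden k-space filtering step, a crude stand-in for the zero-padding pipelines studied in the paper, can make the same naive reconstruction look better under NRMSE. It illustrates the direction of the bias only; it is not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "ground truth": a sharp object plus measurement noise.
n = 128
x = np.zeros(n)
x[32:96] = 1.0
x += 0.1 * rng.standard_normal(n)

def nrmse(ref, est):
    # Normalized Root Mean Square Error, the metric used in the abstract.
    return np.linalg.norm(est - ref) / np.linalg.norm(ref)

def undersample_recon(img, keep_frac, seed):
    # Retrospective undersampling + zero-filled inverse FFT reconstruction.
    k = np.fft.fft(img)
    mask = np.random.default_rng(seed).random(n) < keep_frac
    mask[:8] = True    # always keep the lowest frequencies
    mask[-8:] = True
    return np.real(np.fft.ifft(k * mask))

# Hidden preprocessing: discard the upper half of the spectrum.
k = np.fft.fft(x)
k[n // 4 : -n // 4] = 0.0
x_pre = np.real(np.fft.ifft(k))

for name, img in [("raw", x), ("preprocessed", x_pre)]:
    recon = undersample_recon(img, keep_frac=0.5, seed=1)
    print(f"{name:>12}: NRMSE = {nrmse(img, recon):.3f}")
```

Because the filtering removed exactly the high-frequency content that undersampling fails to recover, the preprocessed data typically scores a lower (better-looking) NRMSE even though nothing about the reconstruction algorithm improved.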
Related papers
- PEEL the Layers and Find Yourself: Revisiting Inference-time Data Leakage for Residual Neural Networks [64.90981115460937]
This paper explores inference-time data leakage risks of deep neural networks (NNs).
We propose a novel backward feature inversion method, PEEL, which can effectively recover block-wise input features from the intermediate output of residual NNs.
Our results show that PEEL outperforms state-of-the-art recovery methods by an order of magnitude when evaluated by mean squared error (MSE).
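For intuition, here is a minimal sketch of one classical way to invert a residual block y = x + f(x) via fixed-point iteration; the weight scaling and iteration count are illustrative assumptions, and this is not the authors' PEEL implementation.

```python
import torch
import torch.nn as nn

# Invert a residual block y = x + f(x) by iterating x <- y - f(x),
# which converges when f is contractive (Lipschitz constant < 1).
torch.manual_seed(0)

f = nn.Sequential(nn.Linear(16, 16), nn.Tanh(), nn.Linear(16, 16))
with torch.no_grad():
    for p in f.parameters():
        p.mul_(0.3)  # shrink weights so f is contractive

x_true = torch.randn(1, 16)
with torch.no_grad():
    y = x_true + f(x_true)   # intermediate output of the block
    x = y.clone()            # initial guess
    for _ in range(50):      # fixed-point iteration
        x = y - f(x)

print("recovery MSE:", torch.mean((x - x_true) ** 2).item())
```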
arXiv Detail & Related papers (2025-04-08T20:11:05Z)
- Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection [9.784793380119806]
We introduce DIAG, a training-free Diffusion-based In-distribution Anomaly Generation pipeline for data augmentation.
Unlike conventional image generation techniques, we implement a human-in-the-loop pipeline, where domain experts provide multimodal guidance to the model.
We demonstrate the efficacy and versatility of DIAG with respect to state-of-the-art data augmentation approaches on the challenging KSDD2 dataset.
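One plausible realization of such a training-free, human-guided pipeline, sketched with the Hugging Face diffusers inpainting API; the model id, prompt, mask, and file names are assumptions for illustration, not details from the paper, and a GPU is assumed.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# An expert supplies a text description and a region mask, and a pretrained
# inpainting diffusion model paints a plausible defect into a normal image.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

normal = Image.open("normal_surface.png").convert("RGB").resize((512, 512))
mask = Image.open("defect_region_mask.png").convert("L").resize((512, 512))

augmented = pipe(
    prompt="a thin crack on a metal surface",  # expert-provided guidance
    image=normal,
    mask_image=mask,
).images[0]
augmented.save("synthetic_defect.png")
```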
arXiv Detail & Related papers (2024-07-04T14:28:52Z)
- Debiasing Machine Unlearning with Counterfactual Examples [31.931056076782202]
We analyze the causal factors behind the unlearning process and mitigate biases at both data and algorithmic levels.
We introduce an intervention-based approach, where knowledge to forget is erased with a debiased dataset.
Our method outperforms existing machine unlearning baselines on evaluation metrics.
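A generic sketch of intervention-based erasure: fine-tune briefly on counterfactual versions of the forget-set samples so the targeted knowledge is replaced rather than merely suppressed. The uniform relabeling rule here is an illustrative assumption, not the paper's construction.

```python
import torch
import torch.nn.functional as F

def unlearn_with_counterfactuals(model, forget_loader, num_classes,
                                 lr=1e-4, steps=100, device="cpu"):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    it = iter(forget_loader)
    for _ in range(steps):
        try:
            x, y = next(it)
        except StopIteration:
            it = iter(forget_loader)
            x, y = next(it)
        x, y = x.to(device), y.to(device)
        # Counterfactual labels: any class except the original one.
        cf = (y + torch.randint(1, num_classes, y.shape, device=device)) % num_classes
        loss = F.cross_entropy(model(x), cf)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```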
arXiv Detail & Related papers (2024-04-24T09:33:10Z)
- A Systematic Study on Quantifying Bias in GAN-Augmented Data [0.0]
Generative adversarial networks (GANs) have recently become a popular data augmentation technique used by machine learning practitioners.
They have been shown to suffer from the so-called mode collapse failure mode, which makes them vulnerable to exacerbating biases on already skewed datasets.
This study is a systematic effort focused on the evaluation of state-of-the-art metrics that can potentially quantify biases in GAN-augmented data.
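As a minimal sketch of one such quantification (not a specific metric from the study): compare class or attribute frequencies of real versus GAN-augmented samples with a total-variation distance; a collapsed-mode generator over-produces some classes, which shows up as a large distance.

```python
import numpy as np

def distribution(labels, num_classes):
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()

def tv_distance(p, q):
    return 0.5 * np.abs(p - q).sum()

# Illustrative data: the "generator" has partially collapsed onto class 0.
rng = np.random.default_rng(0)
real_labels = rng.integers(0, 4, size=10_000)                # balanced
gan_labels = rng.choice(4, size=10_000, p=[0.7, 0.1, 0.1, 0.1])

p_real = distribution(real_labels, 4)
p_gan = distribution(gan_labels, 4)
print("TV distance, real vs. GAN labels:",
      round(tv_distance(p_real, p_gan), 3))
```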
arXiv Detail & Related papers (2023-08-23T22:19:48Z)
- On Counterfactual Data Augmentation Under Confounding [30.76982059341284]
Counterfactual data augmentation has emerged as a method to mitigate confounding biases in the training data.
These biases arise due to various observed and unobserved confounding variables in the data generation process.
We show how our simple augmentation method helps existing state-of-the-art methods achieve good results.
arXiv Detail & Related papers (2023-05-29T16:20:23Z)
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition [107.58227666024791]
Face recognition systems are widely deployed in safety-critical applications, including law enforcement.
They exhibit bias across a range of socio-demographic dimensions, such as gender and race.
Previous works on bias mitigation largely focused on pre-processing the training data.
arXiv Detail & Related papers (2022-10-18T15:46:05Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
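A minimal primal-dual sketch of invariance-constrained learning: penalize disagreement between predictions on original and transformed inputs, and adapt the multiplier by dual ascent while the model trains. The transformation, constraint level, and step sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Minimize task loss subject to E[ d(f(x), f(T(x))) ] <= eps, handled
# with a Lagrange multiplier lam that grows while the constraint is violated.
def train_step(model, opt, x, y, lam, eps=0.05, dual_lr=0.01):
    t_x = torch.flip(x, dims=[-1])          # T: horizontal flip
    logits, t_logits = model(x), model(t_x)

    task_loss = F.cross_entropy(logits, y)
    invariance_gap = F.mse_loss(logits, t_logits)

    loss = task_loss + lam * (invariance_gap - eps)
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Dual ascent on the multiplier.
    lam = max(0.0, lam + dual_lr * (invariance_gap.item() - eps))
    return lam
```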
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train inference from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
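A minimal sketch of the imputation-free input handling implied by the summary: missing entries are zeroed and a binary mask is concatenated, so the network sees what is missing instead of imputed guesses. This deliberately omits the paper's RL-based gradient adjustment; sizes and architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MaskedMLP(nn.Module):
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * d_in, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, x, observed_mask):
        # Zero out missing entries; append the mask as extra features.
        x = torch.where(observed_mask.bool(), x, torch.zeros_like(x))
        return self.net(torch.cat([x, observed_mask.float()], dim=-1))

model = MaskedMLP(d_in=8, d_hidden=32, d_out=2)
x = torch.randn(4, 8)
mask = torch.rand(4, 8) > 0.3       # ~30% of entries missing
print(model(x, mask).shape)         # torch.Size([4, 2])
```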
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Deep Visual Anomaly detection with Negative Learning [18.79849041106952]
In this paper, we propose anomaly detection with negative learning (ADNL), which employs the negative learning concept for the enhancement of anomaly detection.
The idea is to limit the reconstruction capability of a generative model using a small number of given anomaly examples.
This way, the network not only learns to reconstruct normal data but also encloses the normal distribution far from the possible distribution of anomalies.
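A minimal sketch of this negative-learning idea for a reconstruction-based detector: minimize reconstruction error on normal data while pushing the (clamped) reconstruction error of the few known anomalies up. The margin and weight are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def negative_learning_loss(autoencoder, x_normal, x_anomaly,
                           margin=0.5, weight=1.0):
    rec_n = autoencoder(x_normal)
    rec_a = autoencoder(x_anomaly)
    loss_normal = F.mse_loss(rec_n, x_normal)
    # Hinge term: only penalize anomalies reconstructed *too* well.
    loss_anomaly = F.relu(margin - F.mse_loss(rec_a, x_anomaly))
    return loss_normal + weight * loss_anomaly
```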
arXiv Detail & Related papers (2021-05-24T01:48:44Z)
- Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data [87.61504710345528]
We propose two strategies for freeing a neural network from tuning with OoD data, while improving its OoD detection performance.
We specifically propose to decompose confidence scoring as well as a modified input pre-processing method.
Our further analysis on a larger-scale image dataset shows that the two types of distribution shift, namely semantic shift and non-semantic shift, present a significant difference.
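A sketch of a decomposed confidence head in the spirit of this entry: the logit for class i is modeled as h_i(x) / g(x), where g(x) captures input-dependent uncertainty. Layer sizes and the exact choice of g are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DecomposedConfidenceHead(nn.Module):
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.h = nn.Linear(feat_dim, num_classes)   # class evidence
        self.g = nn.Sequential(                     # data-dependent scale
            nn.Linear(feat_dim, 1),
            nn.BatchNorm1d(1),
            nn.Sigmoid(),
        )

    def forward(self, features):
        h = self.h(features)
        g = self.g(features)        # in (0, 1)
        logits = h / g              # trained with ordinary cross-entropy
        return logits, h, g

head = DecomposedConfidenceHead(feat_dim=512, num_classes=10)
z = torch.randn(4, 512)
logits, h, g = head(z)
# At test time, max_i h_i (or g itself) can serve as the OoD score.
print(logits.shape, g.squeeze(-1))
```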
arXiv Detail & Related papers (2020-02-26T04:18:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.