Related papers: Mitigating representation bias caused by missing pixels in methane plume detection

Mitigating representation bias caused by missing pixels in methane plume detection

URL: http://arxiv.org/abs/2510.19478v1
Date: Wed, 22 Oct 2025 11:15:31 GMT
Title: Mitigating representation bias caused by missing pixels in methane plume detection
Authors: Julia Wąsala, Joannes D. Maasakkers, Ilse Aben, Rochelle Schneider, Holger Hoos, Mitra Baratchi,
Abstract summary: Missing pixels can lead to representation bias in automated feature extraction models.<n>We show that spurious association between the label and the number of missing values in methane plume detection can cause the model to associate the coverage.
Score: 0.7138165941237656
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Most satellite images have systematically missing pixels (i.e., missing data not at random (MNAR)) due to factors such as clouds. If not addressed, these missing pixels can lead to representation bias in automated feature extraction models. In this work, we show that spurious association between the label and the number of missing values in methane plume detection can cause the model to associate the coverage (i.e., the percentage of valid pixels in an image) with the label, subsequently under-detecting plumes in low-coverage images. We evaluate multiple imputation approaches to remove the dependence between the coverage and a label. Additionally, we propose a weighted resampling scheme during training that removes the association between the label and the coverage by enforcing class balance in each coverage bin. Our results show that both resampling and imputation can significantly reduce the representation bias without hurting balanced accuracy, precision, or recall. Finally, we evaluate the capability of the debiased models using these techniques in an operational scenario and demonstrate that the debiased models have a higher chance of detecting plumes in low-coverage images.

Related papers

Bayesian generative models can flag performance loss, bias, and out-of-distribution image content [15.835055687646507]
Generative models are popular for medical imaging tasks such as anomaly detection, feature extraction, data visualization, or image generation.<n>Since they are parameterized by deep learning models, they are often sensitive to distribution shifts and unreliable when applied to out-of-distribution data.<n>We show how pixel-wise uncertainty can detect out-of-distribution image content such as ink, rulers, and patches.
arXiv Detail & Related papers (2025-03-21T18:45:28Z)
Learning to Rank Patches for Unbiased Image Redundancy Reduction [80.93989115541966]
Images suffer from heavy spatial redundancy because pixels in neighboring regions are spatially correlated. Existing approaches strive to overcome this limitation by reducing less meaningful image regions. We propose a self-supervised framework for image redundancy reduction called Learning to Rank Patches.
arXiv Detail & Related papers (2024-03-31T13:12:41Z)
DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models [12.42597979026873]
We propose DataElixir, a novel sanitization approach tailored to purify poisoned datasets. We leverage diffusion models to eliminate trigger features and restore benign features, thereby turning the poisoned samples into benign ones. Experiments conducted on 9 popular attacks demonstrates that DataElixir effectively mitigates various complex attacks while exerting minimal impact on benign accuracy.
arXiv Detail & Related papers (2023-12-18T09:40:38Z)
N-pad : Neighboring Pixel-based Industrial Anomaly Detection [0.0]
We present textittextbfN-pad, a novel method for anomaly detection and segmentation in a one-class learning setting. We have achieved state-of-the-art performance in MVTec-AD with AUROC of 99.37 for anomaly detection and 98.75 for anomaly segmentation.
arXiv Detail & Related papers (2022-10-17T06:22:16Z)
Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold. We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples. We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
Fake It Till You Make It: Near-Distribution Novelty Detection by Score-Based Generative Models [54.182955830194445]
existing models either fail or face a dramatic drop under the so-called near-distribution" setting. We propose to exploit a score-based generative model to produce synthetic near-distribution anomalous data. Our method improves the near-distribution novelty detection by 6% and passes the state-of-the-art by 1% to 5% across nine novelty detection benchmarks.
arXiv Detail & Related papers (2022-05-28T02:02:53Z)
Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference with pristine-quality reference. Unlabeled data can be easily collected from an image degradation or restoration process, making it encouraging to exploit unlabeled training data to boost FR-IQA performance. In this paper, we suggest to incorporate semi-supervised and positive-unlabeled (PU) learning for exploiting unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z)
Fully Convolutional Cross-Scale-Flows for Image-based Defect Detection [24.0966076588569]
We tackle the problem of automatic defect detection without requiring any image samples of defective parts. We propose a novel fully convolutional cross-scale normalizing flow (CS-Flow) that jointly processes multiple feature maps of different scales. Our work sets a new state-of-the-art in image-level defect detection on the benchmark datasets Magnetic Tile Defects and MVTec AD showing a 100% AUROC on 4 out of 15 classes.
arXiv Detail & Related papers (2021-10-06T15:35:13Z)
A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image. The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z)
Anomaly detection through latent space restoration using vector-quantized variational autoencoders [0.8122270502556374]
We propose an out-of-distribution detection method using density and restoration-based approaches. The VQ-VAE model learns to encode images in a categorical latent space. The prior distribution of latent codes is then modelled using an Auto-Regressive (AR) model.
arXiv Detail & Related papers (2020-12-12T09:19:59Z)
Salvage Reusable Samples from Noisy Data for Robust Learning [70.48919625304]
We propose a reusable sample selection and correction approach, termed as CRSSC, for coping with label noise in training deep FG models with web images. Our key idea is to additionally identify and correct reusable samples, and then leverage them together with clean examples to update the networks.
arXiv Detail & Related papers (2020-08-06T02:07:21Z)
Improved Slice-wise Tumour Detection in Brain MRIs by Computing Dissimilarities between Latent Representations [68.8204255655161]
Anomaly detection for Magnetic Resonance Images (MRIs) can be solved with unsupervised methods. We have proposed a slice-wise semi-supervised method for tumour detection based on the computation of a dissimilarity function in the latent space of a Variational AutoEncoder. We show that by training the models on higher resolution images and by improving the quality of the reconstructions, we obtain results which are comparable with different baselines.
arXiv Detail & Related papers (2020-07-24T14:02:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.