CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts
- URL: http://arxiv.org/abs/2507.17651v1
- Date: Wed, 23 Jul 2025 16:15:48 GMT
- Title: CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts
- Authors: Olaf Dünkel, Artur Jesslen, Jiahao Xie, Christian Theobalt, Christian Rupprecht, Adam Kortylewski
- Abstract summary: We introduce CNS-Bench, a Continuous Nuisance Shift Benchmark to quantify robustness of image classifiers for continuous and realistic nuisance shifts. We propose a filtering mechanism that outperforms previous methods, thereby enabling reliable benchmarking with generative models.
- Score: 67.48102304531734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An important challenge when using computer vision models in the real world is to evaluate their performance in potential out-of-distribution (OOD) scenarios. While simple synthetic corruptions are commonly applied to test OOD robustness, they often fail to capture nuisance shifts that occur in the real world. Recently, diffusion models have been applied to generate realistic images for benchmarking, but they are restricted to binary nuisance shifts. In this work, we introduce CNS-Bench, a Continuous Nuisance Shift Benchmark to quantify OOD robustness of image classifiers for continuous and realistic generative nuisance shifts. CNS-Bench allows generating a wide range of individual nuisance shifts in continuous severities by applying LoRA adapters to diffusion models. To address failure cases, we propose a filtering mechanism that outperforms previous methods, thereby enabling reliable benchmarking with generative models. With the proposed benchmark, we perform a large-scale study to evaluate the robustness of more than 40 classifiers under various nuisance shifts. Through carefully designed comparisons and analyses, we find that model rankings can change for varying shifts and shift scales, which cannot be captured when applying common binary shifts. Additionally, we show that evaluating the model performance on a continuous scale allows the identification of model failure points, providing a more nuanced understanding of model robustness. Project page including code and data: https://genintel.github.io/CNS.
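The core evaluation idea described in the abstract, sweeping a LoRA-scaled diffusion model over continuous shift severities and tracking classifier accuracy per severity, can be pictured with a short sketch. The code below is an illustrative approximation only, not the authors' released pipeline: it assumes the Hugging Face diffusers and torchvision APIs, and the LoRA checkpoint path, prompt, and ImageNet class index are hypothetical placeholders (the actual benchmark, including the filtering mechanism, is available on the project page).

```python
# Minimal sketch (not the authors' released code) of the benchmark's core loop:
# sweep a nuisance-shift LoRA over continuous severities and record classifier
# accuracy per severity. The LoRA path, prompt, and class index are placeholders.
import torch
from diffusers import StableDiffusionPipeline
from torchvision.models import resnet50, ResNet50_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Diffusion model with a LoRA adapter that induces one nuisance shift (e.g. snow).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=dtype
).to(device)
pipe.load_lora_weights("path/to/nuisance_lora")  # hypothetical shift adapter

# Off-the-shelf classifier under test.
weights = ResNet50_Weights.IMAGENET1K_V2
classifier = resnet50(weights=weights).eval().to(device)
preprocess = weights.transforms()

prompt, target_class = "a photo of a border collie", 232  # hypothetical example
accuracy_per_scale = {}
for scale in (0.0, 0.25, 0.5, 0.75, 1.0):  # continuous shift severities
    correct, n_samples = 0, 8
    for seed in range(n_samples):
        generator = torch.Generator(device=device).manual_seed(seed)
        image = pipe(
            prompt,
            num_inference_steps=30,
            generator=generator,
            # LoRA strength acts as shift severity; the exact API for setting it
            # varies across diffusers versions.
            cross_attention_kwargs={"scale": scale},
        ).images[0]
        with torch.no_grad():
            logits = classifier(preprocess(image).unsqueeze(0).to(device))
        correct += int(logits.argmax(dim=1).item() == target_class)
    accuracy_per_scale[scale] = correct / n_samples

# A failure point can be read off as the smallest severity at which accuracy
# drops below a chosen threshold (e.g. half the accuracy at scale 0).
print(accuracy_per_scale)
```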
Related papers
- History-Aware Neural Operator: Robust Data-Driven Constitutive Modeling of Path-Dependent Materials [4.331871667093061]
This study presents an end-to-end learning framework for data-driven modeling of inelastic materials using neural operators. We develop the History-Aware Neural Operator (HANO), an autoregressive model that predicts path-dependent material responses from short segments of recent strain-stress history. We evaluate HANO on two benchmark problems: elastoplasticity with hardening and progressive anisotropic damage in brittle solids.
arXiv Detail & Related papers (2025-06-12T05:19:17Z) - Reliably detecting model failures in deployment without labels [10.006585036887929]
This paper formalizes and addresses the problem of post-deployment deterioration (PDD) monitoring. We propose D3M, a practical and efficient monitoring algorithm based on the disagreement of predictive models. Empirical results on both a standard benchmark and a real-world large-scale internal medicine dataset demonstrate the effectiveness of the framework.
arXiv Detail & Related papers (2025-06-05T13:56:18Z) - Confidence-aware Denoised Fine-tuning of Off-the-shelf Models for Certified Robustness [56.2479170374811]
We introduce Fine-Tuning with Confidence-Aware Denoised Image Selection (FT-CADIS).
FT-CADIS is inspired by the observation that the confidence of off-the-shelf classifiers can effectively identify hallucinated images during denoised smoothing.
It has established the state-of-the-art certified robustness among denoised smoothing methods across all $\ell_2$-adversary radii in various benchmarks.
arXiv Detail & Related papers (2024-11-13T09:13:20Z) - Can Your Generative Model Detect Out-of-Distribution Covariate Shift? [2.0144831048903566]
We propose a novel method for detecting Out-of-Distribution (OOD) sensory data using conditional Normalizing Flows (cNFs).
Our results on CIFAR10 vs. CIFAR10-C and ImageNet200 vs. ImageNet200-C demonstrate the effectiveness of the method.
arXiv Detail & Related papers (2024-09-04T19:27:56Z) - Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been believed to be a challenging property to encode for neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can rather stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Masked Images Are Counterfactual Samples for Robust Fine-tuning [77.82348472169335]
Fine-tuning deep learning models can lead to a trade-off between in-distribution (ID) performance and out-of-distribution (OOD) robustness.
We propose a novel fine-tuning method that uses masked images as counterfactual samples to help improve the robustness of the fine-tuned model.
arXiv Detail & Related papers (2023-03-06T11:51:28Z) - Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment [32.01355605506855]
Quantization-aware training can produce more stable models than standard, adversarial, and Mixup training.
Disagreements often have closer top-1 and top-2 output probabilities, and $Margin$ is a better indicator than other uncertainty metrics for distinguishing disagreements.
We opensource our code and models as a new benchmark for further studying the quantized models.
arXiv Detail & Related papers (2022-04-08T11:19:16Z) - Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.