Related papers: Mitigating Clever Hans Strategies in Image Classifiers through Generating Counterexamples

URL: http://arxiv.org/abs/2510.17524v1
Date: Mon, 20 Oct 2025 13:22:57 GMT
Title: Mitigating Clever Hans Strategies in Image Classifiers through Generating Counterexamples
Authors: Sidney Bender, Ole Delzer, Jan Herrmann, Heike Antje Marxfeld, Klaus-Robert Müller, Grégoire Montavon,
Abstract summary: Group distributional robustness methods rely on explicit group labels to upweight underrepresented groups. We propose Counterfactual Knowledge Distillation (CFKD), a framework that generates diverse counterfactuals. We demonstrate CFKD's efficacy across five datasets, spanning synthetic tasks to an industrial application.
Score: 15.618934546058277
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep learning models remain vulnerable to spurious correlations, leading to so-called Clever Hans predictors that undermine robustness even in large-scale foundation and self-supervised models. Group distributional robustness methods, such as Deep Feature Reweighting (DFR), rely on explicit group labels to upweight underrepresented subgroups, but face key limitations: (1) group labels are often unavailable, (2) low within-group sample sizes hinder coverage of the subgroup distribution, and (3) performance degrades sharply when multiple spurious correlations fragment the data into even smaller groups. We propose Counterfactual Knowledge Distillation (CFKD), a framework that sidesteps these issues by generating diverse counterfactuals, enabling a human annotator to efficiently explore and correct the model's decision boundaries through a knowledge distillation step. Unlike DFR, our method not only reweights the undersampled groups but also enriches them with new data points. Our method requires no confounder labels, scales effectively to multiple confounders, and yields balanced generalization across groups. We demonstrate CFKD's efficacy across five datasets, spanning synthetic tasks to an industrial application, with particularly strong gains in low-data regimes with pronounced spurious correlations. Additionally, we provide an ablation study on the effect of the chosen counterfactual explainer and teacher model, highlighting their impact on robustness.
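
Limitation (3) above, and the general idea of upweighting underrepresented subgroups, can be illustrated with a small sketch. It assumes every sample carries a class label plus discrete confounder labels, i.e. exactly the group labels CFKD is designed to do without. The function names and the "watermark" confounder are hypothetical, and inverse-frequency weighting is a generic illustration of group upweighting, not the DFR or CFKD procedure itself.

```python
from collections import Counter

def group_keys(labels, confounders):
    """Pair each class label with its confounder values to form a group key."""
    return list(zip(labels, *confounders))

def inverse_frequency_weights(labels, confounders):
    """Weight each sample by 1 / (size of its group), rescaled to sum to len(labels).

    Every non-empty group then carries the same total weight, which is one
    simple (hypothetical) way to upweight underrepresented subgroups.
    """
    keys = group_keys(labels, confounders)
    counts = Counter(keys)
    raw = [1.0 / counts[k] for k in keys]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

def max_num_groups(num_classes, num_binary_confounders):
    """Upper bound on the group count: every class crossed with every confounder combination."""
    return num_classes * 2 ** num_binary_confounders

# Toy data: three samples of class 0, one of class 1, and one hypothetical
# binary confounder (e.g. whether a watermark is present).
labels = [0, 0, 0, 1]
watermark = [0, 0, 1, 1]
weights = inverse_frequency_weights(labels, [watermark])
print(weights)
print(max_num_groups(2, 3))  # 16
```

With C classes and k binary confounders there can be up to C·2^k groups, so each additional confounder can roughly halve the average group size; this is the fragmentation the abstract says hurts group-label-based methods.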

Related papers

C2AL: Cohort-Contrastive Auxiliary Learning for Large-scale Recommendation Systems [7.548682352355034]
We show how the attention mechanism can play a key role in factorization machines for shared embedding selection. We propose to address this challenge by analyzing the substructures in the dataset and exposing those with strong distributional contrast through auxiliary learning. This approach customizes the learning process of attention layers to preserve mutual information with minority cohorts while improving global performance.
arXiv (2025-10-02T17:00:17Z)
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness [61.45587642780908]
We propose a three-step approach for parameter-efficient fine-tuning of image-text foundation models. Our method improves its two key components: minority samples identification and the robust training algorithm. Our theoretical analysis shows that our PPA enhances minority group identification and is Bayes optimal for minimizing the balanced group error.
arXiv (2025-03-12T15:46:12Z)
Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions [37.0753553356624]
We introduce Group-robust Sample Reweighting (GSR), a two-stage approach that first learns the representations from group-unlabeled data. GSR is theoretically sound, practically lightweight, and effective in improving the robustness to subpopulation shifts.
arXiv (2025-03-10T13:34:18Z)
Trained Models Tell Us How to Make Them Robust to Spurious Correlation without Group Annotation [3.894771553698554]
Empirical Risk Minimization (ERM) models tend to rely on attributes that have high spurious correlation with the target. This can degrade the performance on underrepresented (or 'minority') groups that lack these attributes. We propose Environment-based Validation and Loss-based Sampling (EVaLS) to enhance robustness to spurious correlation.
arXiv (2024-10-07T08:17:44Z)
Efficient Bias Mitigation Without Privileged Information [14.21628601482357]
Deep neural networks trained via empirical risk minimisation often exhibit significant performance disparities across groups. Existing bias mitigation methods that aim to address this issue often rely on group labels for training or validation. We propose Targeted Augmentations for Bias Mitigation (TAB), a framework that leverages the entire training history of a helper model to identify spurious samples. We show that TAB improves worst-group performance without any group information or model selection, outperforming existing methods while maintaining overall accuracy.
arXiv (2024-09-26T09:56:13Z)
Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers. We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes. We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv (2023-08-28T18:48:34Z)
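The core operation of generating synthetic samples by mixing minority- and majority-class data can be sketched as a convex combination of two inputs and their one-hot labels. This is a minimal sketch assuming mixup-style interpolation with a Beta-distributed mixing weight; that choice is a swapped-in generic technique, not the paper's iterative procedure, and the function name is hypothetical.

```python
import random

def mix_samples(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two samples and their one-hot labels (mixup-style)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y

# Toy example: mix a minority-class sample with a majority-class sample.
minority_x, minority_y = [1.0, 0.0], [1.0, 0.0]
majority_x, majority_y = [0.0, 1.0], [0.0, 1.0]
lam = random.betavariate(0.4, 0.4)  # assumption: Beta(0.4, 0.4) mixing weight, as in common mixup setups
x_new, y_new = mix_samples(minority_x, minority_y, majority_x, majority_y, lam)
```

Because the label is mixed with the same weight as the input, the synthetic sample carries soft targets that reflect how much of each class went into it.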
Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization [61.39201891894024]
Group distributionally robust optimization (group DRO) can minimize the worst-case loss over pre-defined groups. We reformulate the group DRO framework by proposing Q-Diversity. Characterized by an interactive training mode, Q-Diversity relaxes the group identification from annotation into direct parameterization.
arXiv (2023-05-20T07:02:27Z)
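The difference between ordinary ERM and the group DRO objective named in this entry (minimizing the worst-case loss over pre-defined groups) can be shown on a toy batch. This sketch assumes per-sample losses and group ids are already given, which is the annotation requirement Q-Diversity relaxes; it computes only the objective values, not the training loop, and the function names are hypothetical.

```python
from collections import defaultdict

def group_mean_losses(losses, groups):
    """Average the per-sample losses within each group id."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for loss, g in zip(losses, groups):
        totals[g] += loss
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}

def average_loss(losses):
    """ERM objective: mean loss over all samples."""
    return sum(losses) / len(losses)

def worst_group_loss(losses, groups):
    """Group DRO objective: the largest per-group mean loss."""
    return max(group_mean_losses(losses, groups).values())

# Three majority samples are fit well; the single minority sample is not.
losses = [0.1, 0.1, 0.1, 0.9]
groups = ["majority", "majority", "majority", "minority"]
print(average_loss(losses), worst_group_loss(losses, groups))
```

The average loss (about 0.3) hides the poorly served minority group, whereas the worst-group loss (0.9) is driven entirely by it, which is why minimizing the latter targets underrepresented groups.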
Outlier-Robust Group Inference via Gradient Space Clustering [50.87474101594732]
Existing methods can improve the worst-group performance, but they require group annotations, which are often expensive and sometimes infeasible to obtain. We address the problem of learning group annotations in the presence of outliers by clustering the data in the space of gradients of the model parameters. We show that data in the gradient space has a simpler structure while preserving information about minority groups and outliers, making it suitable for standard clustering methods like DBSCAN.
arXiv (2022-10-13T06:04:43Z)
Take One Gram of Neural Features, Get Enhanced Group Robustness [23.541213868620837]
Predictive performance of machine learning models trained with empirical risk minimization can degrade considerably under distribution shifts. We propose to partition the training dataset into groups based on Gram matrices of features extracted by an 'identification' model. Our approach not only improves group robustness over ERM but also outperforms all recent baselines.
arXiv (2022-08-26T12:34:55Z)
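The Gram matrix this entry partitions on is the matrix of inner products between feature channels, G[i][j] = Σ_k F[i][k]·F[j][k]. The sketch below assumes a feature map already flattened to C channels of N spatial positions, and turns its Gram matrix into a flat per-sample signature. How the paper normalizes these matrices and partitions them into groups is not described here, so that step is omitted; the function names are hypothetical.

```python
def gram_matrix(features):
    """Gram matrix G[i][j] = sum_k F[i][k] * F[j][k] for C channel vectors of length N."""
    c = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j])) for j in range(c)]
            for i in range(c)]

def upper_triangle(matrix):
    """Flatten the upper triangle; G is symmetric, so this loses no information."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]

# Two channels, each flattened over two spatial positions.
features = [[1.0, 2.0], [3.0, 4.0]]
g = gram_matrix(features)       # [[5.0, 11.0], [11.0, 25.0]]
signature = upper_triangle(g)   # [5.0, 11.0, 25.0]
```

Signatures like this, one per training sample, are the kind of vector a clustering step could then group.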
Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels wrongly direct the neural network predictions. We propose an algorithm that optimizes for the worst-off group assignments from a constraint set. We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv (2022-01-10T22:04:48Z)
Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics. We prove that even when the only bias lies in the input distribution, models can still pick up spurious features from their training data. Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv (2021-06-14T05:39:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.