Related papers: The Easy Path to Robustness: Coreset Selection using Sample Hardness

The Easy Path to Robustness: Coreset Selection using Sample Hardness

URL: http://arxiv.org/abs/2510.11018v1
Date: Mon, 13 Oct 2025 05:28:16 GMT
Title: The Easy Path to Robustness: Coreset Selection using Sample Hardness
Authors: Pranav Ramesh, Arjun Roy, Deepak Ravikumar, Kaushik Roy, Gopalakrishnan Srinivasan,
Abstract summary: We propose a framework linking a sample's adversarial vulnerability to its textithardness, which we quantify using the average input gradient norm (AIGN) over training.<n>We present EasyCore, a coreset selection algorithm that retains only the samples with low AIGN for training.<n>We empirically show that models trained on EasyCore-selected data achieve significantly higher adversarial accuracy than those trained with competing coreset methods.
Score: 12.378609890122945
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Designing adversarially robust models from a data-centric perspective requires understanding which input samples are most crucial for learning resilient features. While coreset selection provides a mechanism for efficient training on data subsets, current algorithms are designed for clean accuracy and fall short in preserving robustness. To address this, we propose a framework linking a sample's adversarial vulnerability to its \textit{hardness}, which we quantify using the average input gradient norm (AIGN) over training. We demonstrate that \textit{easy} samples (with low AIGN) are less vulnerable and occupy regions further from the decision boundary. Leveraging this insight, we present EasyCore, a coreset selection algorithm that retains only the samples with low AIGN for training. We empirically show that models trained on EasyCore-selected data achieve significantly higher adversarial accuracy than those trained with competing coreset methods under both standard and adversarial training. As AIGN is a model-agnostic dataset property, EasyCore is an efficient and widely applicable data-centric method for improving adversarial robustness. We show that EasyCore achieves up to 7\% and 5\% improvement in adversarial accuracy under standard training and TRADES adversarial training, respectively, compared to existing coreset methods.

Related papers

SubZeroCore: A Submodular Approach with Zero Training for Coreset Selection [9.129619927191973]
SubZeroCore is a training-free coreset selection method that integrates submodular coverage and density into a single, unified objective.<n>We show that SubZeroCore matches training-based baselines and significantly outperforms them at high pruning rates, while dramatically reducing computational overhead.
arXiv Detail & Related papers (2025-09-26T01:26:45Z)
Improving the Efficiency of Self-Supervised Adversarial Training through Latent Clustering-Based Selection [2.7554677967598047]
adversarially robust learning is widely recognized to demand significantly more training examples.<n>Recent works propose the use of self-supervised adversarial training with external or synthetically generated unlabeled data to enhance model robustness.<n>We propose novel methods to strategically select a small subset of unlabeled data essential for SSAT and robustness improvement.
arXiv Detail & Related papers (2025-01-15T15:47:49Z)
Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice. We introduce a novel noisy correspondence learning framework, namely textbfSelf-textbfReinforcing textbfErrors textbfMitigation (SREM)
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
Robust Few-shot Learning Without Using any Adversarial Samples [19.34427461937382]
A few efforts have been made to combine the few-shot problem with the robustness objective using sophisticated Meta-Learning techniques. We propose a simple but effective alternative that does not require any adversarial samples. Inspired by the cognitive decision-making process in humans, we enforce high-level feature matching between the base class data and their corresponding low-frequency samples.
arXiv Detail & Related papers (2022-11-03T05:58:26Z)
DE-CROP: Data-efficient Certified Robustness for Pretrained Classifiers [21.741026088202126]
We propose a novel way to certify the robustness of pretrained models using only a few training samples. Our proposed approach generates class-boundary and interpolated samples corresponding to each training sample. We obtain significant improvements over the baseline on multiple benchmark datasets and also report similar performance under the challenging black box setup.
arXiv Detail & Related papers (2022-10-17T10:41:18Z)
Adversarial Coreset Selection for Efficient Robust Training [11.510009152620666]
We show how selecting a small subset of training data provides a principled approach to reducing the time complexity of robust training. We conduct extensive experiments to demonstrate that our approach speeds up adversarial training by 2-3 times.
arXiv Detail & Related papers (2022-09-13T07:37:53Z)
Adversarial Unlearning: Reducing Confidence Along Adversarial Directions [88.46039795134993]
We propose a complementary regularization strategy that reduces confidence on self-generated examples. The method, which we call RCAD, aims to reduce confidence on out-of-distribution examples lying along directions adversarially chosen to increase training loss. Despite its simplicity, we find on many classification benchmarks that RCAD can be added to existing techniques to increase test accuracy by 1-3% in absolute value.
arXiv Detail & Related papers (2022-06-03T02:26:24Z)
Active Learning for Deep Visual Tracking [51.5063680734122]
Convolutional neural networks (CNNs) have been successfully applied to the single target tracking task in recent years. In this paper, we propose an active learning method for deep visual tracking, which selects and annotates the unlabeled samples to train the deep CNNs model. Under the guidance of active learning, the tracker based on the trained deep CNNs model can achieve competitive tracking performance while reducing the labeling cost.
arXiv Detail & Related papers (2021-10-17T11:47:56Z)
Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution. This paper proposes a principled framework called Self-Damaging Contrastive Learning to automatically balance the representation learning without knowing the classes. Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
Online Coreset Selection for Rehearsal-based Continual Learning [65.85595842458882]
In continual learning, we store a subset of training examples (coreset) to be replayed later to alleviate catastrophic forgetting. We propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration. Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting.
arXiv Detail & Related papers (2021-06-02T11:39:25Z)
Extending Contrastive Learning to Unsupervised Coreset Selection [26.966136750754732]
We propose an unsupervised way of selecting a core-set entirely unlabeled. We use two leading methods for contrastive learning. Compared with existing coreset selection methods with labels, our approach reduced the cost associated with human annotation.
arXiv Detail & Related papers (2021-03-05T10:21:51Z)
Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions. We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples. We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.