Towards Provably Unlearnable Examples via Bayes Error Optimisation
- URL: http://arxiv.org/abs/2511.08191v1
- Date: Wed, 12 Nov 2025 01:45:46 GMT
- Title: Towards Provably Unlearnable Examples via Bayes Error Optimisation
- Authors: Ruihan Zhang, Jun Sun, Ee-Peng Lim, Peixin Zhang
- Abstract summary: We propose a novel approach to constructing unlearnable examples by systematically maximising the Bayes error. Our method provably increases the Bayes error and remains effective when the unlearnable examples are mixed with clean samples.
- Score: 14.262882776897372
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The recent success of machine learning models, especially large-scale classifiers and language models, relies heavily on training with massive data. These data are often collected from online sources. This raises serious concerns about the protection of user data, as individuals may not have given consent for their data to be used in training. To address this concern, recent studies introduce the concept of unlearnable examples, i.e., data instances that appear natural but are intentionally altered to prevent models from effectively learning from them. While existing methods demonstrate empirical effectiveness, they typically rely on heuristic trials and lack formal guarantees. Moreover, when unlearnable examples are mixed with clean data, as is often the case in practice, their unlearnability disappears. In this work, we propose a novel approach to constructing unlearnable examples by systematically maximising the Bayes error, a measurement of irreducible classification error. We develop an optimisation-based approach and provide an efficient solution using projected gradient ascent. Our method provably increases the Bayes error and remains effective when the unlearnable examples are mixed with clean samples. Experimental results across multiple datasets and model architectures are consistent with our theoretical analysis and show that our approach can effectively restrict data learnability in practice.
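The abstract mentions solving the optimisation with projected gradient ascent. As a minimal sketch of that generic procedure (not the paper's implementation: the Bayes-error objective is not reproduced here, so `grad_fn` is a hypothetical stand-in for its gradient), each ascent step is followed by a projection back onto an L-infinity perturbation budget:

```python
import numpy as np

def projected_gradient_ascent(x, grad_fn, eps=8 / 255, step=1 / 255, iters=50):
    """Maximise an objective over an additive perturbation delta by
    sign-gradient ascent, projecting onto the ball ||delta||_inf <= eps
    after every step. `grad_fn` stands in for the gradient of whatever
    objective is being maximised (here, hypothetically, Bayes error)."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        g = grad_fn(x + delta)             # gradient of the objective
        delta += step * np.sign(g)         # ascent step
        delta = np.clip(delta, -eps, eps)  # project onto the budget
    return np.clip(x + delta, 0.0, 1.0)    # keep a valid image range

# Toy usage with a quadratic stand-in objective 0.5 * ||z - 1||^2,
# whose gradient is (z - 1): the perturbation saturates at -eps.
x = np.full(4, 0.5)
out = projected_gradient_ascent(x, lambda z: z - 1.0, eps=0.1, step=0.02, iters=20)
```

The projection is what distinguishes this from unconstrained ascent: it guarantees the perturbation stays imperceptible regardless of how many steps are taken.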
Related papers
- Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information [55.75102049412629]
We show that effective unlearnable examples always decrease mutual information between clean features and poisoned features. We propose a novel unlearnable method called Mutual Information Unlearnable Examples (MI-UE). Our approach significantly outperforms previous methods, even under defense mechanisms.
arXiv Detail & Related papers (2026-03-04T04:53:29Z)
- Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers [7.1709130026195895]
Unlearnable examples introduce imperceptible perturbations into data, preventing models from learning effectively. We propose Perturbation-Induced Linearization (PIL), a method that generates perturbations using only linear surrogate models. PIL achieves comparable or better performance than existing surrogate-based methods while dramatically reducing computational time.
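PIL is described here only at a high level. As a hypothetical sketch (the function name, the alternating schedule, and the logistic surrogate are assumptions, not the paper's procedure), crafting error-minimising perturbations against a purely linear surrogate might look like alternating between fitting the surrogate and nudging each point towards being trivially classifiable:

```python
import numpy as np

def linear_surrogate_perturb(X, y, eps=0.05, lr=0.1, rounds=20):
    """Illustrative error-minimising perturbation using only a linear
    (logistic-regression) surrogate, loosely in the spirit of PIL.
    Alternates a surrogate gradient step on the perturbed data with a
    data step that increases each point's classification margin,
    keeping the perturbation within an L-infinity budget eps."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    delta = np.zeros_like(X)
    s = 2.0 * y - 1.0                       # labels in {-1, +1}
    for _ in range(rounds):
        # one surrogate update on the perturbed data (logistic loss)
        z = (X + delta) @ w + b
        p = 1.0 / (1.0 + np.exp(-z))
        g = p - y
        w -= lr * (X + delta).T @ g / n
        b -= lr * g.mean()
        # error-minimising data step: increase the margin s * (w.x + b),
        # whose gradient w.r.t. each point is s * w
        delta += (eps / rounds) * np.sign(np.outer(s, w))
        delta = np.clip(delta, -eps, eps)
    return X + delta

# Toy usage: the perturbed data stays within eps of the original.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
out = linear_surrogate_perturb(X, y)
```

The appeal of a linear surrogate, as the summary notes, is cost: each round is a single matrix-vector update rather than a full network training pass.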
arXiv Detail & Related papers (2026-01-27T17:26:41Z)
- How Far Are We from True Unlearnability? [8.176905459241047]
Several unlearnable methods have been proposed, which generate unlearnable examples (UEs) by compromising the training availability of data. We investigate how far we are from attaining truly unlearnable examples. We propose an Unlearnable Distance (UD) to measure the unlearnability of data based on the SAL distribution of parameters in clean and poisoned models.
arXiv Detail & Related papers (2025-09-09T18:01:10Z)
- Statistically Testing Training Data for Unwanted Error Patterns using Rule-Oriented Regression [0.5831737970661137]
We provide a method to test training data for flaws, to establish a trustworthy ground truth for a subsequent training of machine learning models. Our approach extends the abilities of conventional statistical testing by letting the "test condition" be any condition that describes a pattern in the data. We provide an open-source implementation for demonstration and experiments.
arXiv Detail & Related papers (2025-03-24T09:52:36Z)
- Ask Your Distribution Shift if Pre-Training is Right for You [67.90850628695563]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others. We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data. Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z)
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2023-08-26T12:43:25Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consecutively, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
- Neural Active Learning on Heteroskedastic Distributions [29.01776999862397]
We demonstrate the catastrophic failure of active learning algorithms on heteroskedastic datasets.
We propose a new algorithm that incorporates a model difference scoring function for each data point to filter out the noisy examples and sample clean examples.
arXiv Detail & Related papers (2022-11-02T07:30:19Z)
- Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z)
- Robust and On-the-fly Dataset Denoising for Image Classification [72.10311040730815]
On-the-fly Data Denoising (ODD) is robust to mislabeled examples, while introducing almost zero computational overhead compared to standard training.
ODD is able to achieve state-of-the-art results on a wide range of datasets including real-world ones such as WebVision and Clothing1M.
arXiv Detail & Related papers (2020-03-24T03:59:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.