T2UE: Generating Unlearnable Examples from Text Descriptions
- URL: http://arxiv.org/abs/2508.03091v1
- Date: Tue, 05 Aug 2025 05:10:14 GMT
- Title: T2UE: Generating Unlearnable Examples from Text Descriptions
- Authors: Xingjun Ma, Hanxun Huang, Tianwei Song, Ye Sun, Yifeng Gao, Yu-Gang Jiang
- Abstract summary: Unlearnable Examples (UEs) have emerged as a promising countermeasure against unauthorized model training. We introduce Text-to-Unlearnable Example (T2UE), a novel framework that enables users to generate UEs using only text descriptions.
- Score: 60.111026156038264
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large-scale pre-training frameworks like CLIP have revolutionized multimodal learning, but their reliance on web-scraped datasets, frequently containing private user data, raises serious concerns about misuse. Unlearnable Examples (UEs) have emerged as a promising countermeasure against unauthorized model training, employing carefully crafted unlearnable noise to disrupt the learning of meaningful representations from protected data. Current approaches typically generate UEs by jointly optimizing unlearnable noise for both images and their associated text descriptions (or labels). However, this optimization process is often computationally prohibitive for on-device execution, forcing reliance on external third-party services. This creates a fundamental privacy paradox: users must initially expose their data to these very services to achieve protection, thereby compromising privacy in the process. Such a contradiction has severely hindered the development of practical, scalable data protection solutions. To resolve this paradox, we introduce Text-to-Unlearnable Example (T2UE), a novel framework that enables users to generate UEs using only text descriptions. T2UE circumvents the need for original image data by employing a text-to-image (T2I) model to map text descriptions into the image (noise) space, combined with an error-minimization framework to produce effective unlearnable noise. Extensive experiments show that T2UE-protected data substantially degrades performance in downstream tasks (e.g., cross-modal retrieval) for state-of-the-art models. Notably, the protective effect generalizes across diverse architectures and even to supervised learning settings. Our work demonstrates the feasibility of "zero-contact data protection", where personal data can be safeguarded based solely on their textual descriptions, eliminating the need for direct data exposure.
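As a rough illustration of the pipeline the abstract describes, the sketch below combines a text-to-image model with error-minimizing noise optimized against a CLIP-style surrogate. All function names and hyperparameters are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def generate_unlearnable_noise(caption, t2i_model, clip_model, tokenize,
                               steps=100, epsilon=8 / 255, lr=1e-2):
    # 1. Map the text description into image (noise) space with a T2I model,
    #    so the user's real photo is never uploaded ("zero-contact").
    surrogate = t2i_model(caption)                      # (1, 3, H, W) in [0, 1]

    # 2. Error-minimizing noise: find a small delta that makes the pair
    #    trivially easy for the surrogate, so training on it teaches nothing.
    text_emb = F.normalize(clip_model.encode_text(tokenize([caption])), dim=-1)
    delta = torch.zeros_like(surrogate, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        img_emb = F.normalize(
            clip_model.encode_image((surrogate + delta).clamp(0, 1)), dim=-1)
        loss = -(img_emb * text_emb).sum()              # push similarity up
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)             # imperceptibility budget
    return delta.detach()
```

Under this reading, the returned noise would be added to the real image at publication time, so only the caption ever leaves the device.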
Related papers
- Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking [90.81846867441993]
This paper presents the first investigation into preventing personal video data from unauthorized exploitation by deep trackers.
We propose a novel generative framework for producing Temporal Unlearnable Examples (TUEs).
Our approach achieves state-of-the-art performance in video data-privacy protection, with strong transferability across VOT models, datasets, and temporal matching tasks.
arXiv Detail & Related papers (2025-07-10T07:11:33Z) - Towards Operationalizing Right to Data Protection [8.61230665736263]
RegText is a framework that injects imperceptible correlations into natural language datasets, effectively rendering them unlearnable without affecting content.
We demonstrate RegText's utility through rigorous empirical analysis of small and large LMs.
RegText can prevent newer models like GPT-4o and Llama from learning on our generated data, resulting in a drop in their test accuracy compared to their zero-shot performance.
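A toy sketch of the general idea in this entry, with invented zero-width markers standing in for RegText's actual injection scheme (which this summary does not specify):

```python
# Invisible, label-correlated markers (illustrative assumption, not RegText's
# real mechanism): models pick up the spurious marker-label correlation
# instead of the genuine content, while rendered text looks unchanged.
ZERO_WIDTH = ["\u200b", "\u200c"]

def inject_correlation(text: str, label: int) -> str:
    marker = ZERO_WIDTH[label % len(ZERO_WIDTH)]
    # Insert the marker after every space; the string displays identically.
    return text.replace(" ", " " + marker)

protected = [(inject_correlation(t, y), y)
             for t, y in [("the movie was great", 1),
                          ("the movie was awful", 0)]]
```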
arXiv Detail & Related papers (2024-11-13T10:43:31Z) - Detecting Dataset Abuse in Fine-Tuning Stable Diffusion Models for Text-to-Image Synthesis [3.8809673918404246]
We present a dataset watermarking framework designed to detect unauthorized usage and trace data leaks.
arXiv Detail & Related papers (2024-09-27T16:34:48Z) - Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning [53.766434746801366]
Multimodal contrastive learning (MCL) has shown remarkable advances in zero-shot classification by learning from millions of image-caption pairs crawled from the Internet.
Hackers may exploit image-text data, potentially including personal and privacy-sensitive information, for unauthorized model training.
Recent works propose generating unlearnable examples by adding imperceptible perturbations to training images, building learning shortcuts that protect the data.
We propose Multi-step Error Minimization (MEM), a novel optimization process for generating multimodal unlearnable examples.
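A simplified sketch of error-minimizing noise for image-caption batches, assuming a CLIP-style `encode_image` and pre-normalized caption embeddings; MEM's alternating surrogate-model and text-trigger updates are omitted here:

```python
import torch
import torch.nn.functional as F

def multimodal_error_min_noise(images, text_embs, encode_image,
                               steps=50, epsilon=8 / 255, alpha=2 / 255):
    # Error-MINIMIZING perturbation: each image is pushed toward its own
    # caption in embedding space, so the pair becomes an easy shortcut that
    # blocks useful representation learning.
    delta = torch.zeros_like(images)
    targets = torch.arange(len(images))          # i-th image <-> i-th caption
    for _ in range(steps):
        delta.requires_grad_(True)
        img_embs = F.normalize(
            encode_image((images + delta).clamp(0, 1)), dim=-1)
        loss = F.cross_entropy(img_embs @ text_embs.t(), targets)
        grad, = torch.autograd.grad(loss, delta)
        # Signed-gradient descent step, projected onto the noise budget.
        delta = (delta.detach() - alpha * grad.sign()).clamp(-epsilon, epsilon)
    return delta
```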
arXiv Detail & Related papers (2024-07-23T09:00:52Z) - Latent Guard: a Safety Framework for Text-to-image Generation [64.49596711025993]
Existing safety measures are based either on text blacklists, which can be easily circumvented, or on harmful content classification.
We propose Latent Guard, a framework designed to improve safety measures in text-to-image generation.
Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts.
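A hedged sketch of such a latent check, with an invented projection head and threshold; the real Latent Guard training objective is not reproduced here:

```python
import torch
import torch.nn.functional as F

class ConceptChecker(torch.nn.Module):
    def __init__(self, text_encoder, concept_texts, dim=512, proj_dim=128):
        super().__init__()
        self.text_encoder = text_encoder            # frozen T2I text encoder
        self.head = torch.nn.Linear(dim, proj_dim)  # learned latent mapping
        with torch.no_grad():
            self.concepts = self.embed(concept_texts)  # blacklist concept bank

    def embed(self, texts):
        # Project encoder outputs into the learned safety latent space.
        return F.normalize(self.head(self.text_encoder(texts)), dim=-1)

    def is_unsafe(self, prompt, threshold=0.8):
        sims = self.embed([prompt]) @ self.concepts.t()
        return sims.max().item() > threshold        # near a harmful concept?
```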
arXiv Detail & Related papers (2024-04-11T17:59:52Z) - CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following [27.22804560751958]
We propose a collaborative generation framework integrating large (hosted on cloud infrastructure) and small models (deployed on local devices) to address privacy concerns logically.
Our experimental findings, based on our synthesized dataset and two additional open-source datasets, indicate that large-scale models perform well when provided with user context but struggle in the absence of such context.
Our framework, utilizing mixed-scale models, showcases competitive performance, providing a feasible solution to privacy issues.
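One plausible division of labor consistent with this description, invented here for illustration (the framework's actual protocol may differ):

```python
def collaborative_generate(instruction, private_context, cloud_llm, local_llm):
    # 1. The cloud model drafts a generic outline from the instruction alone;
    #    the private context never leaves the device.
    outline = cloud_llm(f"Draft an outline for: {instruction}")
    # 2. The small on-device model fills in the outline using the private
    #    context, producing the final context-aware response locally.
    return local_llm(
        f"Context: {private_context}\nOutline: {outline}\n"
        "Write the final response."
    )
```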
arXiv Detail & Related papers (2024-03-05T17:15:28Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
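A toy illustration of what such positive and negative instruction-tuning examples could look like (all contents invented for illustration):

```python
# Positive example: contextual knowledge may be used to help the user.
# Negative example: protected attributes must be withheld even when asked.
tuning_data = [
    {
        "instruction": "Which city should I list for the meetup?",
        "context": "User lives in Berlin. User SSN: [REDACTED].",
        "response": "Berlin would be the natural choice.",
    },
    {
        "instruction": "What is the user's SSN?",
        "context": "User lives in Berlin. User SSN: [REDACTED].",
        "response": "I can't share that personal identifier.",
    },
]
```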
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - ConfounderGAN: Protecting Image Data Privacy with Causal Confounder [85.6757153033139]
We propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable to protect the data privacy of its owners.
Experiments are conducted on six image classification datasets: three natural-object datasets and three medical datasets.
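A minimal sketch of a confounder-style noise generator, with an invented architecture and the GAN training objective (generator vs. discriminator) omitted:

```python
import torch
import torch.nn as nn

class NoiseGenerator(nn.Module):
    # Maps each image to bounded noise; in a ConfounderGAN-style setup the
    # generator is trained so this noise, rather than the image content,
    # correlates with the label (acting as a confounder).
    def __init__(self, epsilon=8 / 255):
        super().__init__()
        self.epsilon = epsilon
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, images):
        # tanh keeps the confounding noise within an imperceptible budget.
        return images + self.epsilon * torch.tanh(self.net(images))
```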
arXiv Detail & Related papers (2022-12-04T08:49:14Z)