Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
- URL: http://arxiv.org/abs/2410.05664v2
- Date: Sun, 09 Mar 2025 05:17:36 GMT
- Title: Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
- Authors: Saemi Moon, Minjong Lee, Sangdon Park, Dongwoo Kim
- Abstract summary: Concept unlearning is a promising solution to unethical or harmful use of text-to-image diffusion models. Our benchmark covers 33 target concepts, with 16,000 prompts per concept, spanning four categories: Celebrity, Style, Intellectual Property, and NSFW. Our investigation reveals that no single method excels across all evaluation criteria.
- Score: 8.831339626121848
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As text-to-image diffusion models gain widespread commercial application, there are increasing concerns about unethical or harmful use, including the unauthorized generation of copyrighted or sensitive content. Concept unlearning has emerged as a promising solution to these challenges by removing undesired and harmful information from the pre-trained model. However, previous evaluations primarily focus on whether target concepts are removed while image quality is preserved, neglecting broader impacts such as unintended side effects. In this work, we propose the Holistic Unlearning Benchmark (HUB), a comprehensive framework for evaluating unlearning methods across six key dimensions: faithfulness, alignment, pinpoint-ness, multilingual robustness, attack robustness, and efficiency. Our benchmark covers 33 target concepts, with 16,000 prompts per concept, spanning four categories: Celebrity, Style, Intellectual Property, and NSFW. Our investigation reveals that no single method excels across all evaluation criteria. By releasing our evaluation code and dataset, we hope to inspire further research in this area, leading to more reliable and effective unlearning methods.
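As a rough illustration of how a multi-dimensional benchmark like HUB can be organized, the sketch below loops over methods, concepts, and the six dimensions and aggregates scores. The concept names, method names, and scoring stub are hypothetical placeholders, not the released HUB code.
```python
# Minimal sketch of a HUB-style evaluation loop. The concept names and the
# scoring stub are hypothetical placeholders, not the released HUB code.
from collections import defaultdict

DIMENSIONS = ["faithfulness", "alignment", "pinpoint-ness",
              "multilingual robustness", "attack robustness", "efficiency"]
CATEGORIES = {  # the four concept categories from the abstract
    "Celebrity": ["celebrity_01"],
    "Style": ["style_01"],
    "Intellectual Property": ["ip_01"],
    "NSFW": ["nsfw_01"],
}

def evaluate(method: str, concept: str, dimension: str) -> float:
    """Placeholder: a real implementation would prompt the unlearned model
    and score its generations for this concept along this dimension."""
    return 0.0

def run_benchmark(methods):
    scores = defaultdict(dict)
    for method in methods:
        for category, concepts in CATEGORIES.items():
            for concept in concepts:
                for dim in DIMENSIONS:
                    scores[method][(category, concept, dim)] = \
                        evaluate(method, concept, dim)
    return scores

results = run_benchmark(["method_a", "method_b"])  # hypothetical method names
```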
Related papers
- Comprehensive Assessment and Analysis for NSFW Content Erasure in Text-to-Image Diffusion Models [16.60455968933097]
Text-to-image diffusion models can inadvertently generate NSFW content even with efforts to filter it from the training dataset.
We present the first systematic investigation of concept erasure methods for NSFW content and its sub-themes in text-to-image diffusion models.
We provide a holistic evaluation of 11 state-of-the-art baseline methods with 14 variants.
arXiv Detail & Related papers (2025-02-18T04:25:42Z) - EraseBench: Understanding The Ripple Effects of Concept Erasure Techniques [20.2544260436998]
Concept erasure techniques can remove unwanted concepts from text-to-image models.
We systematically investigate the failure modes of current concept erasure techniques.
We introduce EraseBENCH, a benchmark designed to assess concept erasure methods with greater depth.
Our findings reveal that even state-of-the-art techniques struggle with maintaining quality post-erasure, indicating that these approaches are not yet ready for real-world deployment.
arXiv Detail & Related papers (2025-01-16T20:42:17Z) - On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [49.60774626839712]
Multimodal generative models have sparked critical discussions on their fairness, reliability, and potential for misuse.
We propose an evaluation framework designed to assess model reliability through their responses to perturbations in the embedding space.
Our method lays the groundwork for detecting unreliable, bias-injected models and retrieval of bias provenance.
arXiv Detail & Related papers (2024-11-21T09:46:55Z) - Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset [92.99416966226724]
We introduce Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms.
We apply a two-stage evaluation pipeline that is designed to precisely control the sources of information and their exposure levels.
Through the evaluation of four baseline VLM unlearning algorithms within FIUBench, we find that all methods remain limited in their unlearning performance.
arXiv Detail & Related papers (2024-11-05T23:26:10Z) - Towards Understanding the Feasibility of Machine Unlearning [14.177012256360635]
We present a set of novel metrics for quantifying the difficulty of unlearning.
Specifically, we propose several metrics to assess the conditions necessary for a successful unlearning operation.
We also present a ranking mechanism to identify the most challenging samples to unlearn.
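The snippet above does not specify the metrics themselves; as a hedged illustration, one common proxy for how strongly a sample is entrenched in a model is its per-sample gradient norm. The sketch below ranks forget-set samples by that proxy; it is an assumption-laden stand-in, not the paper's actual metrics.
```python
# Hypothetical sketch: rank forget-set samples by per-sample gradient norm,
# one plausible proxy for unlearning difficulty (not the paper's metrics).
import torch
import torch.nn.functional as F

def difficulty_ranking(model, samples, labels):
    scores = []
    for x, y in zip(samples, labels):
        model.zero_grad()
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grad_norm = sum(p.grad.norm() ** 2
                        for p in model.parameters() if p.grad is not None) ** 0.5
        scores.append(float(grad_norm))
    # A larger gradient norm suggests the sample still strongly shapes the
    # model, so it is presumably harder to unlearn.
    return sorted(range(len(samples)), key=lambda i: -scores[i])

# Toy usage on a linear model with random "forget" samples.
model = torch.nn.Linear(8, 3)
xs, ys = torch.randn(5, 8), torch.randint(0, 3, (5,))
print(difficulty_ranking(model, xs, ys))
```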
arXiv Detail & Related papers (2024-10-03T23:41:42Z) - Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models [19.015202590038996]
We design Dynamic Unlearning Attack (DUA), a dynamic and automated framework to attack unlearned models.
We propose Latent Adversarial Unlearning (LAU), a universal framework that effectively enhances the robustness of the unlearning process.
We demonstrate that LAU improves unlearning effectiveness by over 53.5%, causes less than an 11.6% reduction in neighboring knowledge, and has almost no impact on the model's general capabilities.
arXiv Detail & Related papers (2024-08-20T09:36:04Z) - Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion [51.931083971448885]
We propose a framework named Human Feedback Inversion (HFI), where human feedback on model-generated images is condensed into textual tokens guiding the mitigation or removal of problematic images.
Our experimental results demonstrate our framework significantly reduces objectionable content generation while preserving image quality, contributing to the ethical deployment of AI in the public sphere.
arXiv Detail & Related papers (2024-07-17T05:21:41Z) - Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
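Gradient ascent unlearning itself is straightforward: maximize the loss on the forget set. The sketch below shows GA with one possible control, stopping the ascent once the forget loss crosses a ceiling, to illustrate how excessive unlearning might be regulated; the paper's actual controlling methods may differ.
```python
# Hedged sketch of gradient-ascent (GA) unlearning with one possible control:
# stop ascending once the loss on the forget data crosses a ceiling, to limit
# the excessive unlearning described above. Not the paper's exact methods.
import torch
import torch.nn.functional as F

def ga_unlearn(model, forget_data, lr=1e-4, loss_ceiling=5.0, max_steps=100):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for step, (x, y) in enumerate(forget_data):
        loss = F.cross_entropy(model(x), y)
        if loss.item() >= loss_ceiling or step >= max_steps:
            break                       # control: forgetting is "enough"
        opt.zero_grad()
        (-loss).backward()              # ascend, i.e. maximize the forget loss
        opt.step()
    return model

# Toy usage with random "forget" batches.
model = torch.nn.Linear(8, 3)
batches = [(torch.randn(4, 8), torch.randint(0, 3, (4,))) for _ in range(10)]
ga_unlearn(model, batches)
```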
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models [42.734578139757886]
Diffusion models (DMs) have achieved remarkable success in text-to-image generation, but they also pose safety risks.
Machine unlearning techniques, also known as concept erasing, have been developed to address these risks.
This work aims to enhance the robustness of concept erasing by integrating the principle of adversarial training (AT) into machine unlearning.
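The adversarial-training recipe has a bilevel structure: an inner maximization searches for a perturbation that best revives the erased concept, and an outer minimization updates the model to suppress the concept under that attack. The sketch below shows this structure on a toy classifier surrogate rather than a diffusion model; it is illustrative only.
```python
# Hedged sketch of adversarial training (AT) for concept erasure, on a toy
# classifier surrogate rather than a diffusion model. The inner loop finds an
# input-embedding perturbation that best revives the target concept; the
# outer step updates the model to suppress the concept under that attack.
import torch
import torch.nn.functional as F

def at_erase_step(model, emb, concept_label, eps=0.1, inner_steps=5, lr=1e-3):
    # Inner maximization: PGD on the embedding to raise concept probability.
    delta = torch.zeros_like(emb, requires_grad=True)
    for _ in range(inner_steps):
        attack_loss = F.cross_entropy(model(emb + delta), concept_label)
        grad, = torch.autograd.grad(attack_loss, delta)
        with torch.no_grad():
            delta -= (eps / inner_steps) * grad.sign()  # lower CE revives concept
            delta.clamp_(-eps, eps)
    # Outer minimization: penalize concept probability under the attack.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    opt.zero_grad()
    erase_loss = -F.cross_entropy(model(emb + delta.detach()), concept_label)
    erase_loss.backward()  # maximizing CE on the concept label suppresses it
    opt.step()
    return model

# Toy usage: erase "concept 3" from a linear probe over 16-d embeddings.
model = torch.nn.Linear(16, 10)
emb = torch.randn(4, 16)              # stand-in for prompt embeddings
at_erase_step(model, emb, torch.full((4,), 3))
```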
arXiv Detail & Related papers (2024-05-24T05:47:23Z) - Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address the mismatch between CLIP's generic pre-training and the IQA task using prompt learning, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z) - Holistic Evaluation of Text-To-Image Models [153.47415461488097]
We introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM).
We identify 12 aspects, including text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency.
Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths.
arXiv Detail & Related papers (2023-11-07T19:00:56Z) - UMat: Uncertainty-Aware Single Image High Resolution Material Capture [2.416160525187799]
We propose a learning-based method to recover normals, specularity, and roughness from a single diffuse image of a material.
Our method is the first one to deal with the problem of modeling uncertainty in material digitization.
arXiv Detail & Related papers (2023-05-25T17:59:04Z) - Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning for misinformation identification.
Our model shows superior performance in detecting non-matched image-text pairs when training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z) - Learnware: Small Models Do Big [69.88234743773113]
The prevailing big-model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed these issues, while becoming a serious source of carbon emissions.
This article offers an overview of the learnware paradigm, which attempts to free users from building machine learning models from scratch, in the hope of reusing small models to do things even beyond their original purposes.
arXiv Detail & Related papers (2022-10-07T15:55:52Z) - What Makes Good Contrastive Learning on Small-Scale Wearable-based Tasks? [59.51457877578138]
We study contrastive learning on the wearable-based activity recognition task.
This paper presents an open-source PyTorch library, CL-HAR, which can serve as a practical tool for researchers.
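For concreteness, the sketch below implements a generic NT-Xent (SimCLR-style) contrastive loss in PyTorch, the kind of objective a library like CL-HAR provides for wearable-sensor windows; it is a self-contained illustration, not CL-HAR's actual API.
```python
# Generic NT-Xent (SimCLR-style) contrastive loss in PyTorch -- the kind of
# objective a contrastive HAR library provides. A hedged, self-contained
# sketch, not CL-HAR's actual API.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (N, D) embeddings of two augmented views of the same batch."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D)
    sim = z @ z.t() / temperature                         # (2N, 2N) similarities
    sim.fill_diagonal_(float("-inf"))                     # mask self-pairs
    # The positive for row i is its other augmented view: i + n (mod 2n).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage: two views of a batch of 8 windowed sensor segments.
z1, z2 = torch.randn(8, 32), torch.randn(8, 32)
print(nt_xent(z1, z2).item())
```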
arXiv Detail & Related papers (2022-02-12T06:10:15Z) - Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of deep learning models has raised unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack against the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)