Image Safeguarding: Reasoning with Conditional Vision Language Model and
Obfuscating Unsafe Content Counterfactually
- URL: http://arxiv.org/abs/2401.11035v1
- Date: Fri, 19 Jan 2024 21:38:18 GMT
- Title: Image Safeguarding: Reasoning with Conditional Vision Language Model and
Obfuscating Unsafe Content Counterfactually
- Authors: Mazal Bethany, Brandon Wherry, Nishant Vishwamitra, Peyman Najafirad
- Abstract summary: Social media platforms are increasingly used by malicious actors to share unsafe content, such as images depicting sexual activity.
Major platforms use artificial intelligence (AI) and human moderation to obfuscate such images to make them safer.
Two critical needs for obfuscating unsafe images are that an accurate rationale for obfuscating image regions must be provided, and that the sensitive regions be obfuscated for users' safety.
- Score: 3.69611312621848
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social media platforms are being increasingly used by malicious actors to
share unsafe content, such as images depicting sexual activity, cyberbullying,
and self-harm. Consequently, major platforms use artificial intelligence (AI)
and human moderation to obfuscate such images to make them safer. Two critical
needs for obfuscating unsafe images are that an accurate rationale for
obfuscating image regions must be provided, and that the sensitive regions should
be obfuscated (e.g., by blurring) for users' safety. This process involves
addressing two key problems: (1) the reason for obfuscating unsafe images
demands the platform to provide an accurate rationale that must be grounded in
unsafe image-specific attributes, and (2) the unsafe regions in the image must
be minimally obfuscated while still depicting the safe regions. In this work,
we address these key issues by first designing a vision language model (VLM)
conditioned on pre-trained unsafe image classifiers, which performs visual
reasoning to provide an accurate rationale grounded in unsafe image attributes.
We then propose a counterfactual explanation algorithm that minimally
identifies and obfuscates unsafe regions for safe viewing: it first uses the
unsafe image classifier's attribution matrix to guide segmentation toward a
more optimal subregion segmentation, and then runs an informed greedy search,
ordered by attribution score, to determine the minimum number of subregions
that must be modified to change the classifier's output. Extensive experiments on
uncurated data from social networks emphasize the efficacy of our proposed
method. We make our code available at:
https://github.com/SecureAIAutonomyLab/ConditionalVLM
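To make the counterfactual obfuscation step above more concrete, here is a minimal, hedged sketch of an attribution-guided greedy obfuscation loop. It is not the released implementation: the `classifier` callable, the precomputed attribution map, and the list of candidate subregion masks (which the paper derives from attribution-guided segmentation) are all assumed inputs, and Gaussian blurring stands in for the obfuscation operator.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def greedy_counterfactual_obfuscation(image, masks, attribution, classifier,
                                      threshold=0.5, sigma=8.0):
    """Blur the fewest subregions needed to push the unsafe score below threshold.

    image:       H x W x C float array
    masks:       list of H x W boolean arrays (candidate subregions)
    attribution: H x W array of unsafe-classifier attribution scores
    classifier:  callable mapping an image to an unsafe probability in [0, 1]
    """
    # Rank candidate subregions by total attribution, most unsafe first.
    order = np.argsort([-attribution[m].sum() for m in masks])
    blurred = gaussian_filter(image, sigma=(sigma, sigma, 0))
    result, chosen = image.copy(), []

    for idx in order:
        if classifier(result) < threshold:          # decision already flipped to "safe"
            break
        result[masks[idx]] = blurred[masks[idx]]    # obfuscate this subregion only
        chosen.append(idx)
    return result, chosen
```

Because high-attribution subregions are obfuscated first, the loop typically stops after only a few subregions, which is the "minimal obfuscation" property the abstract emphasizes.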
Related papers
- Latent Guard: a Safety Framework for Text-to-image Generation [64.49596711025993]
Existing safety measures are based either on text blacklists, which can be easily circumvented, or on harmful content classification.
We propose Latent Guard, a framework designed to improve safety measures in text-to-image generation.
Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts.
arXiv Detail & Related papers (2024-04-11T17:59:52Z)
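As a rough, hedged illustration of the latent-space check the Latent Guard entry above describes (not its actual implementation), the snippet below flags a prompt whose embedding lies too close to any embedding in a bank of blacklisted harmful concepts; the learned projection, the similarity threshold, and the random vectors standing in for encoder outputs are all assumptions.

```python
import torch
import torch.nn.functional as F

def is_prompt_unsafe(prompt_embedding, concept_bank, threshold=0.75):
    """Block a prompt whose latent embedding is too similar to a harmful concept.

    prompt_embedding: (d,) tensor from the T2I model's text encoder,
                      optionally passed through a learned projection head
    concept_bank:     (n, d) tensor of embeddings for blacklisted concepts
    threshold:        similarity above which the prompt is blocked (assumed value)
    """
    sims = F.cosine_similarity(prompt_embedding.unsqueeze(0), concept_bank, dim=-1)
    return bool((sims > threshold).any()), sims.max().item()

# Toy usage with random vectors standing in for real encoder outputs.
bank = F.normalize(torch.randn(32, 512), dim=-1)
prompt = F.normalize(torch.randn(512), dim=-1)
blocked, score = is_prompt_unsafe(prompt, bank)
```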
- Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models [42.19184265811366]
We introduce a novel approach to enhancing the safety of vision-and-language models by diminishing their sensitivity to NSFW (not safe for work) inputs.
We show how this can be done by fine-tuning a CLIP model on synthetic data obtained from a large language model trained to convert between safe and unsafe sentences.
arXiv Detail & Related papers (2023-11-27T19:02:17Z)
- Recoverable Privacy-Preserving Image Classification through Noise-like Adversarial Examples [26.026171363346975]
Cloud-based image-related services such as classification have become crucial.
In this study, we propose a novel privacy-preserving image classification scheme.
Encrypted images can be decrypted back into their original form with high fidelity (i.e., they are recoverable) using a secret key.
arXiv Detail & Related papers (2023-10-19T13:01:58Z)
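The "recoverable" property in the entry above, a transform that hides image content yet can be exactly inverted with a secret key, can be illustrated with the toy sketch below. A key-seeded row/column permutation is used purely as a stand-in; the paper's actual scheme is based on noise-like adversarial examples, not permutations.

```python
import numpy as np

def scramble(image, key):
    """Key-seeded, invertible row/column permutation (illustrative stand-in only)."""
    rng = np.random.default_rng(key)
    rows = rng.permutation(image.shape[0])
    cols = rng.permutation(image.shape[1])
    return image[rows][:, cols]

def unscramble(scrambled, key):
    """Recover the original image exactly using the same secret key."""
    rng = np.random.default_rng(key)
    rows = rng.permutation(scrambled.shape[0])
    cols = rng.permutation(scrambled.shape[1])
    return scrambled[np.argsort(rows)][:, np.argsort(cols)]

# Round-trip check on a random "image": recovery is bit-exact.
img = np.random.rand(64, 64, 3)
assert np.array_equal(unscramble(scramble(img, key=42), key=42), img)
```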
- SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution [21.93748586123046]
We develop and exhibit the first prompt attacks on Midjourney, resulting in the production of abundant NSFW images.
Our framework, SurrogatePrompt, systematically generates attack prompts, utilizing large language models, image-to-text, and image-to-image modules.
Results disclose an 88% success rate in bypassing Midjourney's proprietary safety filter with our attack prompts.
arXiv Detail & Related papers (2023-09-25T13:20:15Z)
- Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts [63.61248884015162]
Text-to-image diffusion models have shown remarkable ability in high-quality content generation.
This work proposes Prompting4Debugging (P4D) as a tool that automatically finds problematic prompts for diffusion models.
Our results show that around half of the prompts in existing safe-prompting benchmarks that were originally considered "safe" can actually be manipulated to bypass many deployed safety mechanisms.
arXiv Detail & Related papers (2023-09-12T11:19:36Z)
- PRO-Face S: Privacy-preserving Reversible Obfuscation of Face Images via Secure Flow [69.78820726573935]
We name it PRO-Face S, short for Privacy-preserving Reversible Obfuscation of Face images via Secure flow-based model.
In the framework, an Invertible Neural Network (INN) is utilized to process the input image along with its pre-obfuscated form and generate a privacy-protected image that visually approximates the pre-obfuscated one.
arXiv Detail & Related papers (2023-07-18T10:55:54Z)
- Human-imperceptible, Machine-recognizable Images [76.01951148048603]
A major conflict is exposed for software engineers between developing better AI systems and keeping their distance from sensitive training data.
This paper proposes an efficient privacy-preserving learning paradigm, where images are encrypted to become "human-imperceptible, machine-recognizable".
We show that the proposed paradigm can ensure the encrypted images have become human-imperceptible while preserving machine-recognizable information.
arXiv Detail & Related papers (2023-06-06T13:41:37Z)
- Benchmarking Robustness to Adversarial Image Obfuscations [22.784762155781436]
Malicious actors may obfuscate policy violating images to prevent machine learning models from reaching the correct decision.
This benchmark, based on ImageNet, simulates the type of obfuscations created by malicious actors.
arXiv Detail & Related papers (2023-01-30T15:36:44Z)
- A Study of Face Obfuscation in ImageNet [94.2949777826947]
In this paper, we explore image obfuscation in the ImageNet challenge.
Most categories in the ImageNet challenge are not people categories; nevertheless, many incidental people are in the images.
We benchmark various deep neural networks on face-blurred images and observe a disparate impact on different categories.
Results show that features learned on face-blurred images are equally transferable.
arXiv Detail & Related papers (2021-03-10T17:11:34Z)
- Protect, Show, Attend and Tell: Empowering Image Captioning Models with Ownership Protection [24.50702655120905]
This paper demonstrates that the current digital watermarking framework is insufficient to protect image captioning tasks.
As a remedy, this paper studies and proposes two different embedding schemes in the hidden memory state of a recurrent neural network.
To the best of our knowledge, this work is the first to propose ownership protection for the image captioning task.
arXiv Detail & Related papers (2020-08-25T13:48:35Z)
- InfoScrub: Towards Attribute Privacy by Targeted Obfuscation [77.49428268918703]
We study techniques that allow individuals to limit the private information leaked in visual data.
We tackle this problem in a novel image obfuscation framework.
We find our approach generates obfuscated images faithful to the original input images, and additionally increases uncertainty by 6.2× (or up to 0.85 bits) over the non-obfuscated counterparts.
arXiv Detail & Related papers (2020-05-20T19:48:04Z)
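The "bits" figure in the InfoScrub entry above is naturally read as the entropy of an attribute classifier's prediction, so an uncertainty increase can be measured as an entropy difference. The sketch below shows that computation on made-up probability vectors; the numbers are purely illustrative and are not results from the paper.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy of a discrete distribution, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log2(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

# Illustrative attribute-classifier outputs before and after obfuscation
# (made-up numbers, not values from the InfoScrub paper).
before = [0.90, 0.07, 0.03]   # confident prediction, low uncertainty
after  = [0.50, 0.30, 0.20]   # obfuscation makes the attribute ambiguous
print(entropy_bits(after) - entropy_bits(before), "bits of added uncertainty")
```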
This list is automatically generated from the titles and abstracts of the papers on this site.