Image Safeguarding: Reasoning with Conditional Vision Language Model and
Obfuscating Unsafe Content Counterfactually
- URL: http://arxiv.org/abs/2401.11035v1
- Date: Fri, 19 Jan 2024 21:38:18 GMT
- Title: Image Safeguarding: Reasoning with Conditional Vision Language Model and
Obfuscating Unsafe Content Counterfactually
- Authors: Mazal Bethany, Brandon Wherry, Nishant Vishwamitra, Peyman Najafirad
- Abstract summary: Social media platforms are increasingly used by malicious actors to share unsafe content, such as images depicting sexual activity.
Major platforms use artificial intelligence (AI) and human moderation to obfuscate such images to make them safer.
Two critical needs for obfuscating unsafe images is that an accurate rationale for obfuscating image regions must be provided.
- Score: 3.69611312621848
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social media platforms are being increasingly used by malicious actors to
share unsafe content, such as images depicting sexual activity, cyberbullying,
and self-harm. Consequently, major platforms use artificial intelligence (AI)
and human moderation to obfuscate such images to make them safer. Two critical
needs for obfuscating unsafe images is that an accurate rationale for
obfuscating image regions must be provided, and the sensitive regions should be
obfuscated (\textit{e.g.} blurring) for users' safety. This process involves
addressing two key problems: (1) the reason for obfuscating unsafe images
demands the platform to provide an accurate rationale that must be grounded in
unsafe image-specific attributes, and (2) the unsafe regions in the image must
be minimally obfuscated while still depicting the safe regions. In this work,
we address these key issues by first performing visual reasoning by designing a
visual reasoning model (VLM) conditioned on pre-trained unsafe image
classifiers to provide an accurate rationale grounded in unsafe image
attributes, and then proposing a counterfactual explanation algorithm that
minimally identifies and obfuscates unsafe regions for safe viewing, by first
utilizing an unsafe image classifier attribution matrix to guide segmentation
for a more optimal subregion segmentation followed by an informed greedy search
to determine the minimum number of subregions required to modify the
classifier's output based on attribution score. Extensive experiments on
uncurated data from social networks emphasize the efficacy of our proposed
method. We make our code available at:
https://github.com/SecureAIAutonomyLab/ConditionalVLM
Related papers
- Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction [49.60774626839712]
Training multimodal generative models can expose users to harmful, unsafe and controversial or culturally-inappropriate outputs.
We propose a modular, dynamic solution that leverages safety-context embeddings and a dual reconstruction process to generate safer images.
We achieve state-of-the-art results on safe image generation benchmarks, while offering controllable variation of model safety.
arXiv Detail & Related papers (2024-11-21T09:47:13Z) - Latent Guard: a Safety Framework for Text-to-image Generation [64.49596711025993]
Existing safety measures are either based on text blacklists, which can be easily circumvented, or harmful content classification.
We propose Latent Guard, a framework designed to improve safety measures in text-to-image generation.
Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts.
arXiv Detail & Related papers (2024-04-11T17:59:52Z) - Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models [42.19184265811366]
We introduce a novel approach to enhancing the safety of vision-and-language models by diminishing their sensitivity to NSFW (not safe for work) inputs.
We show how this can be done by fine-tuning a CLIP model on synthetic data obtained from a large language model trained to convert between safe and unsafe sentences.
arXiv Detail & Related papers (2023-11-27T19:02:17Z) - Recoverable Privacy-Preserving Image Classification through Noise-like
Adversarial Examples [26.026171363346975]
Cloud-based image related services such as classification have become crucial.
In this study, we propose a novel privacypreserving image classification scheme.
encrypted images can be decrypted back into their original form with high fidelity (recoverable) using a secret key.
arXiv Detail & Related papers (2023-10-19T13:01:58Z) - SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution [21.93748586123046]
We develop and exhibit the first prompt attacks on Midjourney, resulting in the production of abundant NSFW images.
Our framework, SurrogatePrompt, systematically generates attack prompts, utilizing large language models, image-to-text, and image-to-image modules.
Results disclose an 88% success rate in bypassing Midjourney's proprietary safety filter with our attack prompts.
arXiv Detail & Related papers (2023-09-25T13:20:15Z) - Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts [63.61248884015162]
Text-to-image diffusion models have shown remarkable ability in high-quality content generation.
This work proposes Prompting4 Debugging (P4D) as a tool that automatically finds problematic prompts for diffusion models.
Our result shows that around half of prompts in existing safe prompting benchmarks which were originally considered "safe" can actually be manipulated to bypass many deployed safety mechanisms.
arXiv Detail & Related papers (2023-09-12T11:19:36Z) - PRO-Face S: Privacy-preserving Reversible Obfuscation of Face Images via
Secure Flow [69.78820726573935]
We name it PRO-Face S, short for Privacy-preserving Reversible Obfuscation of Face images via Secure flow-based model.
In the framework, an Invertible Neural Network (INN) is utilized to process the input image along with its pre-obfuscated form, and generate the privacy protected image that visually approximates to the pre-obfuscated one.
arXiv Detail & Related papers (2023-07-18T10:55:54Z) - Human-imperceptible, Machine-recognizable Images [76.01951148048603]
A major conflict is exposed relating to software engineers between better developing AI systems and distancing from the sensitive training data.
This paper proposes an efficient privacy-preserving learning paradigm, where images are encrypted to become human-imperceptible, machine-recognizable''
We show that the proposed paradigm can ensure the encrypted images have become human-imperceptible while preserving machine-recognizable information.
arXiv Detail & Related papers (2023-06-06T13:41:37Z) - Benchmarking Robustness to Adversarial Image Obfuscations [22.784762155781436]
Malicious actors may obfuscate policy violating images to prevent machine learning models from reaching the correct decision.
This benchmark, based on ImageNet, simulates the type of obfuscations created by malicious actors.
arXiv Detail & Related papers (2023-01-30T15:36:44Z) - Protect, Show, Attend and Tell: Empowering Image Captioning Models with
Ownership Protection [24.50702655120905]
This paper demonstrates that the current digital watermarking framework is insufficient to protect image captioning tasks.
As a remedy, this paper studies and proposes two different embedding schemes in the hidden memory state of a recurrent neural network.
To the best of our knowledge, this work is the first to propose ownership protection on image captioning task.
arXiv Detail & Related papers (2020-08-25T13:48:35Z) - InfoScrub: Towards Attribute Privacy by Targeted Obfuscation [77.49428268918703]
We study techniques that allow individuals to limit the private information leaked in visual data.
We tackle this problem in a novel image obfuscation framework.
We find our approach generates obfuscated images faithful to the original input images, and additionally increase uncertainty by 6.2$times$ (or up to 0.85 bits) over the non-obfuscated counterparts.
arXiv Detail & Related papers (2020-05-20T19:48:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.