Benchmarking Robustness to Adversarial Image Obfuscations
- URL: http://arxiv.org/abs/2301.12993v2
- Date: Wed, 29 Nov 2023 18:33:43 GMT
- Title: Benchmarking Robustness to Adversarial Image Obfuscations
- Authors: Florian Stimberg, Ayan Chakrabarti, Chun-Ta Lu, Hussein Hazimeh,
Otilia Stretcu, Wei Qiao, Yintao Liu, Merve Kaya, Cyrus Rashtchian, Ariel
Fuxman, Mehmet Tek, Sven Gowal
- Abstract summary: Malicious actors may obfuscate policy-violating images to prevent machine learning models from reaching the correct decision.
This benchmark, based on ImageNet, simulates the type of obfuscations created by malicious actors.
- Score: 22.784762155781436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated content filtering and moderation is an important tool that allows
online platforms to build thriving user communities that facilitate cooperation
and prevent abuse. Unfortunately, resourceful actors try to bypass automated
filters in a bid to post content that violates platform policies and codes of
conduct. To reach this goal, these malicious actors may obfuscate policy-violating
images (e.g., overlay harmful images with carefully selected benign images or
visual patterns) to prevent machine learning models from reaching the
correct decision. In this paper, we invite researchers to tackle this specific
issue and present a new image benchmark. This benchmark, based on ImageNet,
simulates the type of obfuscations created by malicious actors. It goes beyond
ImageNet-$\textrm{C}$ and ImageNet-$\bar{\textrm{C}}$ by proposing general,
drastic, adversarial modifications that preserve the original content intent.
It aims to tackle a more common adversarial threat than the one considered by
$\ell_p$-norm bounded adversaries. We evaluate 33 pretrained models on the
benchmark and train models with different augmentations, architectures and
training methods on subsets of the obfuscations to measure generalization. We
hope this benchmark will encourage researchers to test their models and methods
and try to find new approaches that are more robust to these obfuscations.
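For illustration, here is a minimal sketch (in Python, assuming PIL and NumPy) of the simplest kind of obfuscation described above: overlaying a carefully selected benign image onto a policy-violating one via alpha blending. The file names, blend ratio, and image size are hypothetical, and the benchmark's actual obfuscations are more varied and drastic.

```python
# Sketch only: alpha-blend a benign overlay onto a target image.
# The paths, alpha value, and 224x224 resizing are illustrative assumptions.
import numpy as np
from PIL import Image

def overlay_obfuscate(target_path: str, benign_path: str, alpha: float = 0.55) -> Image.Image:
    """Return alpha*target + (1 - alpha)*benign as an RGB image."""
    target = np.asarray(Image.open(target_path).convert("RGB").resize((224, 224)), dtype=np.float32)
    benign = np.asarray(Image.open(benign_path).convert("RGB").resize((224, 224)), dtype=np.float32)
    blended = alpha * target + (1.0 - alpha) * benign
    return Image.fromarray(np.clip(blended, 0, 255).astype(np.uint8))

if __name__ == "__main__":
    # A robustness check would compare a classifier's prediction on the
    # original and obfuscated images; here we only produce the latter.
    overlay_obfuscate("target.jpg", "benign_overlay.jpg").save("obfuscated.jpg")
```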
Related papers
- Image Safeguarding: Reasoning with Conditional Vision Language Model and
Obfuscating Unsafe Content Counterfactually [3.69611312621848]
Social media platforms are increasingly used by malicious actors to share unsafe content, such as images depicting sexual activity.
Major platforms use artificial intelligence (AI) and human moderation to obfuscate such images to make them safer.
A critical need when obfuscating unsafe images is that an accurate rationale for the obfuscated image regions must be provided.
arXiv Detail & Related papers (2024-01-19T21:38:18Z)
- BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM).
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z)
- Human-imperceptible, Machine-recognizable Images [76.01951148048603]
A major conflict is exposed for software engineers between developing better AI systems and distancing themselves from sensitive training data.
This paper proposes an efficient privacy-preserving learning paradigm, where images are encrypted to become "human-imperceptible, machine-recognizable".
We show that the proposed paradigm can ensure the encrypted images have become human-imperceptible while preserving machine-recognizable information.
arXiv Detail & Related papers (2023-06-06T13:41:37Z)
- Privacy Safe Representation Learning via Frequency Filtering Encoder [7.792424517008007]
Adversarial Representation Learning (ARL) is a common approach to train an encoder that runs on the client side and obfuscates an image.
It is assumed that the obfuscated image can safely be transmitted and used for the task on the server without privacy concerns.
We introduce a novel ARL method enhanced through low-pass filtering, limiting the amount of information available to be encoded in the frequency domain.
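As a rough illustration of the general idea (not the paper's specific encoder), the sketch below low-pass filters an RGB image in the frequency domain with NumPy; the circular cutoff radius is a hypothetical parameter.

```python
# Sketch only: keep low frequencies of each channel via a circular mask on
# the (shifted) 2D FFT; cutoff_ratio is an illustrative assumption.
import numpy as np

def lowpass_filter(image: np.ndarray, cutoff_ratio: float = 0.1) -> np.ndarray:
    """Low-pass filter an HxWxC image; returns float32 values in [0, 255]."""
    h, w, c = image.shape
    yy, xx = np.ogrid[:h, :w]
    radius = cutoff_ratio * min(h, w) / 2.0
    mask = (yy - h / 2.0) ** 2 + (xx - w / 2.0) ** 2 <= radius ** 2
    out = np.empty((h, w, c), dtype=np.float32)
    for ch in range(c):
        spectrum = np.fft.fftshift(np.fft.fft2(image[:, :, ch].astype(np.float32)))
        out[:, :, ch] = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))
    return np.clip(out, 0.0, 255.0)
```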
arXiv Detail & Related papers (2022-08-04T06:16:13Z)
- Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z)
- Restricted Black-box Adversarial Attack Against DeepFake Face Swapping [70.82017781235535]
We introduce a practical adversarial attack that does not require any queries to the facial image forgery model.
Our method is built on a substitute model pursuing face reconstruction and then transfers adversarial examples from the substitute model directly to inaccessible black-box DeepFake models.
arXiv Detail & Related papers (2022-04-26T14:36:06Z)
- ARIA: Adversarially Robust Image Attribution for Content Provenance [25.217001579437635]
We show how to generate valid adversarial images that can easily cause incorrect image attribution.
We then describe an approach to prevent imperceptible adversarial attacks on deep visual fingerprinting models.
The resulting models are substantially more robust, are accurate even on unperturbed images, and perform well even over a database with millions of images.
arXiv Detail & Related papers (2022-02-25T18:11:45Z)
- Improving Robustness with Image Filtering [3.169089186688223]
This paper introduces a new image filtering scheme called Image-Graph Extractor (IGE) that extracts the fundamental nodes of an image and their connections through a graph structure.
By leveraging the IGE representation, we build a new defense method, Filtering As a Defense, that does not allow the attacker to entangle pixels to create malicious patterns.
We show that data augmentation with filtered images effectively improves the model's robustness to data corruption.
arXiv Detail & Related papers (2021-12-21T14:04:25Z)
- Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks [154.31827097264264]
Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms.
We propose Dual Manifold Adversarial Training (DMAT), where adversarial perturbations in both latent and image spaces are used to robustify the model.
Our DMAT improves performance on normal images and achieves robustness comparable to standard adversarial training against Lp attacks.
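For context on the Lp-bounded threat model mentioned above, here is a minimal L-infinity PGD sketch in PyTorch (a standard formulation, not DMAT itself); the epsilon, step size, and iteration count are illustrative assumptions.

```python
# Sketch only: standard L-infinity PGD, the inner maximization commonly used
# in adversarial training; hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8 / 255, step=2 / 255, iters=10):
    """Return adversarial examples within an L-infinity eps-ball around x in [0, 1]."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```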
arXiv Detail & Related papers (2020-09-05T06:00:28Z)
- InfoScrub: Towards Attribute Privacy by Targeted Obfuscation [77.49428268918703]
We study techniques that allow individuals to limit the private information leaked in visual data.
We tackle this problem in a novel image obfuscation framework.
We find our approach generates obfuscated images faithful to the original input images, and additionally increases uncertainty by 6.2$\times$ (or up to 0.85 bits) over the non-obfuscated counterparts.
arXiv Detail & Related papers (2020-05-20T19:48:04Z)