Exploring the Boundaries of Content Moderation in Text-to-Image Generation
- URL: http://arxiv.org/abs/2409.17155v1
- Date: Mon, 9 Sep 2024 18:37:08 GMT
- Title: Exploring the Boundaries of Content Moderation in Text-to-Image Generation
- Authors: Piera Riccio, Georgina Curto, Nuria Oliver
- Abstract summary: This paper analyzes the community safety guidelines of five text-to-image (T2I) generation platforms and audits five T2I models.
We argue that the concept of safety is difficult to define and operationalize, a difficulty reflected in a discrepancy between the officially published safety guidelines and the actual behavior of the T2I models.
- Score: 9.476463361600828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper analyzes the community safety guidelines of five text-to-image (T2I) generation platforms and audits five T2I models, focusing on prompts related to the representation of humans in areas that might lead to societal stigma. While current research primarily focuses on ensuring safety by restricting the generation of harmful content, our study offers a complementary perspective. We argue that the concept of safety is difficult to define and operationalize, reflected in a discrepancy between the officially published safety guidelines and the actual behavior of the T2I models, and leading at times to over-censorship. Our findings call for more transparency and an inclusive dialogue about the platforms' content moderation practices, bearing in mind their global cultural and social impact.
Related papers
- SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation [65.30207993362595]
Unlearning/editing-based methods for safe generation remove harmful concepts from models but face several challenges.
We propose SAFREE, a training-free approach for safe T2I and T2V.
We detect a subspace corresponding to a set of toxic concepts in the text embedding space and steer prompt embeddings away from this subspace.
arXiv Detail & Related papers (2024-10-16T17:32:23Z) - Do Generative AI Models Output Harm while Representing Non-Western Cultures: Evidence from A Community-Centered Approach [8.805524738976073]
This research investigates the impact of Generative Artificial Intelligence (GAI) models, specifically text-to-image generators (T2Is), on the representation of non-Western cultures.
arXiv Detail & Related papers (2024-07-20T07:01:37Z) - Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models [58.74606272936636]
Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts.
The models could be exploited for malicious purposes, such as generating images with violence or nudity, or creating unauthorized portraits of public figures in inappropriate contexts.
Concept removal methods have been proposed to modify diffusion models to prevent the generation of malicious and unwanted concepts.
arXiv Detail & Related papers (2024-06-21T03:58:44Z) - The Unappreciated Role of Intent in Algorithmic Moderation of Social Media Content [2.2618341648062477]
This paper examines the role of intent in content moderation systems.
We review state of the art detection models and benchmark training datasets for online abuse to assess their awareness and ability to capture intent.
arXiv Detail & Related papers (2024-05-17T18:05:13Z) - Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models [58.065255696601604]
We use the compositional property of diffusion models, which allows leveraging multiple prompts in a single image generation.
We argue that it is essential to consider all possible approaches to image generation with diffusion models that can be employed by an adversary.
arXiv Detail & Related papers (2024-04-21T16:35:16Z) - Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models [52.24001776263608]
This comprehensive survey delves into the recent strides in HS moderation.
We highlight the burgeoning role of large language models (LLMs) and large multimodal models (LMMs)
We identify existing gaps in research, particularly in the context of underrepresented languages and cultures.
arXiv Detail & Related papers (2024-01-30T03:51:44Z) - Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts [63.61248884015162]
Text-to-image diffusion models have shown remarkable ability in high-quality content generation.
This work proposes Prompting4Debugging (P4D) as a tool that automatically finds problematic prompts for diffusion models.
Our results show that around half of the prompts in existing safe prompting benchmarks which were originally considered "safe" can actually be manipulated to bypass many deployed safety mechanisms.
arXiv Detail & Related papers (2023-09-12T11:19:36Z) - AI's Regimes of Representation: A Community-centered Study of Text-to-Image Models in South Asia [18.308417975842058]
We show how generative AI can reproduce an outsider's gaze for viewing South Asian cultures, shaped by global and regional power inequities.
We distill lessons for responsible development of T2I models, recommending concrete pathways forward.
arXiv Detail & Related papers (2023-05-19T17:35:20Z) - Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
arXiv Detail & Related papers (2022-12-27T16:08:49Z) - SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice [9.356143195807064]
We study the 14 most popular social media content moderation guidelines and practices in the US.
We identify the differences between the content moderation employed in mainstream social media platforms compared to fringe platforms.
We highlight why platforms should shift from a one-size-fits-all model to a more inclusive model.
arXiv Detail & Related papers (2022-06-29T18:48:04Z) - Cyber Security Behaviour In Organisations [0.0]
This review explores the academic and policy literature in the context of everyday cyber security in organisations.
It identifies four behavioural sets that influence how people practice cyber security: compliance with security policy, intergroup coordination and communication, phishing/email behaviour, and password behaviour.
arXiv Detail & Related papers (2020-04-24T14:17:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.