Related papers: TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images

TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images

URL: http://arxiv.org/abs/2411.00355v1
Date: Fri, 01 Nov 2024 04:41:00 GMT
Title: TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images
Authors: Mengcheng Li, Mingbao Lin, Fei Chao, Chia-Wen Lin, Rongrong Ji,
Abstract summary: TextDestroyer is the first training- and annotation-free method for scene text destruction. Our method scrambles text areas in the latent start code using a Gaussian distribution before reconstruction. The advantages of TextDestroyer include: (1) it eliminates labor-intensive data annotation and resource-intensive training; (2) it achieves more thorough text destruction, preventing recognizable traces; and (3) it demonstrates better generalization capabilities, performing well on both real-world scenes and generated images.
Score: 84.08181780666698
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we propose TextDestroyer, the first training- and annotation-free method for scene text destruction using a pre-trained diffusion model. Existing scene text removal models require complex annotation and retraining, and may leave faint yet recognizable text information, compromising privacy protection and content concealment. TextDestroyer addresses these issues by employing a three-stage hierarchical process to obtain accurate text masks. Our method scrambles text areas in the latent start code using a Gaussian distribution before reconstruction. During the diffusion denoising process, self-attention key and value are referenced from the original latent to restore the compromised background. Latent codes saved at each inversion step are used for replacement during reconstruction, ensuring perfect background restoration. The advantages of TextDestroyer include: (1) it eliminates labor-intensive data annotation and resource-intensive training; (2) it achieves more thorough text destruction, preventing recognizable traces; and (3) it demonstrates better generalization capabilities, performing well on both real-world scenes and generated images.

Related papers

TextGuider: Training-Free Guidance for Text Rendering via Attention Alignment [68.91073792449201]
We propose TextGuider, a training-free method that encourages accurate and complete text appearance.<n>Specifically, we analyze attention patterns in Multi-Modal Diffusion Transformer(MM-DiT) models, particularly for text-related tokens intended to be rendered in the image.<n>Our method achieves state-of-the-art performance in test-time text rendering, with significant gains in recall and strong results in OCR accuracy and CLIP score.
arXiv Detail & Related papers (2025-12-10T06:18:30Z)
Text-Aware Image Restoration with Diffusion Models [30.127247716169666]
Text-Aware Image Restoration (TAIR) is a novel restoration task that requires the simultaneous recovery of visual contents and textual fidelity.<n>To tackle this task, we present SA-Text, a large-scale benchmark of 100K high-quality scene images densely annotated with diverse and complex text instances.<n>Our approach consistently outperforms state-of-the-art restoration methods, achieving significant gains in text recognition accuracy.
arXiv Detail & Related papers (2025-06-11T17:59:46Z)
Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets. We introduce a novel method named Decoder Pre-training with only text for STR (DPTR) DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
arXiv Detail & Related papers (2024-08-11T06:36:42Z)
Latent Guard: a Safety Framework for Text-to-image Generation [64.49596711025993]
Existing safety measures are either based on text blacklists, which can be easily circumvented, or harmful content classification. We propose Latent Guard, a framework designed to improve safety measures in text-to-image generation. Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts.
arXiv Detail & Related papers (2024-04-11T17:59:52Z)
DeepEraser: Deep Iterative Context Mining for Generic Text Eraser [103.39279154750172]
DeepEraser is a recurrent architecture that erases the text in an image via iterative operations. DeepEraser is notably compact with only 1.4M parameters and trained in an end-to-end manner.
arXiv Detail & Related papers (2024-02-29T12:39:04Z)
Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features. With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval [11.798006331912056]
The goal of Text-to-Image Person Retrieval (TIPR) is to retrieve specific person images according to the given textual descriptions. We propose a novel TIPR framework to build fine-grained interactions and alignment between person images and the corresponding texts.
arXiv Detail & Related papers (2023-07-18T08:23:46Z)
Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace text with the desired one while preserving background and styles of the original text. Previous methods of editing the whole image have to learn different translation rules of background and text regions simultaneously. We propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL)
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
Progressive Scene Text Erasing with Self-Supervision [7.118419154170154]
Scene text erasing seeks to erase text contents from scene images. Current state-of-the-art text erasing models are trained on large-scale synthetic data. We employ self-supervision for feature representation on unlabeled real-world scene text images.
arXiv Detail & Related papers (2022-07-23T09:05:13Z)
Text-DIAE: Degradation Invariant Autoencoders for Text Recognition and Document Enhancement [8.428866479825736]
Text-DIAE aims to solve two tasks, text recognition (handwritten or scene-text) and document image enhancement. We define three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Our method surpasses the state-of-the-art significantly in existing supervised and self-supervised settings.
arXiv Detail & Related papers (2022-03-09T15:44:36Z)
A Simple and Strong Baseline: Progressively Region-based Scene Text Removal Networks [72.32357172679319]
This paper presents a novel ProgrEssively Region-based scene Text eraser (PERT) PERT decomposes the STR task to several erasing stages. PERT introduces a region-based modification strategy to ensure the integrity of text-free areas.
arXiv Detail & Related papers (2021-06-24T14:06:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.