Toward Real Text Manipulation Detection: New Dataset and New Solution
- URL: http://arxiv.org/abs/2312.06934v2
- Date: Tue, 23 Jan 2024 09:23:40 GMT
- Title: Toward Real Text Manipulation Detection: New Dataset and New Solution
- Authors: Dongliang Luo, Yuliang Liu, Rui Yang, Xianjin Liu, Jishen Zeng, Yu
Zhou, Xiang Bai
- Abstract summary: High costs associated with professional text manipulation limit the availability of real-world datasets.
We present the Real Text Manipulation dataset, encompassing 14,250 text images.
Our contributions aim to propel advancements in real-world text tampering detection.
- Score: 58.557504531896704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the surge in realistic text tampering, detecting fraudulent text in
images has gained prominence for maintaining information security. However, the
high costs associated with professional text manipulation and annotation limit
the availability of real-world datasets, with most relying on synthetic
tampering, which inadequately replicates real-world tampering attributes. To
address this issue, we present the Real Text Manipulation (RTM) dataset,
encompassing 14,250 text images, which include 5,986 manually and 5,258
automatically tampered images, created using a variety of techniques, alongside
3,006 unaltered text images for evaluating solution stability. Our evaluations
indicate that existing methods falter in text forgery detection on the RTM
dataset. We propose a robust baseline solution featuring a Consistency-aware
Aggregation Hub and a Gated Cross Neighborhood-attention Fusion module for
efficient multi-modal information fusion, supplemented by a Tampered-Authentic
Contrastive Learning module during training, enriching feature representation
distinction. This framework, extendable to other dual-stream architectures,
demonstrated notable localization performance improvements of 7.33% and 6.38%
on manual and overall manipulations, respectively. Our contributions aim to
propel advancements in real-world text tampering detection. Code and dataset
will be made available at https://github.com/DrLuo/RTM
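The abstract describes the Tampered-Authentic Contrastive Learning module only at a high level (enriching the distinction between tampered and authentic feature representations). A minimal sketch of one plausible formulation is a supervised contrastive loss that pulls same-class region features together and pushes tampered features away from authentic ones; the function name, temperature value, and batch layout below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def tampered_authentic_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over region features (illustrative sketch).

    features: (N, D) array of feature vectors.
    labels:   (N,) binary array, 1 = tampered region, 0 = authentic region.
    For each anchor, same-label features are positives; all others are negatives.
    """
    # Cosine similarity between L2-normalized features, scaled by temperature
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T / temperature

    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)                       # exclude self-pairs
    positives = (labels[:, None] == labels[None, :]) & not_self

    # Row-wise log-softmax over all non-self pairs (stabilized by max-subtraction)
    sim_max = sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim - sim_max) * not_self
    log_prob = (sim - sim_max) - np.log(exp_sim.sum(axis=1, keepdims=True))

    # Negative mean log-likelihood of positives, averaged over anchors
    per_anchor = -(log_prob * positives).sum(axis=1) / np.maximum(positives.sum(axis=1), 1)
    return per_anchor.mean()
```

Under this formulation, a batch whose tampered and authentic features are well separated yields a lower loss than a batch where the two classes are entangled, which is the training signal the module would contribute alongside the localization objective.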
Related papers
- TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition [7.560970003549404]
TextSSR is a novel framework for Synthesizing Scene Text Recognition data.
It ensures accuracy by focusing on generating text within a specified image region.
We construct an anagram-based TextSSR-F dataset containing 0.4 million text instances that combine complexity and realism.
arXiv Detail & Related papers (2024-12-02T05:26:25Z)
- Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition [56.968108142307976]
We propose a novel approach called Class-Aware Mask-guided feature refinement (CAM)
Our approach introduces canonical class-aware glyph masks to suppress background and text style noise.
By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion.
arXiv Detail & Related papers (2024-02-21T09:22:45Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs [96.54224331778195]
We present a text-grounding document understanding model, termed TGDoc, which enhances MLLMs with the ability to discern the spatial positioning of text within images.
We formulate instruction tuning tasks including text detection, recognition, and spotting to facilitate the cohesive alignment between the visual encoder and large language model.
Our method achieves state-of-the-art performance across multiple text-rich benchmarks, validating its effectiveness.
arXiv Detail & Related papers (2023-11-22T06:46:37Z)
- TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution [18.73348268987249]
TextDiff is a diffusion-based framework tailored for scene text image super-resolution.
It achieves state-of-the-art (SOTA) performance on public benchmark datasets.
Our proposed MRD module is plug-and-play and effectively sharpens the text edges produced by SOTA methods.
arXiv Detail & Related papers (2023-08-13T11:02:16Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Self-supervised Character-to-Character Distillation for Text Recognition [54.12490492265583]
We propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate text representation learning.
CCD achieves state-of-the-art results, with average performance gains of 1.38% in text recognition, 1.7% in text segmentation, 0.24 dB (PSNR) and 0.0321 (SSIM) in text super-resolution.
arXiv Detail & Related papers (2022-11-01T05:48:18Z)
- Stroke-Based Scene Text Erasing Using Synthetic Data [0.0]
Scene text erasing can replace text regions with reasonable content in natural images.
The lack of a large-scale real-world scene-text removal dataset prevents existing methods from working at full strength.
We enhance and make full use of the synthetic text and consequently train our model only on the dataset generated by the improved synthetic text engine.
This model can partially erase text instances in a scene image with a bounding box provided or work with an existing scene text detector for automatic scene text erasing.
arXiv Detail & Related papers (2021-04-23T09:29:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.