TextSleuth: Towards Explainable Tampered Text Detection
- URL: http://arxiv.org/abs/2412.14816v3
- Date: Wed, 15 Jan 2025 16:54:36 GMT
- Title: TextSleuth: Towards Explainable Tampered Text Detection
- Authors: Chenfan Qu, Jian Liu, Haoxing Chen, Baihan Yu, Jingjing Liu, Weiqiang Wang, Lianwen Jin
- Abstract summary: We propose to explain the basis of tampered text detection with natural language via large multimodal models.
To fill the data gap for this task, we propose a large-scale, comprehensive dataset, ETTD.
Elaborate queries are introduced to generate high-quality anomaly descriptions with GPT4o.
To automatically filter out low-quality annotations, we also propose to prompt GPT4o to recognize tampered texts.
- Score: 49.88698441048043
- Abstract: Recently, tampered text detection has attracted increasing attention due to its essential role in information security. Although existing methods can detect the tampered text region, the interpretation of such detection remains unclear, making the prediction unreliable. To address this problem, we propose to explain the basis of tampered text detection with natural language via large multimodal models. To fill the data gap for this task, we propose a large-scale, comprehensive dataset, ETTD, which contains both pixel-level annotations for tampered text region and natural language annotations describing the anomaly of the tampered text. Multiple methods are employed to improve the quality of the proposed data. For example, elaborate queries are introduced to generate high-quality anomaly descriptions with GPT4o. A fused mask prompt is proposed to reduce confusion when querying GPT4o to generate anomaly descriptions. To automatically filter out low-quality annotations, we also propose to prompt GPT4o to recognize tampered texts before describing the anomaly, and to filter out the responses with low OCR accuracy. To further improve explainable tampered text detection, we propose a simple yet effective model called TextSleuth, which achieves improved fine-grained perception and cross-domain generalization by focusing on the suspected region, with a two-stage analysis paradigm and an auxiliary grounding prompt. Extensive experiments on both the ETTD dataset and the public dataset have verified the effectiveness of the proposed methods. In-depth analysis is also provided to inspire further research. Our dataset and code will be open-source.
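The annotation-filtering step mentioned in the abstract (ask the model to recognize the tampered text first, then discard responses with low OCR accuracy) could look roughly like the sketch below. This is an illustration only, not the authors' implementation: the record fields, the `ocr_accuracy` metric (character-level similarity from `difflib`), and the 0.8 threshold are all assumptions made here.

```python
from difflib import SequenceMatcher


def ocr_accuracy(recognized: str, ground_truth: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in for OCR accuracy."""
    return SequenceMatcher(None, recognized.strip(), ground_truth.strip()).ratio()


def filter_annotations(records, min_accuracy=0.8):
    """Keep records whose recognized text closely matches the known tampered text.

    `min_accuracy` is a hypothetical threshold; the abstract does not state one.
    """
    return [
        r for r in records
        if ocr_accuracy(r["recognized_text"], r["tampered_text"]) >= min_accuracy
    ]


# Hypothetical records: the known tampered text, the text the model read back,
# and the anomaly description the model produced.
records = [
    {"id": 1, "tampered_text": "2024-05-01", "recognized_text": "2024-05-01",
     "description": "placeholder description A"},
    {"id": 2, "tampered_text": "Invoice 8841", "recognized_text": "lnvo",
     "description": "placeholder description B"},
]

kept = filter_annotations(records)
print([r["id"] for r in kept])
```

The idea behind this sketch is that a response which misreads the tampered text is unlikely to describe its anomalies reliably, so the recognition result serves as a cheap proxy for annotation quality.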
Related papers
- Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.
We introduce novel methodologies and datasets to overcome these challenges.
We propose MhBART, an encoder-decoder model designed to emulate human writing style.
We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
- Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text [61.22649031769564]
We propose a novel framework, paraphrased text span detection (PTD).
PTD aims to identify paraphrased text spans within a text.
We construct a dedicated dataset, PASTED, for paraphrased text span detection.
arXiv Detail & Related papers (2024-05-21T11:22:27Z)
- DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM [81.75988648572347]
We present DetToolChain, a novel prompting paradigm to unleash the zero-shot object detection ability of multimodal large language models (MLLMs).
Our approach consists of a detection prompting toolkit inspired by high-precision detection priors and a new Chain-of-Thought to implement these prompts.
We show that GPT-4V with our DetToolChain improves state-of-the-art object detectors by +21.5% AP50 on the MS COCO Novel class set for open-vocabulary detection.
arXiv Detail & Related papers (2024-03-19T06:54:33Z)
- Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975]
We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context.
Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models.
arXiv Detail & Related papers (2023-07-25T20:24:22Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting [11.705454066278898]
We propose a novel Detection-agnostic End-to-End Recognizer, DEER, framework.
The proposed method reduces the tight dependency between detection and recognition modules.
It achieves competitive results on regular and arbitrarily-shaped text spotting benchmarks.
arXiv Detail & Related papers (2022-03-10T02:41:05Z)
- On Exploring and Improving Robustness of Scene Text Detection Models [20.15225372544634]
We evaluate scene text detection models on ICDAR2015-C (IC15-C) and CTW1500-C (CTW-C).
We perform a robustness analysis of six key components: pre-training data, backbone, feature fusion module, multi-scale predictions, representation of text instances and loss function.
We present a simple yet effective data-based method to destroy the smoothness of text regions by merging background and foreground.
arXiv Detail & Related papers (2021-10-12T02:36:48Z)
- Bidirectional Regression for Arbitrary-Shaped Text Detection [16.30976392505236]
This paper presents a novel text instance expression which integrates both foreground and background information into the pipeline.
A corresponding post-processing algorithm is also designed to sequentially combine the four prediction results and reconstruct the text instance accurately.
We evaluate our method on several challenging scene text benchmarks, including both curved and multi-oriented text datasets.
arXiv Detail & Related papers (2021-07-13T14:29:09Z)
- Text Recognition -- Real World Data and Where to Find Them [36.10220484561196]
We present a method for exploiting weakly annotated images to improve text extraction pipelines.
The approach uses an arbitrary end-to-end text recognition system to obtain text region proposals and their, possibly erroneous, transcriptions.
It produces nearly error-free, localised instances of scene text, which we treat as "pseudo ground truth" (PGT).
arXiv Detail & Related papers (2020-07-06T22:23:27Z)
- DGST : Discriminator Guided Scene Text detector [11.817428636084305]
This paper proposes a detector framework based on conditional generative adversarial networks to improve the segmentation quality of scene text detection.
Experiments on standard datasets demonstrate that the proposed DGST brings noticeable gains and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-02-28T01:47:36Z)