Detecting AI-Generated Paraphrases in Bengali: A Comparative Study of Zero-Shot and Fine-Tuned Transformers
- URL: http://arxiv.org/abs/2512.21709v1
- Date: Thu, 25 Dec 2025 15:04:29 GMT
- Title: Detecting AI-Generated Paraphrases in Bengali: A Comparative Study of Zero-Shot and Fine-Tuned Transformers
- Authors: Md. Rakibul Islam, Most. Sharmin Sultana Samu, Md. Zahid Hossain, Farhad Uz Zaman, Md. Kamrozzaman Bhuiyan,
- Abstract summary: Large language models (LLMs) can produce text that closely resembles human writing.<n>This capability raises concerns about misuse, including disinformation and content manipulation.<n>Existing research has addressed detection in multiple languages, but the Bengali language remains largely unexplored.<n>This study investigates five transformer-based models: XLMRoBERTa-Large, mDeBERTaV3-Base, BanglaBERT-Base, IndicBERT-Base and MultilingualBERT-Base.
- Score: 0.2446672595462589
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models (LLMs) can produce text that closely resembles human writing. This capability raises concerns about misuse, including disinformation and content manipulation. Detecting AI-generated text is essential to maintain authenticity and prevent malicious applications. Existing research has addressed detection in multiple languages, but the Bengali language remains largely unexplored. Bengali's rich vocabulary and complex structure make distinguishing human-written and AI-generated text particularly challenging. This study investigates five transformer-based models: XLMRoBERTa-Large, mDeBERTaV3-Base, BanglaBERT-Base, IndicBERT-Base and MultilingualBERT-Base. Zero-shot evaluation shows that all models perform near chance levels (around 50% accuracy) and highlight the need for task-specific fine-tuning. Fine-tuning significantly improves performance, with XLM-RoBERTa, mDeBERTa and MultilingualBERT achieving around 91% on both accuracy and F1-score. IndicBERT demonstrates comparatively weaker performance, indicating limited effectiveness in fine-tuning for this task. This work advances AI-generated text detection in Bengali and establishes a foundation for building robust systems to counter AI-generated content.
Related papers
- AI-Generated Text Detection in Low-Resource Languages: A Case Study on Urdu [0.0]
Large Language Models (LLMs) are now capable of generating text that closely resembles human writing.<n>This makes it harder to tell whether a piece of text was written by a human or by a machine.<n>We propose a novel AI-generated text detection framework tailored for the Urdu language.
arXiv Detail & Related papers (2025-10-18T16:45:25Z) - AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models [0.2796197251957245]
Large Language Models (LLMs) produce text that is both grammatically correct and semantically meaningful.<n>LLMs have been misused to create highly realistic phishing emails, spread fake news, generate code to automate cyber crime, and write fraudulent scientific articles.<n>Various attempts have been made to distinguish machine-generated text from human-authored content using linguistic, statistical, machine learning, and ensemble-based approaches.
arXiv Detail & Related papers (2025-07-07T16:13:13Z) - AI-generated Text Detection: A Multifaceted Approach to Binary and Multiclass Classification [0.13392361199400257]
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating text that closely resembles human writing.<n>Such capabilities are prone to potential misuse, such as fake news generation, spam email creation, and misuse in academic assignments.<n>We propose two neural architectures: an optimized model and a simpler variant.<n>For Task A, the optimized neural architecture achieved fifth place with $F1$ score of 0.994, and for Task B, the simpler neural architecture also ranked fifth place with $F1$ score of 0.627.
arXiv Detail & Related papers (2025-05-15T09:28:06Z) - LuxVeri at GenAI Detection Task 1: Inverse Perplexity Weighted Ensemble for Robust Detection of AI-Generated Text across English and Multilingual Contexts [0.8495482945981923]
This paper presents a system developed for Task 1 of the COLING 2025 Workshop on Detecting AI-Generated Content.<n>Our approach utilizes an ensemble of models, with weights assigned according to each model's inverse perplexity, to enhance classification accuracy.<n>Our results demonstrate the effectiveness of inverse perplexity weighting in improving the robustness of machine-generated text detection across both monolingual and multilingual settings.
arXiv Detail & Related papers (2025-01-21T06:32:32Z) - Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including oBERTa-Large/Base-Detector, GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - Paraphrasing evades detectors of AI-generated text, but retrieval is an
effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z) - AugGPT: Leveraging ChatGPT for Text Data Augmentation [59.76140039943385]
We propose a text data augmentation approach based on ChatGPT (named AugGPT)
AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples.
Experiment results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach.
arXiv Detail & Related papers (2023-02-25T06:58:16Z) - Neural Models for Offensive Language Detection [0.0]
Offensive language detection is an ever-growing natural language processing (NLP) application.
We believe contributing to improving and comparing different machine learning models to fight such harmful contents is an important and challenging goal for this thesis.
arXiv Detail & Related papers (2021-05-30T13:02:45Z) - Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese
Pre-trained Language Models [62.41139712595334]
We propose a novel pre-training paradigm for Chinese -- Lattice-BERT.
We construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers.
We show that our model can bring an average increase of 1.5% under the 12-layer setting.
arXiv Detail & Related papers (2021-04-15T02:36:49Z) - TextFlint: Unified Multilingual Robustness Evaluation Toolkit for
Natural Language Processing [73.16475763422446]
We propose a multilingual robustness evaluation platform for NLP tasks (TextFlint)
It incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis.
TextFlint generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model's robustness.
arXiv Detail & Related papers (2021-03-21T17:20:38Z) - InfoBERT: Improving Robustness of Language Models from An Information
Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable facing the threats of textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.