Related papers: Hamtajoo: A Persian Plagiarism Checker for Academic Manuscripts

URL: http://arxiv.org/abs/2112.13742v1
Date: Mon, 27 Dec 2021 15:45:35 GMT
Title: Hamtajoo: A Persian Plagiarism Checker for Academic Manuscripts
Authors: Vahid Zarrabi, Salar Mohtaj, Habibollah Asghari
Abstract summary: Hamtajoo is a Persian plagiarism detection system for academic manuscripts. We describe the overall structure of the system along with the algorithms used in each stage. To evaluate the performance of the proposed system, we used a plagiarism detection corpus that complies with the PAN standards.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent years, due to the high availability of electronic documents on the Web, plagiarism has become a serious challenge, especially among scholars. Various plagiarism detection systems have been developed to prevent text re-use and to confront plagiarism. Although it is relatively easy to detect duplicate text in academic manuscripts, finding patterns of text re-use that have been semantically changed is of great importance. Another important issue is dealing with less-resourced languages, for which there is a low volume of text for training purposes and NLP tools perform poorly. In this paper, we introduce Hamtajoo, a Persian plagiarism detection system for academic manuscripts. Moreover, we describe the overall structure of the system along with the algorithms used in each stage. To evaluate the performance of the proposed system, we used a plagiarism detection corpus that complies with the PAN standards.

Related papers

Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors [65.27124213266491]
We propose Contrastive Paraphrase Attack (CoPA), a training-free method that effectively deceives text detectors. CoPA constructs an auxiliary machine-like word distribution as a contrast to the human-like distribution generated by large language models. Our theoretical analysis suggests the superiority of the proposed attack.
arXiv Detail & Related papers (2025-05-21T10:08:39Z)
Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. We introduce novel methodologies and datasets to overcome these challenges. We propose MhBART, an encoder-decoder model designed to emulate human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
BERT-Enhanced Retrieval Tool for Homework Plagiarism Detection System [0.0]
We propose a plagiarized text data generation method based on GPT-3.5, which produces a plagiarism detection dataset of 32,927 text pairs. We also propose a plagiarism identification method based on Faiss with BERT that offers high efficiency and high accuracy. Our experiments show that this model outperforms other models on several metrics, achieving 98.86%, 98.90%, 98.86%, and 0.9888 for Accuracy, Precision, Recall, and F1 Score, respectively.
arXiv Detail & Related papers (2024-04-01T12:20:34Z)
Text Similarity from Image Contents using Statistical and Semantic Analysis Techniques [0.0]
Image Content Plagiarism Detection (ICPD) has gained importance, utilizing advanced image content processing to identify instances of plagiarism. In this paper, a system has been implemented to detect plagiarism from the contents of images such as figures, graphs, and tables. Alongside statistical algorithms such as Jaccard and cosine similarity, semantic algorithms such as LSA, BERT, and WordNet are introduced, and these performed better at detecting plagiarism efficiently and accurately.
arXiv Detail & Related papers (2023-08-24T15:06:04Z)
Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc. Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques. In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
A Survey of Plagiarism Detection Systems: Case of Use with English, French and Arabic Languages [0.0]
This paper presents an overview of plagiarism detection systems for use in Arabic, French, and English academic and educational settings. An in-depth examination of technical forms of plagiarism was also performed in the context of this study.
arXiv Detail & Related papers (2022-01-10T16:11:54Z)
Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer-id, which requires only a few trajectories of written letters as identification cues. A main challenge is that a person often writes a letter in different styles from time to time. We refer to this problem as the variance of online writing styles (Var-O-Styles).
arXiv Detail & Related papers (2021-12-06T07:21:53Z)
Towards generating citation sentences for multiple references with intent control [86.53829532976303]
We build a novel generation model with the Fusion-in-Decoder approach to cope with multiple long inputs. Experiments demonstrate that the proposed approaches provide much more comprehensive features for generating citation sentences.
arXiv Detail & Related papers (2021-12-02T15:32:24Z)
Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals [69.76097138157816]
Probabilistic text generators have been used to produce fake scientific papers for more than a decade. Complex AI-powered generation techniques produce texts indistinguishable from those written by humans. Some websites offer to rewrite texts for free, generating gobbledegook full of tortured phrases.
arXiv Detail & Related papers (2021-07-12T20:47:08Z)
Analyzing Non-Textual Content Elements to Detect Academic Plagiarism [0.8490310884703459]
The thesis proposes plagiarism detection approaches that implement a different concept: analyzing non-textual content in academic documents. To demonstrate the benefit of combining non-textual and text-based detection methods, the thesis describes the first plagiarism detection system that integrates the analysis of citation-based, image-based, math-based, and text-based document similarity.
arXiv Detail & Related papers (2021-06-10T14:11:52Z)
The Struggle with Academic Plagiarism: Approaches based on Semantic Similarity [0.0]
We present a report on how semantic similarity measures can be used in the plagiarism detection task. Current software has proven to be successful; however, the problem of identifying paraphrasing or obfuscation plagiarism remains unresolved.
arXiv Detail & Related papers (2021-06-02T20:00:33Z)
SmartPatch: Improving Handwritten Word Imitation with Patch Discriminators [67.54204685189255]
We propose SmartPatch, a new technique increasing the performance of current state-of-the-art methods. We combine the well-known patch loss with information gathered from the parallel trained handwritten text recognition system. This leads to a more enhanced local discriminator and results in more realistic and higher-quality generated handwritten words.
arXiv Detail & Related papers (2021-05-21T18:34:21Z)
Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level. We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z)
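
One entry above ("Text Similarity from Image Contents using Statistical and Semantic Analysis Techniques") names Jaccard and cosine as statistical similarity measures. The following is a minimal sketch of how those two measures are commonly computed over word tokens. It is not taken from any of the listed papers: the whitespace tokenizer and the function names `tokenize`, `jaccard_similarity`, and `cosine_similarity` are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization; a real system would
    # use language-specific normalization and tokenization.
    return text.lower().split()


def jaccard_similarity(a: str, b: str) -> float:
    # Size of the shared vocabulary divided by the size of the combined vocabulary.
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a: str, b: str) -> float:
    # Cosine of the angle between the two term-frequency vectors.
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Both functions return a value between 0 and 1 for these non-negative count vectors. Word-overlap measures of this kind mainly capture verbatim re-use, which is why several abstracts in this list turn to semantic methods for paraphrased text.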

This list is automatically generated from the titles and abstracts of the papers on this site.