Analyzing Non-Textual Content Elements to Detect Academic Plagiarism
- URL: http://arxiv.org/abs/2106.05764v1
- Date: Thu, 10 Jun 2021 14:11:52 GMT
- Title: Analyzing Non-Textual Content Elements to Detect Academic Plagiarism
- Authors: Norman Meuschke
- Abstract summary: The thesis proposes plagiarism detection approaches that implement a different concept: analyzing non-textual content in academic documents.
To demonstrate the benefit of combining non-textual and text-based detection methods, the thesis describes the first plagiarism detection system that integrates the analysis of citation-based, image-based, math-based, and text-based document similarity.
- Score: 0.8490310884703459
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Identifying academic plagiarism is a pressing problem for research
institutions, publishers, and funding organizations, among others. Detection
approaches proposed so far analyze lexical, syntactical, and semantic text
similarity. These approaches find copied, moderately reworded, and literally
translated text. However, reliably detecting disguised plagiarism, such as
strong paraphrases, sense-for-sense translations, and the reuse of non-textual
content and ideas, is an open research problem.
The thesis addresses this problem by proposing plagiarism detection
approaches that implement a different concept: analyzing non-textual content in
academic documents, specifically citations, images, and mathematical content.
To validate the effectiveness of the proposed detection approaches, the
thesis presents five evaluations that use real cases of academic plagiarism and
exploratory searches for unknown cases.
The evaluation results show that non-textual content elements contain a high
degree of semantic information, are language-independent, and are largely
unaffected by the alterations that authors typically make to conceal plagiarism.
Analyzing non-textual content complements text-based detection approaches and
increases the detection effectiveness, particularly for disguised forms of
academic plagiarism.
To demonstrate the benefit of combining non-textual and text-based detection
methods, the thesis describes the first plagiarism detection system that
integrates the analysis of citation-based, image-based, math-based, and
text-based document similarity. The system's user interface employs
visualizations that significantly reduce the effort and time users must invest
in examining content similarity.
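The integrated analysis described above can be pictured with a minimal sketch. It scores one document pair with three of the four similarity types (image-based analysis is omitted for brevity) and flags the pair by its highest score, so a heavily paraphrased passage can still surface through its citation order or formula identifiers. The specific measures (word-trigram Jaccard, a longest common citation sequence, and an identifier histogram intersection), the max-based combination, the dictionary keys, and the sample data are illustrative assumptions; they are not presented as the thesis's algorithms or the described system's implementation. Citations and math identifiers are assumed to be already extracted and normalized.

```python
from collections import Counter


def trigram_jaccard(a: str, b: str) -> float:
    """Text-based (assumed measure): Jaccard similarity of lowercased word trigrams."""
    def grams(s: str) -> set:
        w = s.lower().split()
        return {tuple(w[i:i + 3]) for i in range(len(w) - 2)}
    ga, gb = grams(a), grams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def citation_lccs(a: list, b: list) -> float:
    """Citation-based (assumed measure): longest common subsequence of the
    order in which sources are cited, normalized by the shorter sequence."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    shorter = min(len(a), len(b))
    return dp[-1][-1] / shorter if shorter else 0.0


def math_identifiers(a: list, b: list) -> float:
    """Math-based (assumed measure): histogram intersection of identifier
    counts, normalized by the larger histogram."""
    ca, cb = Counter(a), Counter(b)
    larger = max(sum(ca.values()), sum(cb.values()))
    return sum((ca & cb).values()) / larger if larger else 0.0


def combined_score(doc: dict, src: dict) -> dict:
    """Score one document pair per feature; the hypothetical overall score is
    the maximum, so one strong non-textual signal is enough to flag the pair."""
    scores = {
        "text": trigram_jaccard(doc["text"], src["text"]),
        "citation": citation_lccs(doc["citations"], src["citations"]),
        "math": math_identifiers(doc["identifiers"], src["identifiers"]),
    }
    scores["overall"] = max(scores.values())
    return scores


# Hypothetical data: the text is fully reworded, citations and formulae are not.
source = {
    "text": "the proposed method detects copied text in academic documents",
    "citations": ["Smith2010", "Lee2012", "Smith2010", "Chen2015", "Kim2018"],
    "identifiers": ["E", "m", "c", "x", "x", "t"],
}
suspicious = {
    "text": "our approach finds reused passages within scholarly papers",
    "citations": ["Lee2012", "Park2019", "Smith2010", "Chen2015", "Kim2018"],
    "identifiers": ["E", "m", "c", "x", "t", "y"],
}
print(combined_score(suspicious, source))
```

In this example the trigram overlap is zero, yet the citation and math scores stay high (0.8 and about 0.83), which mirrors the point that non-textual content is largely unaffected by rewording. Taking the maximum is only one possible combination; a weighted sum or a learned ranking would be equally plausible design choices.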
Related papers
- BERT-Enhanced Retrieval Tool for Homework Plagiarism Detection System
We propose a method for generating plagiarized text data based on GPT-3.5, which produces a plagiarism detection dataset of 32,927 text pairs.
We also propose a plagiarism identification method that combines BERT with Faiss and achieves high efficiency and high accuracy.
Our experiments show that this model outperforms other models on several metrics, reaching 98.86% accuracy, 98.90% precision, 98.86% recall, and an F1 score of 0.9888.
Date: 2024-04-01T12:20:34Z
- DEMASQ: Unmasking the ChatGPT Wordsmith
We propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content.
Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods.
Date: 2023-11-08T21:13:05Z
- Text Similarity from Image Contents using Statistical and Semantic Analysis Techniques
Image Content Plagiarism Detection (ICPD) has gained importance, utilizing advanced image content processing to identify instances of plagiarism.
In this paper, the system has been implemented to detect plagiarism from the contents of images such as figures, graphs, and tables.
In addition to statistical algorithms such as Jaccard and cosine similarity, the paper introduces semantic algorithms such as LSA, BERT, and WordNet, which performed better at detecting plagiarism efficiently and accurately.
Date: 2023-08-24T15:06:04Z
- Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
We leverage recent progress on textual entailment models to address this problem for abstractive summarization systems.
We use reinforcement learning with reference-free, textual entailment rewards to optimize for factual consistency.
Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience, and conciseness of the generated summaries.
Date: 2023-05-31T21:04:04Z
- Verifying the Robustness of Automatic Credibility Assessment
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases, insignificant changes in the input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Date: 2023-03-14T16:11:47Z
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
Date: 2022-10-18T10:03:51Z
- A Survey of Plagiarism Detection Systems: Case of Use with English, French and Arabic Languages
This paper presents an overview of plagiarism detection systems for use in Arabic, French, and English academic and educational settings.
The study also includes an in-depth examination of technical forms of plagiarism.
Date: 2022-01-10T16:11:54Z
- Hamtajoo: A Persian Plagiarism Checker for Academic Manuscripts
Hamtajoo is a Persian plagiarism detection system for academic manuscripts.
We describe the overall structure of the system along with the algorithms used in each stage.
To evaluate the performance of the proposed system, we used a plagiarism detection corpus that complies with the PAN standards.
Date: 2021-12-27T15:45:35Z
- Taxonomy of academic plagiarism methods
The article defines plagiarism, explains the origin of the term, and introduces plagiarism-related terms.
It identifies the extent of the plagiarism domain and then focuses on the subdomain of plagiarism in text documents, for which it gives an overview of current classifications.
The article proposes a new classification of academic plagiarism and describes the kinds and methods of plagiarism, its types and categories, the approaches and phases of plagiarism detection, and a classification of methods and algorithms for plagiarism detection.
Date: 2021-05-25T16:49:08Z
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that an informational approach to representing the meaning of a text offers a way to effectively predict the scientific impact of research papers.
Date: 2021-04-26T20:37:13Z
- Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge
We aim to extract commonsense knowledge to improve machine reading comprehension.
We propose to represent relations implicitly by situating structured knowledge in a context.
We employ a teacher-student paradigm to inject multiple types of contextualized knowledge into a student machine reader.
Date: 2020-09-12T17:20:01Z
This list is automatically generated from the titles and abstracts of the papers on this site.
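The element-wise Manhattan distance vector mentioned in the "Textual Entailment Recognition with Semantic Features from Empirical Text Representation" entry can be illustrated briefly. Instead of collapsing two embeddings into a single distance, the feature keeps the absolute difference in every dimension, which would let a downstream classifier weigh dimensions separately. This is a minimal sketch, assuming the text and the hypothesis are already embedded as equal-length vectors; the function names and embeddings are hypothetical, and the classifier itself is omitted.

```python
def manhattan_feature(text_vec: list[float], hyp_vec: list[float]) -> list[float]:
    """Element-wise absolute difference |t_i - h_i| between two equal-length embeddings."""
    if len(text_vec) != len(hyp_vec):
        raise ValueError("vectors must have the same dimension")
    return [abs(t - h) for t, h in zip(text_vec, hyp_vec)]


def manhattan_distance(text_vec: list[float], hyp_vec: list[float]) -> float:
    """Scalar Manhattan (L1) distance: the sum of the element-wise feature."""
    return sum(manhattan_feature(text_vec, hyp_vec))


# Hypothetical embeddings of a text and a hypothesis.
text_embedding = [0.5, 1.0, -0.25]
hyp_embedding = [0.25, 1.0, 0.25]
print(manhattan_feature(text_embedding, hyp_embedding))   # [0.25, 0.0, 0.5]
print(manhattan_distance(text_embedding, hyp_embedding))  # 0.75
```

The vector form preserves information that the scalar distance discards: two pairs with the same L1 distance can differ in which dimensions disagree.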