Discovering and exploring cases of educational source code plagiarism with Dolos
- URL: http://arxiv.org/abs/2402.10853v2
- Date: Wed, 21 Feb 2024 10:51:12 GMT
- Title: Discovering and exploring cases of educational source code plagiarism with Dolos
- Authors: Rien Maertens, Maarten Van Neyghem, Maxiem Geldhof, Charlotte Van Petegem, Niko Strijbol, Peter Dawyndt, Bart Mesuere
- Abstract summary: Dolos is an ecosystem of tools for detecting and preventing plagiarism in educational source code.
Educators can now run the entire plagiarism pipeline from a new web app in their browser.
New dashboards provide an instant assessment of whether a collection of source files contains suspected cases of plagiarism.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Source code plagiarism is a significant issue in educational practice, and
educators need user-friendly tools to cope with such academic dishonesty. This
article introduces the latest version of Dolos, a state-of-the-art ecosystem of
tools for detecting and preventing plagiarism in educational source code. In
this new version, the primary focus has been on enhancing the user experience.
Educators can now run the entire plagiarism detection pipeline from a new web
app in their browser, eliminating the need for any installation or
configuration. Completely redesigned analytics dashboards provide an instant
assessment of whether a collection of source files contains suspected cases of
plagiarism and how widespread plagiarism is within the collection. The
dashboards support hierarchically structured navigation to facilitate zooming
in and out of suspect cases. Clusters are an essential new component of the
dashboard design, reflecting the observation that plagiarism can occur among
larger groups of students. To meet various user needs, the Dolos software stack
for source code plagiarism detection now includes a web interface, a JSON
application programming interface (API), a command line interface (CLI), a
JavaScript library and a preconfigured Docker container. Clear documentation
and a free-to-use instance of the web app can be found at
https://dolos.ugent.be. The source code is also available on GitHub.
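The abstract says clusters capture plagiarism among larger groups of students, but it does not say how clusters are formed. The following is a minimal sketch under a loudly stated assumption: a cluster is taken to be a connected component of files linked by pairwise similarity scores at or above a chosen threshold. This is not claimed to be Dolos's exact rule. The function names `cluster_files` and `plagiarism_spread`, the threshold value and the similarity scores are all hypothetical. The sketch also derives one possible indicator of "how widespread plagiarism is": the fraction of files that fall in any cluster.

```python
# Hypothetical sketch, not taken from Dolos's source code.
# Assumption: a cluster is a connected component of files whose
# pairwise similarity is >= threshold (union-find over suspect pairs).


def _find(parent: dict[str, str], x: str) -> str:
    """Return the representative of x's group (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_files(
    files: list[str],
    similarity: dict[tuple[str, str], float],
    threshold: float,
) -> list[set[str]]:
    """Group files linked by pairs whose similarity reaches the threshold."""
    parent = {f: f for f in files}
    for (a, b), score in similarity.items():
        if score >= threshold:
            # Union step: merge the groups containing a and b.
            parent[_find(parent, a)] = _find(parent, b)
    groups: dict[str, set[str]] = {}
    for f in files:
        groups.setdefault(_find(parent, f), set()).add(f)
    # A file that matches nothing above the threshold is not a cluster.
    return [g for g in groups.values() if len(g) > 1]


def plagiarism_spread(files: list[str], clusters: list[set[str]]) -> float:
    """Fraction of files that belong to some cluster (hypothetical metric)."""
    flagged = set().union(*clusters)
    return len(flagged) / len(files) if files else 0.0


if __name__ == "__main__":
    # Made-up similarity scores for five submissions, for illustration only.
    files = ["a.py", "b.py", "c.py", "d.py", "e.py"]
    similarity = {
        ("a.py", "b.py"): 0.92,
        ("b.py", "c.py"): 0.81,
        ("a.py", "c.py"): 0.40,
        ("d.py", "e.py"): 0.30,
    }
    clusters = cluster_files(files, similarity, threshold=0.75)
    print([sorted(c) for c in clusters])  # [['a.py', 'b.py', 'c.py']]
    print(plagiarism_spread(files, clusters))  # 0.6
```

Grouping by connected components means that if `b.py` was copied into both `a.py` and `c.py`, all three end up in one cluster even though `a.py` and `c.py` score only 0.40 against each other. This illustrates why pairwise scores alone can understate group-level plagiarism.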
Related papers
- LLMs Plagiarize: Ensuring Responsible Sourcing of Large Language Model Training Data Through Knowledge Graph Comparison
We propose a novel system, a variant of a plagiarism detection system, that assesses whether a knowledge source has been used in the training or fine-tuning of a large language model.
Unlike current methods, our approach uses Resource Description Framework (RDF) triples to create knowledge graphs from both a source document and an LLM continuation of that document.
These graphs are then analyzed with respect to content using cosine similarity and with respect to structure using a normalized version of graph edit distance that shows the degree of isomorphism.
(arXiv, 2024-07-02)
- CONCORD: Clone-aware Contrastive Learning for Source Code
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
(arXiv, 2023-06-05)
- Deep learning for table detection and structure recognition: A survey
The goal of this survey is to provide a thorough understanding of the major developments in the field of table detection.
We provide an analysis of both classic and new applications in the field.
The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
(arXiv, 2022-11-15)
- Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code
Repro is an open-source library which aims at improving the usability of research code.
It provides a lightweight Python API for running software released by researchers within Docker containers.
(arXiv, 2022-04-29)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all n-grams in a passage as its possible identifiers.
(arXiv, 2022-04-22)
- A Survey of Plagiarism Detection Systems: Case of Use with English, French and Arabic Languages
This paper presents an overview of plagiarism detection systems for use in Arabic, French, and English academic and educational settings.
An in-depth examination of technical forms of plagiarism was also performed as part of this study.
(arXiv, 2022-01-10)
- Hamtajoo: A Persian Plagiarism Checker for Academic Manuscripts
Hamtajoo is a Persian plagiarism detection system for academic manuscripts.
We describe the overall structure of the system along with the algorithms used in each stage.
To evaluate the performance of the proposed system, we used a plagiarism detection corpus that complies with the PAN standards.
(arXiv, 2021-12-27)
- The Struggle with Academic Plagiarism: Approaches based on Semantic Similarity
We report on how semantic similarity measures can be used for plagiarism detection.
Current software has proven successful; however, the problem of identifying paraphrase or obfuscation plagiarism remains unresolved.
(arXiv, 2021-06-02)
- COSEA: Convolutional Code Search with Layer-wise Attention
We propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the code's intrinsic structural logic.
COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.
(arXiv, 2020-10-19)
- Mossad: Defeating Software Plagiarism Detection
This paper presents an entirely automatic program transformation approach, Mossad, that defeats popular software plagiarism detection tools.
It comprises a framework that couples techniques inspired by genetic programming with domain-specific knowledge to effectively undermine plagiarism detectors.
Mossad is both fast and effective: it can, in minutes, generate modified versions of programs that are likely to escape detection.
(arXiv, 2020-10-04)
- Learning to map source code to software vulnerability using code-as-a-graph
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective.
We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
(arXiv, 2020-06-15)