Dare to Plagiarize? Plagiarized Painting Recognition and Retrieval
- URL: http://arxiv.org/abs/2506.23132v1
- Date: Sun, 29 Jun 2025 07:58:53 GMT
- Title: Dare to Plagiarize? Plagiarized Painting Recognition and Retrieval
- Authors: Sophie Zhou, Shu Kong
- Abstract summary: We construct a dataset by collecting painting photos and synthesizing plagiarized versions using generative AI. We first establish a baseline approach using off-the-shelf features from the visual foundation model DINOv2 to retrieve the most similar images in the database. We finetune DINOv2 with a metric learning loss using positive and negative sample pairs sampled from the database.
- Score: 8.670873561640903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Art plagiarism detection plays a crucial role in protecting artists' copyrights and intellectual property, yet it remains a challenging problem in forensic analysis. In this paper, we address the task of recognizing plagiarized paintings and explaining the detected plagiarisms by retrieving visually similar authentic artworks. To support this study, we construct a dataset by collecting painting photos and synthesizing plagiarized versions using generative AI, tailored to specific artists' styles. We first establish a baseline approach using off-the-shelf features from the visual foundation model DINOv2 to retrieve the most similar images in the database and classify plagiarism based on a similarity threshold. Surprisingly, this non-learned method achieves a high recognition accuracy of 97.2% but suffers from low retrieval quality of 29.0% average precision (AP). To improve retrieval quality, we finetune DINOv2 with a metric learning loss using positive and negative sample pairs sampled from the database. The finetuned model greatly improves retrieval performance by 12% AP over the baseline, though it unexpectedly results in a lower recognition accuracy (92.7%). We conclude with insightful discussions and outline directions for future research.
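The baseline pipeline described in the abstract (embed images, retrieve nearest neighbors by cosine similarity, flag plagiarism when the top score exceeds a threshold) and the metric-learning objective used for finetuning can be sketched as below. Note this is a minimal illustration, not the authors' implementation: the feature vectors stand in for DINOv2 embeddings, and the 0.6 threshold and 0.2 triplet margin are assumed values, not ones reported in the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale feature vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def retrieve(query_feat, db_feats, k=3):
    """Return indices and cosine similarities of the k most similar database images."""
    sims = l2_normalize(db_feats) @ l2_normalize(query_feat)
    order = np.argsort(-sims)[:k]
    return order, sims[order]

def is_plagiarized(query_feat, db_feats, threshold=0.6):
    """Flag a query as plagiarized if its best match exceeds the similarity threshold."""
    _, top_sims = retrieve(query_feat, db_feats, k=1)
    return bool(top_sims[0] >= threshold)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin loss pulling (anchor, positive) together and (anchor, negative) apart."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return float(max(0.0, margin - a @ p + a @ n))
```

In the paper the features come from DINOv2 and the metric-learning loss is minimized over positive and negative pairs mined from the database; here the components are shown in isolation on plain arrays.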
Related papers
- TokBench: Evaluating Your Visual Tokenizer before Visual Generation [75.38270351179018]
We analyze text and face reconstruction quality across various scales for different image tokenizers and VAEs. Our results show modern visual tokenizers still struggle to preserve fine-grained features, especially at smaller scales.
arXiv Detail & Related papers (2025-05-23T17:52:16Z) - CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation [132.00910067533982]
We introduce CopyBench, a benchmark designed to measure both literal and non-literal copying in LM generations.
We find that, although literal copying is relatively rare, two types of non-literal copying -- event copying and character copying -- occur even in models as small as 7B parameters.
arXiv Detail & Related papers (2024-07-09T17:58:18Z) - BERT-Enhanced Retrieval Tool for Homework Plagiarism Detection System [0.0]
We propose a plagiarized text data generation method based on GPT-3.5, which produces 32,927 pairs of text plagiarism detection datasets.
We also propose a plagiarism identification method based on Faiss with BERT with high efficiency and high accuracy.
Our experiments show that this model outperforms other models on several metrics, achieving 98.86% accuracy, 98.90% precision, 98.86% recall, and an F1 score of 0.9888.
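The retrieval step of such a system amounts to exact inner-product search over precomputed embeddings. A minimal NumPy sketch of what an exact index like Faiss's IndexFlatIP computes is shown below; the embeddings here are placeholders, whereas a real system would index BERT sentence embeddings:

```python
import numpy as np

def flat_ip_search(db_embs, query_embs, k):
    """Brute-force inner-product search over all database embeddings,
    mirroring what an exact flat index does for small collections."""
    scores = query_embs @ db_embs.T             # (n_queries, n_db) score matrix
    topk = np.argsort(-scores, axis=1)[:, :k]   # indices of the k best per query
    return np.take_along_axis(scores, topk, axis=1), topk
```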
arXiv Detail & Related papers (2024-04-01T12:20:34Z) - A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models [52.49582606341111]
Copyright law confers creators the exclusive rights to reproduce, distribute, and monetize their creative works.
Recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement.
We introduce a novel pipeline that harmonizes CLIP, ChatGPT, and diffusion models to curate a dataset.
arXiv Detail & Related papers (2024-01-04T11:14:01Z) - Text Similarity from Image Contents using Statistical and Semantic Analysis Techniques [0.0]
Image Content Plagiarism Detection (ICPD) has gained importance, utilizing advanced image content processing to identify instances of plagiarism.
In this paper, a system is implemented to detect plagiarism from the contents of images such as figures, graphs, and tables. Alongside statistical algorithms such as Jaccard and cosine similarity, introducing semantic algorithms such as LSA, BERT, and WordNet improved the efficiency and accuracy of plagiarism detection.
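The two statistical measures named above are straightforward to compute over tokenized text. A short sketch follows; the whitespace tokenization is a simplifying assumption for illustration:

```python
from collections import Counter
import math

def jaccard_similarity(a_tokens, b_tokens):
    """Overlap of unique tokens: |A intersect B| / |A union B|."""
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def cosine_similarity(a_tokens, b_tokens):
    """Cosine of term-frequency vectors built from the two token lists."""
    ca, cb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0
```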
arXiv Detail & Related papers (2023-08-24T15:06:04Z) - Whodunit? Learning to Contrast for Authorship Attribution [22.37948005237967]
Authorship attribution is the task of identifying the author of a given text.
We propose to fine-tune pre-trained language representations using a combination of contrastive learning and supervised learning.
We show that Contra-X advances the state-of-the-art on multiple human and machine authorship attribution benchmarks.
arXiv Detail & Related papers (2022-09-23T23:45:08Z) - Where Does the Performance Improvement Come From? - A Reproducibility Concern about Image-Text Retrieval [85.03655458677295]
Image-text retrieval has gradually become a major research direction in the field of information retrieval.
We first examine the related concerns and why the focus is on image-text retrieval tasks.
We analyze various aspects of the reproduction of pretrained and nonpretrained retrieval models.
arXiv Detail & Related papers (2022-03-08T05:01:43Z) - A Replication Study of Dense Passage Retriever [32.192420072129636]
We study the dense passage retriever (DPR) technique proposed by Karpukhin et al. (2020) for end-to-end open-domain question answering.
We present a replication study of this work, starting with model checkpoints provided by the authors.
We are able to improve end-to-end question answering effectiveness using exactly the same models as in the original work.
arXiv Detail & Related papers (2021-04-12T18:10:39Z) - Learning to Recognize Patch-Wise Consistency for Deepfake Detection [39.186451993950044]
We propose a representation learning approach for this task, called patch-wise consistency learning (PCL).
PCL learns by measuring the consistency of image source features, resulting in representations with good interpretability and robustness to multiple forgery methods.
We evaluate our approach on seven popular Deepfake detection datasets.
arXiv Detail & Related papers (2020-12-16T23:06:56Z) - Visually Grounded Compound PCFGs [65.04669567781634]
Exploiting visual groundings for language understanding has recently been drawing much attention.
We study visually grounded grammar induction and learn a constituency parser from both unlabeled text and its visual captions.
arXiv Detail & Related papers (2020-09-25T19:07:00Z) - Unsupervised Landmark Learning from Unpaired Data [117.81440795184587]
Recent attempts for unsupervised landmark learning leverage synthesized image pairs that are similar in appearance but different in poses.
We propose a cross-image cycle consistency framework which applies the swapping-reconstruction strategy twice to obtain the final supervision.
Our proposed framework is shown to outperform strong baselines by a large margin.
arXiv Detail & Related papers (2020-06-29T13:57:20Z) - Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.