Contrastive Attention Networks for Attribution of Early Modern Print
- URL: http://arxiv.org/abs/2306.07998v1
- Date: Mon, 12 Jun 2023 19:57:11 GMT
- Title: Contrastive Attention Networks for Attribution of Early Modern Print
- Authors: Nikolai Vogler, Kartik Goyal, Kishore PV Reddy, Elizaveta Pertseva,
Samuel V. Lemley, Christopher N. Warren, Max G'Sell, Taylor Berg-Kirkpatrick
- Abstract summary: We develop machine learning techniques to identify unknown printers in early modern (c. 1500-1800) English printed books.
Specifically, we focus on matching uniquely damaged character type-imprints in anonymously printed books to works with known printers.
- Score: 23.344655278038392
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we develop machine learning techniques to identify unknown
printers in early modern (c. 1500-1800) English printed books. Specifically,
we focus on matching uniquely damaged character type-imprints in anonymously
printed books to works with known printers in order to provide evidence of
their origins. Until now, this work has been limited to manual investigations
by analytical bibliographers. We present a Contrastive Attention-based Metric
Learning approach to identify similar damage across character image pairs,
which is sensitive to very subtle differences in glyph shapes, yet robust to
various confounding sources of noise associated with digitized historical
books. To overcome the scarce amount of supervised data, we design a random
data synthesis procedure that aims to simulate bends, fractures, and inking
variations induced by the early printing process. Our method successfully
improves downstream damaged type-imprint matching among printed works from this
period, as validated by in-domain human experts. The results of our approach on
two important philosophical works from the Early Modern period demonstrate
potential to extend the extant historical research about the origins and
content of these books.
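The abstract describes metric learning over pairs of character images but does not give the loss function or the attention architecture. As a rough illustration only, the sketch below uses the standard pairwise contrastive loss as a stand-in: pairs showing the same damage are pulled together in embedding space, and pairs showing different damage are pushed at least a margin apart. The function name `contrastive_loss`, the `margin` parameter, and the toy embeddings are assumptions made for this example and are not taken from the paper.

```python
import math


def contrastive_loss(pairs, margin=1.0):
    """Mean pairwise contrastive loss over (emb_a, emb_b, same_damage) triples.

    Hypothetical stand-in for the paper's objective, which the abstract
    does not specify. Each embedding is a sequence of floats.
    """
    total = 0.0
    for emb_a, emb_b, same_damage in pairs:
        dist = math.dist(emb_a, emb_b)  # Euclidean distance between embeddings
        if same_damage:
            # Matching damage: penalize any distance between the two imprints.
            total += dist ** 2
        else:
            # Different damage: penalize only when closer than the margin.
            total += max(0.0, margin - dist) ** 2
    return total / len(pairs)


# Toy embeddings (assumed values): a non-matching pair that collapses to the
# same point, and a matching pair that sits far apart.
toy_pairs = [
    ([0.0, 0.0], [0.0, 0.0], False),
    ([0.0, 0.0], [3.0, 4.0], True),
]
loss = contrastive_loss(toy_pairs)
```

A trained embedding would drive this value toward zero by placing matching type-imprints close together while keeping other imprints of the same letter at least `margin` away.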
Related papers
- Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention [62.671435607043875]
Research indicates that text-to-image diffusion models replicate images from their training data, raising tremendous concerns about potential copyright infringement and privacy risks.
We reveal that during memorization, the cross-attention tends to focus disproportionately on the embeddings of specific tokens.
We introduce an innovative approach to detect and mitigate memorization in diffusion models.
arXiv Detail & Related papers (2024-03-17T01:27:00Z)
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models [71.15152184631951]
We propose a fully automated solution for consistent character generation with the sole input being a text prompt.
Our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods.
arXiv Detail & Related papers (2023-11-16T18:59:51Z)
- PHD: Pixel-Based Language Modeling of Historical Documents [55.75201940642297]
We propose a novel method for generating synthetic scans to resemble real historical documents.
We pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period.
We successfully apply our model to a historical QA task, highlighting its usefulness in this domain.
arXiv Detail & Related papers (2023-10-22T08:45:48Z)
- Sampling and Ranking for Digital Ink Generation on a tight computational budget [69.15275423815461]
We study ways to maximize the quality of the output of a trained digital ink generative model.
We use and compare the effect of multiple sampling and ranking techniques, in the first ablation study of its kind in the digital ink domain.
arXiv Detail & Related papers (2023-06-02T09:55:15Z)
- AGTGAN: Unpaired Image Translation for Photographic Ancient Character Generation [27.77329906930072]
We propose an unsupervised generative adversarial network called AGTGAN.
By explicit global and local glyph shape style modeling, our method can generate characters with diverse glyphs and realistic textures.
With our generated images, experiments on the largest photographic oracle bone character dataset show that our method can achieve a significant increase in classification accuracy, up to 16.34%.
arXiv Detail & Related papers (2023-03-13T11:18:41Z)
- Writer Retrieval and Writer Identification in Greek Papyri [4.44566870214758]
Writer identification refers to the classification of known writers while writer retrieval seeks to find the writer by means of image similarity in a dataset of images.
While automatic writer identification/retrieval methods already provide promising results for many historical document types, papyri data is very challenging due to the fiber structures and severe artifacts.
We investigate several methods and show that a good binarization is key to an improved writer identification in papyri writings.
arXiv Detail & Related papers (2022-12-15T08:42:25Z)
- Paraphrase Identification with Deep Learning: A Review of Datasets and Methods [1.4325734372991794]
We investigate how the under-representation of certain paraphrase types in popular datasets affects the ability to detect plagiarism.
We introduce and validate a new refined typology for paraphrases.
We propose new directions for future research and dataset development to enhance AI-based paraphrase detection.
arXiv Detail & Related papers (2022-12-13T23:06:20Z)
- Digital Editions as Distant Supervision for Layout Analysis of Printed Books [76.29918490722902]
We describe methods for exploiting this semantic markup as distant supervision for training and evaluating layout analysis models.
In experiments with several model architectures on the half-million pages of the Deutsches Textarchiv (DTA), we find a high correlation of these region-level evaluation methods with pixel-level and word-level metrics.
We discuss the possibilities for improving accuracy with self-training and the ability of models trained on the DTA to generalize to other historical printed books.
arXiv Detail & Related papers (2021-12-23T16:51:53Z)
- Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data [64.65952078807086]
Photorealistic image generation has reached a new level of quality due to the breakthroughs of generative adversarial networks (GANs).
Yet, the dark side of such deepfakes, the malicious use of generated media, raises concerns about visual misinformation.
We seek a proactive and sustainable solution on deepfake detection by introducing artificial fingerprints into the models.
arXiv Detail & Related papers (2020-07-16T16:49:55Z)
- Print Defect Mapping with Semantic Segmentation [4.189639503810488]
We propose the first end-to-end framework to map print defects at pixel level.
Our framework uses Convolutional Neural Networks, specifically DeepLab-v3+, and achieves promising results.
arXiv Detail & Related papers (2020-01-27T22:40:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.