Related papers: PyPotteryInk: One-Step Diffusion Model for Sketch to Publication-ready Archaeological Drawings

URL: http://arxiv.org/abs/2502.06897v1
Date: Sun, 09 Feb 2025 14:03:37 GMT
Title: PyPotteryInk: One-Step Diffusion Model for Sketch to Publication-ready Archaeological Drawings
Authors: Lorenzo Cardarelli
Abstract summary: PyPotteryInk is an automated pipeline that transforms archaeological pottery sketches into publication-ready inked drawings. I demonstrate the effectiveness of the approach on a dataset of Italian protohistoric pottery drawings. The model can be fine-tuned to adapt to different archaeological contexts with minimal training data.
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Archaeological pottery documentation traditionally requires a time-consuming manual process of converting pencil sketches into publication-ready inked drawings. I present PyPotteryInk, an open-source automated pipeline that transforms archaeological pottery sketches into standardised publication-ready drawings using a one-step diffusion model. Built on a modified img2img-turbo architecture, the system processes drawings in a single forward pass while preserving crucial morphological details and maintaining archaeological documentation standards and analytical value. The model employs an efficient patch-based approach with dynamic overlap, enabling high-resolution output regardless of input drawing size. I demonstrate the effectiveness of the approach on a dataset of Italian protohistoric pottery drawings, where it successfully captures both fine details like decorative patterns and structural elements like vessel profiles or handling elements. Expert evaluation confirms that the generated drawings meet publication standards while significantly reducing processing time from hours to seconds per drawing. The model can be fine-tuned to adapt to different archaeological contexts with minimal training data, making it versatile across various pottery documentation styles. The pre-trained models, the Python library and comprehensive documentation are provided to facilitate adoption within the archaeological research community.

Related papers

DRAGON: A Large-Scale Dataset of Realistic Images Generated by Diffusion Models [48.347550000332866]
DRAGON is a comprehensive dataset comprising images from 25 diffusion models. The dataset contains a broad variety of images representing diverse subjects. DRAGON is designed to support the forensic community in developing and evaluating detection and attribution techniques for synthetic content.
arXiv Detail & Related papers (2025-05-16T13:50:34Z)
DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral [11.336757553731639]
Acquiring structured data from domain-specific, image-based documents is crucial for many downstream tasks. Many documents exist as images rather than as machine-readable text, which requires human annotation to train automated extraction systems. We present DocSpiral, the first Human-in-the-Spiral assistive document annotation platform.
arXiv Detail & Related papers (2025-05-06T06:02:42Z)
PyPotteryLens: An Open-Source Deep Learning Framework for Automated Digitisation of Archaeological Pottery Documentation [0.0]
PyPotteryLens is a framework that automates the digitisation and processing of archaeological pottery drawings from published sources. The framework achieves over 97% precision and recall in pottery detection and classification tasks. It reduces processing time by up to 5x to 20x compared to manual methods.
arXiv Detail & Related papers (2024-12-16T09:01:32Z)
PHD: Pixel-Based Language Modeling of Historical Documents [55.75201940642297]
We propose a novel method for generating synthetic scans to resemble real historical documents. We pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period. We successfully apply our model to a historical QA task, highlighting its usefulness in this domain.
arXiv Detail & Related papers (2023-10-22T08:45:48Z)
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model. We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z)
DINOv2: Learning Robust Visual Features without Supervision [75.42921276202522]
This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature.
arXiv Detail & Related papers (2023-04-14T15:12:19Z)
ArcAid: Analysis of Archaeological Artifacts using Drawings [23.906975910478142]
Archaeology is an intriguing domain for computer vision. It suffers not only from shortage in (labeled) data, but also from highly-challenging data, which is often extremely abraded and damaged. This paper proposes a novel semi-supervised model for classification and retrieval of images of archaeological artifacts.
arXiv Detail & Related papers (2022-11-17T11:57:01Z)
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding [52.3895498789521]
We propose ERNIE-Layout, a novel document pre-training solution with layout knowledge enhancement. We first rearrange input sequences in the serialization stage, then present a correlative pre-training task, reading order prediction, to learn the proper reading order of documents. Experimental results show that ERNIE-Layout achieves superior performance on various downstream tasks, setting a new state of the art on key information extraction and document question answering.
arXiv Detail & Related papers (2022-10-12T12:59:24Z)
Learning from scarce information: using synthetic data to classify Roman fine ware pottery [0.0]
We propose to use a transfer learning approach whereby the model is first trained on a synthetic dataset replicating features of the original objects. Taking the replicated features from published profile drawings of pottery forms allowed the integration of expert knowledge into the process. After this first initial training the model was fine-tuned with data from photographs of real vessels.
arXiv Detail & Related papers (2021-07-03T10:30:46Z)
Key Information Extraction From Documents: Evaluation And Generator [3.878105750489656]
This research project compares state-of-the-art models for information extraction from documents. The results have shown that NLP based pre-processing is beneficial for model performance. The use of a bounding box regression decoder increases the model performance only for fields that do not follow a rectangular shape.
arXiv Detail & Related papers (2021-06-09T16:12:21Z)
Visualising Deep Network's Time-Series Representations [93.73198973454944]
Despite the popularisation of machine learning models, more often than not they still operate as black boxes with no insight into what is happening inside the model. In this paper, a method that addresses that issue is proposed, with a focus on visualising multi-dimensional time-series data. Experiments on a high-frequency stock market dataset show that the method provides fast and discernible visualisations.
arXiv Detail & Related papers (2021-03-12T09:53:34Z)
Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks [2.5352713493505785]
We introduce a fully convolutional network for the document layout analysis task. Our method Doc-UFCN relies on a U-shaped model trained from scratch for detecting objects from historical documents. We show that Doc-UFCN outperforms state-of-the-art methods on various datasets.
arXiv Detail & Related papers (2020-12-28T09:48:33Z)
CoSE: Compositional Stroke Embeddings [52.529172734044664]
We present a generative model for complex free-form structures such as stroke-based drawing tasks. Our approach is suitable for interactive use cases such as auto-completing diagrams.
arXiv Detail & Related papers (2020-06-17T15:22:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.