PyPotteryLens: An Open-Source Deep Learning Framework for Automated Digitisation of Archaeological Pottery Documentation
- URL: http://arxiv.org/abs/2412.11574v1
- Date: Mon, 16 Dec 2024 09:01:32 GMT
- Title: PyPotteryLens: An Open-Source Deep Learning Framework for Automated Digitisation of Archaeological Pottery Documentation
- Authors: Lorenzo Cardarelli
- Abstract summary: PyPotteryLens is a framework that automates the digitisation and processing of archaeological pottery drawings from published sources.
The framework achieves over 97% precision and recall in pottery detection and classification tasks.
It reduces processing time by 5x to 20x compared to manual methods.
- Abstract: Archaeological pottery documentation and study represent a crucial but time-consuming aspect of archaeology. While recent years have seen advances in digital documentation methods, vast amounts of legacy data remain locked in traditional publications. This paper introduces PyPotteryLens, an open-source framework that leverages deep learning to automate the digitisation and processing of archaeological pottery drawings from published sources. The system combines state-of-the-art computer vision models (YOLO for instance segmentation and EfficientNetV2 for classification) with an intuitive user interface, making advanced digital methods accessible to archaeologists regardless of technical expertise. The framework achieves over 97% precision and recall in pottery detection and classification tasks, while reducing processing time by 5x to 20x compared to manual methods. Testing across diverse archaeological contexts demonstrates robust generalisation capabilities. The system's modular architecture also facilitates extension to other archaeological materials, while its standardised output format ensures long-term preservation and reusability of digitised data and provides a solid basis for training machine learning algorithms. The software, documentation, and examples are available on GitHub (https://github.com/lrncrd/PyPottery/tree/PyPotteryLens).
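To make the reported metrics concrete: precision is the share of detections that are correct, and recall is the share of true objects that are detected. The sketch below is a minimal illustration only. The function name and the counts are hypothetical and are not taken from the paper or from the PyPotteryLens code.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for illustration only; not results from the paper.
p, r = precision_recall(tp=97, fp=3, fn=2)
print(f"precision={p:.3f} recall={r:.3f}")
```

Both values must exceed 0.97 for a claim of "over 97% precision and recall" to hold, so a detector cannot reach it by trading one metric for the other.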
Related papers
- PyPotteryInk: One-Step Diffusion Model for Sketch to Publication-ready Archaeological Drawings [0.0]
PyPotteryInk is an automated pipeline that transforms archaeological pottery sketches into publication-ready inked drawings.
I demonstrate the effectiveness of the approach on a dataset of Italian protohistoric pottery drawings.
The model can be fine-tuned to adapt to different archaeological contexts with minimal training data.
arXiv Detail & Related papers (2025-02-09T14:03:37Z)
- Machine learning applications in archaeological practices: a review [0.0]
We reviewed 135 articles published between 1997 and 2022.
Automatic structure detection and artefact classification were the most represented tasks.
We observed, in some cases, poorly defined requirements and caveats of the machine learning methods used.
arXiv Detail & Related papers (2025-01-07T14:50:05Z)
- AutArch: An AI-assisted workflow for object detection and automated recording in archaeological catalogues [37.69303106863453]
This paper introduces a new workflow for collecting data from archaeological find catalogues available as legacy resources.
The workflow relies on custom software (AutArch) supporting image processing, object detection, and interactive means of validating and adjusting automatically retrieved data.
We integrate artificial intelligence (AI) in terms of neural networks for object detection and classification into the workflow.
arXiv Detail & Related papers (2023-11-29T17:24:04Z)
- Slideflow: Deep Learning for Digital Histopathology with Real-Time Whole-Slide Visualization [49.62449457005743]
We develop a flexible deep learning library for histopathology called Slideflow.
It supports a broad array of deep learning methods for digital pathology.
It includes a fast whole-slide interface for deploying trained models.
arXiv Detail & Related papers (2023-04-09T02:49:36Z)
- ArcAid: Analysis of Archaeological Artifacts using Drawings [23.906975910478142]
Archaeology is an intriguing domain for computer vision.
It suffers not only from a shortage of (labeled) data, but also from highly challenging data, which is often extremely abraded and damaged.
This paper proposes a novel semi-supervised model for classification and retrieval of images of archaeological artifacts.
arXiv Detail & Related papers (2022-11-17T11:57:01Z)
- Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide an in-depth understanding of the major developments in the field of Table Detection.
We provide an analysis of both classic and new applications in the field.
The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z)
- Questions Are All You Need to Train a Dense Passage Retriever [123.13872383489172]
ART is a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question.
arXiv Detail & Related papers (2022-06-21T18:16:31Z)
- Unsupervised Clustering of Roman Potsherds via Variational Autoencoders [63.8376359764052]
We propose an artificial intelligence solution to support archaeologists in the classification task of Roman commonware potsherds.
The partiality and handcrafted variance of the fragments make their matching a challenging problem.
We propose to pair similar profiles via the unsupervised hierarchical clustering of non-linear features learned in the latent space of a deep convolutional Variational Autoencoder (VAE) network.
arXiv Detail & Related papers (2022-03-14T18:56:13Z)
- Digital Editions as Distant Supervision for Layout Analysis of Printed Books [76.29918490722902]
We describe methods for exploiting this semantic markup as distant supervision for training and evaluating layout analysis models.
In experiments with several model architectures on the half-million pages of the Deutsches Textarchiv (DTA), we find a high correlation of these region-level evaluation methods with pixel-level and word-level metrics.
We discuss the possibilities for improving accuracy with self-training and the ability of models trained on the DTA to generalize to other historical printed books.
arXiv Detail & Related papers (2021-12-23T16:51:53Z)
- Scaling Systematic Literature Reviews with Machine Learning Pipelines [57.82662094602138]
Systematic reviews entail the extraction of data from scientific documents.
We construct a pipeline that automates each of these aspects, and experiment with many human-time vs. system quality trade-offs.
We find that we can get surprising accuracy and generalisability of the whole pipeline system with only 2 weeks of human-expert annotation.
arXiv Detail & Related papers (2020-10-09T16:19:42Z)