EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT
- URL: http://arxiv.org/abs/2304.02084v4
- Date: Mon, 20 May 2024 15:20:03 GMT
- Title: EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT
- Authors: Stephen Parsons, C. Seth Parker, Christy Chapman, Mami Hayashida, W. Brent Seales
- Abstract summary: We present a complete software pipeline for revealing the hidden texts of the Herculaneum papyri using X-ray CT images.
We also present EduceLab-Scrolls, a comprehensive open dataset representing two decades of research effort on this problem.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a complete software pipeline for revealing the hidden texts of the Herculaneum papyri using X-ray CT images. This enhanced virtual unwrapping pipeline combines machine learning with a novel geometric framework linking 3D and 2D images. We also present EduceLab-Scrolls, a comprehensive open dataset representing two decades of research effort on this problem. EduceLab-Scrolls contains a set of volumetric X-ray CT images of both small fragments and intact, rolled scrolls. The dataset also contains 2D image labels that are used in the supervised training of an ink detection model. Labeling is enabled by aligning spectral photography of scroll fragments with X-ray CT images of the same fragments, thus creating a machine-learnable mapping between image spaces and modalities. This alignment permits supervised learning for the detection of "invisible" carbon ink in X-ray CT, a task that is "impossible" even for human expert labelers. To our knowledge, this is the first aligned dataset of its kind and is the largest dataset ever released in the heritage domain. Our method is capable of revealing accurate lines of text on scroll fragments with known ground truth. Revealed text is verified using visual confirmation, quantitative image metrics, and scholarly review. EduceLab-Scrolls has also enabled the discovery, for the first time, of hidden texts from the Herculaneum papyri, which we present here. We anticipate that the EduceLab-Scrolls dataset will generate more textual discovery as research continues.
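The abstract notes that 2D image labels, once aligned with the X-ray CT data, make supervised ink detection possible. The following is a minimal sketch of that pairing step only, not the authors' implementation. It assumes a hypothetical `surface_volume` array of CT intensities sampled around a flattened papyrus surface, and an `ink_labels` mask already registered to it. The function name, array layout, and patch parameters are all illustrative assumptions.

```python
import numpy as np


def sample_training_pairs(surface_volume, ink_labels, patch=8, stride=8):
    """Pair each CT subvolume with the ink label at its center pixel.

    surface_volume: (depth, height, width) array of CT intensities
        (hypothetical layout, assumed for this sketch).
    ink_labels: (height, width) binary array, assumed to be already
        aligned to the 2D extent of the volume.
    """
    depth, height, width = surface_volume.shape
    if ink_labels.shape != (height, width):
        raise ValueError("labels must be aligned to the volume's 2D extent")

    half = patch // 2
    subvolumes, labels = [], []
    # Slide a window over the 2D surface; each window keeps the full depth.
    for y in range(half, height - half + 1, stride):
        for x in range(half, width - half + 1, stride):
            subvolumes.append(
                surface_volume[:, y - half:y + half, x - half:x + half]
            )
            labels.append(ink_labels[y, x])
    return np.stack(subvolumes), np.asarray(labels)


if __name__ == "__main__":
    # Synthetic stand-in data: the right half of the surface is "ink".
    volume = np.random.default_rng(0).random((5, 32, 32))
    mask = np.zeros((32, 32), dtype=np.uint8)
    mask[:, 16:] = 1
    X, y = sample_training_pairs(volume, mask)
    print(X.shape, y.shape, int(y.sum()))
```

Each `(subvolume, label)` pair could then be fed to any binary classifier. The choice of model, patch size, and sampling strategy is left open here because the abstract does not specify them.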
Related papers
- RayEmb: Arbitrary Landmark Detection in X-Ray Images Using Ray Embedding Subspace [0.7937206070844555]
Intra-operative 2D-3D registration of X-ray images with pre-operatively acquired CT scans is a crucial procedure in orthopedic surgeries.
We propose a novel method to address this issue by detecting arbitrary landmark points in X-ray images.
(2024-10-10T17:36:21Z)
- Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets.
We introduce a novel method named Decoder Pre-training with only text for STR (DPTR).
DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
(2024-08-11T06:36:42Z)
- Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification [8.192975020366777]
DRR-RATE comprises 50,188 frontal Digitally Reconstructed Radiographs (DRRs) from 21,304 unique patients.
Each image is paired with a corresponding radiology text report and binary labels for 18 pathology classes.
We demonstrate the applicability of DRR-RATE alongside existing large-scale chest X-ray resources, notably the CheXpert dataset and CheXnet model.
(2024-06-06T02:19:18Z)
- CLIM: Contrastive Language-Image Mosaic for Region Representation [58.05870131126816]
Contrastive Language-Image Mosaic (CLIM) is a novel approach for aligning region and text representations.
CLIM consistently improves different open-vocabulary object detection methods.
It can effectively enhance the region representation of vision-language models.
(2023-12-18T17:39:47Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
(2023-11-28T06:51:28Z)
- Volumetric Fast Fourier Convolution for Detecting Ink on the Carbonized Herculaneum Papyri [23.090618261864886]
We propose a modification of the Fast Fourier Convolution operator for volumetric data and apply it in a segmentation architecture for ink detection on the Herculaneum papyri.
To encourage the research on this task and the application of the proposed operator to other tasks involving volumetric data, we will release our implementation.
(2023-08-09T17:00:43Z)
- A Simple Framework for Open-Vocabulary Segmentation and Detection [85.21641508535679]
We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.
We first introduce a pre-trained text encoder to encode all the visual concepts in two tasks and learn a common semantic space for them.
After pre-training, our model exhibits competitive or stronger zero-shot transferability for both segmentation and detection.
(2023-03-14T17:58:34Z)
- Text-Based Person Search with Limited Data [66.26504077270356]
Text-based person search (TBPS) aims at retrieving a target person from an image gallery with a descriptive text query.
We present a framework with two novel components to handle the problems brought by limited data.
(2021-10-20T22:20:47Z)
- Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment [9.265044250068554]
This paper proposes a Joint Image Text Representation Learning Network (JoImTeRNet) for pre-training on chest X-ray images and their radiology reports.
The model was pre-trained on both the global image-sentence level and the local image region-word level for visual-textual matching.
(2021-09-04T22:58:35Z)
- Self-Supervised Multi-Modal Alignment for Whole Body Medical Imaging [70.52819168140113]
We use a dataset of over 20,000 subjects from the UK Biobank with both whole body Dixon technique magnetic resonance (MR) scans and also dual-energy x-ray absorptiometry (DXA) scans.
We introduce a multi-modal image-matching contrastive framework that is able to learn to match different-modality scans of the same subject with high accuracy.
Without any adaptation, we show that the correspondences learnt during this contrastive training step can be used to perform automatic cross-modal scan registration.
(2021-07-14T12:35:05Z)
- Bone Structures Extraction and Enhancement in Chest Radiographs via CNN Trained on Synthetic Data [2.969705152497174]
We present a deep learning-based image processing technique for extraction of bone structures in chest radiographs using a U-Net FCNN.
The U-Net was trained to accomplish the task in a fully supervised setting.
We show that our enhancement technique is applicable to real x-ray data, and display our results on the NIH Chest X-Ray-14 dataset.
(2020-03-20T20:27:50Z)