Related papers: Leveraging GenAI for Segmenting and Labeling Centuries-old Technical Documents

URL: http://arxiv.org/abs/2603.00147v1
Date: Tue, 24 Feb 2026 21:09:15 GMT
Title: Leveraging GenAI for Segmenting and Labeling Centuries-old Technical Documents
Authors: Carlos Monroy, Benjamin Navarro
Abstract summary: We report on our work in segmenting and labeling images pertaining to shipbuilding treatises from the XVI and XVII centuries. Preliminary results demonstrate the potential of marrying these technologies for improving curation and retrieval of priceless historical documents.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image segmentation and image recognition are well-established computational techniques in the broader discipline of image processing. Segmentation locates areas in an image, while recognition identifies specific objects within an image. These techniques have shown remarkable accuracy with modern images, mainly because the amount of training data is vast. Achieving similar accuracy in digitized images of centuries-old documents is more challenging. This difficulty is due to two main reasons: first, the lack of sufficient training data, and second, the degree of specialization in a given domain. Despite these limitations, the ability to segment and recognize objects in these collections is important for automating the curation, cataloging, and dissemination of knowledge, making the contents of priceless collections accessible to scholars and the general public. In this paper, we report on our ongoing work in segmenting and labeling images pertaining to shipbuilding treatises from the XVI and XVII centuries, a historical period known as the Age of Exploration. To this end, we leverage SAM2 for image segmentation; Florence2 and ChatGPT for labeling; and a specialized ontology (ontoShip) and glossary (glosShip) of nautical architecture for enhancing the labeling process. Preliminary results demonstrate the potential of marrying these technologies for improving curation and retrieval of priceless historical documents. We also discuss the challenges and limitations encountered in this approach and ideas on how to overcome them in the future.

Related papers

SaccadeDet: A Novel Dual-Stage Architecture for Rapid and Accurate Detection in Gigapixel Images [50.742420049839474]
'SaccadeDet' is an innovative architecture for gigapixel-level object detection, inspired by the saccadic movement of the human eye. Our approach, evaluated on the PANDA dataset, achieves an 8x speed increase over state-of-the-art methods. It also demonstrates significant potential in gigapixel-level pathology analysis through its application to Whole Slide Imaging.
arXiv Detail & Related papers (2024-07-25T11:22:54Z)
Semi-Supervised Semantic Segmentation Based on Pseudo-Labels: A Survey [49.47197748663787]
This review aims to provide a first comprehensive and organized overview of the state-of-the-art research results on pseudo-label methods in the field of semi-supervised semantic segmentation. In addition, we explore the application of pseudo-label technology in medical and remote-sensing image segmentation.
arXiv Detail & Related papers (2024-03-04T10:18:38Z)
Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning [35.47078178526536]
Recent advancements in pre-trained large-scale language-image models have ushered in a new era of visual comprehension. This paper tackles two well-known issues within the realm of visual analytics: (1) the efficient exploration of large-scale image datasets and identification of potential data biases within them; (2) the evaluation of image captions and steering of their generation process.
arXiv Detail & Related papers (2023-11-02T06:21:35Z)
Is Medieval Distant Viewing Possible? : Extending and Enriching Annotation of Legacy Image Collections using Visual Analytics [3.89394670917253]
We describe working with two pre-annotated sets of medieval manuscript images that exhibit conflicting and overlapping metadata. We aim to create a more uniform set of labels to serve as a "bridge" in the combined dataset. Visual interfaces provide experts an overview of relationships in the data going beyond the sum total of the metadata.
arXiv Detail & Related papers (2022-08-20T10:59:33Z)
Multiple Instance Learning for Digital Pathology: A Review on the State-of-the-Art, Limitations & Future Potential [0.29008108937701327]
Digital whole slide images contain an enormous amount of information. Deep neural networks show high potential with respect to various tasks in the field of digital pathology. Deep learning algorithms require (manual) annotations in addition to large amounts of image data to enable effective training. Multiple instance learning is a powerful tool for training deep neural networks in a scenario without fully annotated data.
arXiv Detail & Related papers (2022-06-09T11:27:26Z)
The Curious Layperson: Fine-Grained Image Recognition without Expert Labels [90.88501867321573]
We consider a new problem: fine-grained image recognition without expert annotations. We learn a model to describe the visual appearance of objects using non-expert image descriptions. We then train a fine-grained textual similarity model that matches image descriptions with documents on a sentence-level basis.
arXiv Detail & Related papers (2021-11-05T17:58:37Z)
Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images. We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image. We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
Hierarchical Semantic Segmentation using Psychometric Learning [17.417302703539367]
We develop a novel approach to collect segmentation annotations from experts based on psychometric testing. Our method consists of the psychometric testing procedure, active query selection, query enhancement, and a deep metric learning model. We show the merits of our method with evaluation on synthetically generated, aerial, and histology images.
arXiv Detail & Related papers (2021-07-07T13:38:33Z)
Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels. We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images. We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
Grafit: Learning fine-grained image representations with coarse labels [114.17782143848315]
This paper tackles the problem of learning a finer representation than the one provided by training labels. By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods.
arXiv Detail & Related papers (2020-11-25T19:06:26Z)
A Survey on Deep Learning Methods for Semantic Image Segmentation in Real-Time [0.0]
In many areas, such as robotics and autonomous vehicles, semantic image segmentation is crucial. The success of medical diagnosis and treatment relies on the extremely accurate understanding of the data under consideration. Recent developments in deep learning have provided a host of tools to tackle this problem efficiently and with increased accuracy.
arXiv Detail & Related papers (2020-09-27T20:30:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.