Prompt me a Dataset: An investigation of text-image prompting for
historical image dataset creation using foundation models
- URL: http://arxiv.org/abs/2309.01674v1
- Date: Mon, 4 Sep 2023 15:37:03 GMT
- Title: Prompt me a Dataset: An investigation of text-image prompting for
historical image dataset creation using foundation models
- Authors: Hassan El-Hajj and Matteo Valleriani
- Abstract summary: We present a pipeline for image extraction from historical documents using foundation models.
We evaluate text-image prompts and their effectiveness on humanities datasets of varying levels of complexity.
- Score: 0.9065034043031668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a pipeline for image extraction from historical
documents using foundation models, and evaluate text-image prompts and their
effectiveness on humanities datasets of varying levels of complexity. The
motivation for this approach stems from the high interest of historians in
visual elements printed alongside historical texts on the one hand, and from
the relative lack of well-annotated datasets within the humanities when
compared to other domains. We propose a sequential approach that relies on
GroundDINO and Meta's Segment-Anything-Model (SAM) to retrieve a significant
portion of visual data from historical documents that can then be used for
downstream development tasks and dataset creation, as well as evaluate the
effect of different linguistic prompts on the resulting detections.
Related papers
- Diffusion Models as Data Mining Tools [87.77999285241219]
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining.
We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure.
This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
arXiv Detail & Related papers (2024-07-20T17:14:31Z) - Stellar: Systematic Evaluation of Human-Centric Personalized
Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that input a single image of an individual and ground the generation process along with text describing the desired visual context.
We introduce a standardized dataset (Stellar) that contains personalized prompts coupled with images of individuals that is an order of magnitude larger than existing relevant datasets and where rich semantic ground-truth annotations are readily available.
We derive a simple yet efficient, personalized text-to-image baseline that does not require test-time fine-tuning for each subject and which sets quantitatively and in human trials a new SoTA.
arXiv Detail & Related papers (2023-12-11T04:47:39Z) - Pattern Spotting and Image Retrieval in Historical Documents using Deep
Hashing [60.67014034968582]
This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents.
Deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations.
The proposed approach also reduces the search time by up to 200x and the storage cost up to 6,000x when compared to related works.
arXiv Detail & Related papers (2022-08-04T01:39:37Z) - A Generic Image Retrieval Method for Date Estimation of Historical
Document Collections [0.4588028371034407]
This paper presents a robust date estimation system based in a retrieval approach that generalizes well in front of heterogeneous collections.
We use a ranking loss function named smooth-nDCG to train a Convolutional Neural Network that learns an ordination of documents for each problem.
arXiv Detail & Related papers (2022-04-08T12:30:39Z) - Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
arXiv Detail & Related papers (2022-03-28T23:35:45Z) - A Survey of Historical Document Image Datasets [2.8707038627097226]
This paper presents a systematic literature review of image datasets for document image analysis.
It focuses on historical documents, such as handwritten manuscripts and early prints.
Finding appropriate datasets for historical document analysis is a crucial prerequisite to facilitate research using different machine learning algorithms.
arXiv Detail & Related papers (2022-03-16T09:56:48Z) - Generating More Pertinent Captions by Leveraging Semantics and Style on
Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z) - Including Keyword Position in Image-based Models for Act Segmentation of
Historical Registers [2.064923532131528]
We focus on the use of both visual and textual information for segmenting historical registers into structured and meaningful units such as acts.
An act is a text recording containing valuable knowledge such as demographic information (baptism, marriage or death) or royal decisions (donation or pardon)
arXiv Detail & Related papers (2021-09-17T11:38:34Z) - docExtractor: An off-the-shelf historical document element extraction [18.828438308738495]
We present docExtractor, a generic approach for extracting visual elements such as text lines or illustrations from historical documents.
We demonstrate it provides high-quality performances as an off-the-shelf system across a wide variety of datasets.
We introduce a new public dataset dubbed IlluHisDoc dedicated to the fine evaluation of illustration segmentation in historical documents.
arXiv Detail & Related papers (2020-12-15T10:19:18Z) - Learning to Select Bi-Aspect Information for Document-Scale Text Content
Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z) - Combining Visual and Textual Features for Semantic Segmentation of
Historical Newspapers [2.5899040911480187]
We introduce a multimodal approach for the semantic segmentation of historical newspapers.
Based on experiments on diachronic Swiss and Luxembourgish newspapers, we investigate the predictive power of visual and textual features.
Results show consistent improvement of multimodal models in comparison to a strong visual baseline.
arXiv Detail & Related papers (2020-02-14T17:56:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.