Constructing Image-Text Pair Dataset from Books
- URL: http://arxiv.org/abs/2310.01936v1
- Date: Tue, 3 Oct 2023 10:23:28 GMT
- Title: Constructing Image-Text Pair Dataset from Books
- Authors: Yamato Okamoto, Haruto Toyonaga, Yoshihisa Ijiri, Hirokatsu Kataoka
- Abstract summary: We propose a novel approach to leverage digital archives for machine learning.
In our experiments, we apply our pipeline on old photo books to construct an image-text pair dataset.
- Score: 10.92677060085447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Digital archiving is becoming widespread owing to its effectiveness in
protecting valuable books and providing knowledge to many people
electronically. In this paper, we propose a novel approach to leverage digital
archives for machine learning. If we can fully utilize such digitized data,
machine learning has the potential to uncover unknown insights and ultimately
acquire knowledge autonomously, just like humans read books. As a first step,
we design a dataset construction pipeline comprising an optical character
reader (OCR), an object detector, and a layout analyzer for the autonomous
extraction of image-text pairs. In our experiments, we apply our pipeline on
old photo books to construct an image-text pair dataset, showing its
effectiveness in image-text retrieval and insight extraction.
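The pairing step of such a pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the layout analyzer associates images with text, so nearest-center matching is used here purely as an assumed stand-in. The names `Box`, `pair_images_with_text`, and `max_distance` are hypothetical, and the inputs are assumed to come from a separate object detector and OCR step.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned region on a scanned page, in pixels (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def pair_images_with_text(image_boxes, text_blocks, max_distance=300.0):
    """Pair each detected image with its nearest OCR text block.

    image_boxes: list of Box, assumed output of an object detector.
    text_blocks: list of (Box, str), assumed output of an OCR step.
    Nearest-center matching is an assumption standing in for layout analysis.
    Images with no text block within max_distance are left unpaired.
    """
    pairs = []
    for image in image_boxes:
        ix, iy = image.center
        best_text, best_dist = None, max_distance
        for box, text in text_blocks:
            tx, ty = box.center
            dist = hypot(tx - ix, ty - iy)
            if dist <= best_dist:
                best_text, best_dist = text, dist
        if best_text is not None:
            pairs.append((image, best_text))
    return pairs
```

A real layout analyzer would likely also consider reading order and column structure; the distance threshold here only keeps clearly unrelated text from being attached to an image.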
Related papers
- Transductive Learning for Near-Duplicate Image Detection in Scanned Photo Collections [0.0]
This paper presents a comparative study of near-duplicate image detection techniques in a real-world use case scenario.
We propose a transductive learning approach that leverages state-of-the-art deep learning architectures such as convolutional neural networks (CNNs) and Vision Transformers (ViTs).
The results show that the proposed approach outperforms the baseline methods in the task of near-duplicate image detection in the UKBench and an in-house private dataset.
arXiv Detail & Related papers (2024-10-25T09:56:15Z)
- Visual Navigation of Digital Libraries: Retrieval and Classification of Images in the National Library of Norway's Digitised Book Collection [0.3277163122167433]
We present a proof-of-concept image search application for exploring images in the National Library of Norway's pre-1900 books.
We compare Vision Transformer (ViT), Contrastive Language-Image Pre-training (CLIP), and Sigmoid loss for Language-Image Pre-training (SigLIP) embeddings for image retrieval and classification.
arXiv Detail & Related papers (2024-10-19T04:20:23Z)
- See then Tell: Enhancing Key Information Extraction with Vision Grounding [54.061203106565706]
We introduce STNet (See then Tell Net), a novel end-to-end model designed to deliver precise answers with relevant vision grounding.
To enhance the model's seeing capabilities, we collect extensive structured table recognition datasets.
arXiv Detail & Related papers (2024-09-29T06:21:05Z)
- Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review [0.0]
This paper explores AI-assistive deep learning image annotation systems that provide textual suggestions, captions, or descriptions of the input image to the annotator.
We review various datasets and how they contribute to the training and evaluation of AI-assistive annotation systems.
Despite the promising potential, there is limited publicly available work on AI-assistive image annotation with textual output capabilities.
arXiv Detail & Related papers (2024-06-28T22:56:17Z)
- Enhancing Textbooks with Visuals from the Web for Improved Learning [50.01434477801967]
In this paper, we investigate the effectiveness of vision-language models to automatically enhance textbooks with images from the web.
We collect a dataset of e-textbooks in the math, science, social science and business domains.
We then set up a text-image matching task that involves retrieving and appropriately assigning web images to textbooks.
arXiv Detail & Related papers (2023-04-18T12:16:39Z)
- Retrieval-Augmented Transformer for Image Captioning [51.79146669195357]
We develop an image captioning approach with a kNN memory, with which knowledge can be retrieved from an external corpus to aid the generation process.
Our architecture combines a knowledge retriever based on visual similarities, a differentiable encoder, and a kNN-augmented attention layer to predict tokens.
Experimental results, conducted on the COCO dataset, demonstrate that employing an explicit external memory can aid the generation process and increase caption quality.
arXiv Detail & Related papers (2022-07-26T19:35:49Z)
- Automatic Image Content Extraction: Operationalizing Machine Learning in Humanistic Photographic Studies of Large Visual Archives [81.88384269259706]
We introduce the Automatic Image Content Extraction framework for machine learning-based search and analysis of large image archives.
The proposed framework can be applied in several domains in humanities and social sciences.
arXiv Detail & Related papers (2022-04-05T12:19:24Z)
- Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting [69.77701325270047]
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves F-score by +2.5% and +4.8% while transferring its weights to other text detection and spotting networks.
arXiv Detail & Related papers (2022-03-08T08:10:45Z)
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [57.031588264841]
We leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps.
A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss.
We show that the scale of our corpus can make up for its noise and leads to state-of-the-art representations even with such a simple learning scheme.
arXiv Detail & Related papers (2021-02-11T10:08:12Z)