Semi-Supervised Image-Based Narrative Extraction: A Case Study with Historical Photographic Records
- URL: http://arxiv.org/abs/2501.09884v1
- Date: Thu, 16 Jan 2025 23:54:54 GMT
- Title: Semi-Supervised Image-Based Narrative Extraction: A Case Study with Historical Photographic Records
- Authors: Fausto German, Brian Keith, Mauricio Matus, Diego Urrutia, Claudio Meneses
- Abstract summary: We present a semi-supervised approach to extracting narratives from historical photographic records using an adaptation of the narrative maps algorithm.
Our method is applied to the ROGER dataset, a collection of photographs from the 1928 Sacambaya Expedition in Bolivia captured by Robert Gerstmann.
- Abstract: This paper presents a semi-supervised approach to extracting narratives from historical photographic records using an adaptation of the narrative maps algorithm. We extend the original unsupervised text-based method to work with image data, leveraging deep learning techniques for visual feature extraction and similarity computation. Our method is applied to the ROGER dataset, a collection of photographs from the 1928 Sacambaya Expedition in Bolivia captured by Robert Gerstmann. We compare our algorithmically extracted visual narratives with expert-curated timelines of varying lengths (5 to 30 images) to evaluate the effectiveness of our approach. In particular, we use the Dynamic Time Warping (DTW) algorithm to match the extracted narratives with the expert-curated baseline. In addition, we asked an expert on the topic to qualitatively evaluate a representative example of the resulting narratives. Our findings show that the narrative maps approach generally outperforms random sampling for longer timelines (10+ images, p < 0.05), with expert evaluation confirming the historical accuracy and coherence of the extracted narratives. This research contributes to the field of computational analysis of visual cultural heritage, offering new tools for historians, archivists, and digital humanities scholars to explore and understand large-scale image collections. The method's ability to generate meaningful narratives from visual data opens up new possibilities for the study and interpretation of historical events through photographic evidence.
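The abstract's evaluation step, matching an extracted narrative against an expert-curated timeline with Dynamic Time Warping, can be sketched as follows. This is a minimal illustrative implementation of classic DTW over sequences of image feature vectors, not the authors' code; the function and variable names, the Euclidean ground distance, and the toy embeddings are all assumptions for illustration.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Classic dynamic-time-warping distance between two sequences of
    feature vectors (one row per image), using Euclidean ground distance."""
    n, m = len(seq_a), len(seq_b)
    # cost[i, j] = DTW distance between seq_a[:i] and seq_b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # skip a baseline image
                                 cost[i, j - 1],      # skip an extracted image
                                 cost[i - 1, j - 1])  # align the pair
    return cost[n, m]

# Toy example: an "extracted" narrative vs. a shorter "baseline" timeline,
# each a sequence of hypothetical 2-D image embeddings.
extracted = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.5]])
baseline  = np.array([[0.0, 1.0], [2.0, 0.5]])
print(dtw_distance(extracted, baseline))  # prints 1.0
```

A lower DTW distance indicates the extracted sequence follows the expert timeline more closely, even when the two narratives have different lengths, which matches the paper's comparison of timelines ranging from 5 to 30 images.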
Related papers
- Knowledge-Guided Prompt Learning for Deepfake Facial Image Detection [54.26588902144298]
We propose a knowledge-guided prompt learning method for deepfake facial image detection.
Specifically, we retrieve forgery-related prompts from large language models as expert knowledge to guide the optimization of learnable prompts.
Our proposed approach notably outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-01-01T02:18:18Z) - Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [120.49126407479717]
This paper explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR)
We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos.
arXiv Detail & Related papers (2024-03-12T00:02:03Z) - PHD: Pixel-Based Language Modeling of Historical Documents [55.75201940642297]
We propose a novel method for generating synthetic scans to resemble real historical documents.
We pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period.
We successfully apply our model to a historical QA task, highlighting its usefulness in this domain.
arXiv Detail & Related papers (2023-10-22T08:45:48Z) - Blind Dates: Examining the Expression of Temporality in Historical Photographs [57.07335632641355]
We investigate the dating of images using OpenCLIP, an open-source implementation of CLIP, a multi-modal language and vision model.
We use the De Boer Scene Detection dataset, containing 39,866 gray-scale historical press photographs from 1950 to 1999.
Our analysis reveals that images featuring buses, cars, cats, dogs, and people are more accurately dated, suggesting the presence of temporal markers.
arXiv Detail & Related papers (2023-10-10T13:51:24Z) - Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models [0.9065034043031668]
We present a pipeline for image extraction from historical documents using foundation models.
We evaluate text-image prompts and their effectiveness on humanities datasets of varying levels of complexity.
arXiv Detail & Related papers (2023-09-04T15:37:03Z) - Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing [60.67014034968582]
This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents.
Deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations.
The proposed approach also reduces the search time by up to 200x and the storage cost up to 6,000x when compared to related works.
arXiv Detail & Related papers (2022-08-04T01:39:37Z) - A Decade Survey of Content Based Image Retrieval using Deep Learning [13.778851745408133]
This paper presents a comprehensive survey of deep learning based developments in the past decade for content based image retrieval.
The similarity between the representative features of the query image and dataset images is used to rank the images for retrieval.
Deep learning has emerged over the past decade as a dominant alternative to hand-designed feature engineering.
arXiv Detail & Related papers (2020-11-23T02:12:30Z) - Narrative Maps: An Algorithmic Approach to Represent and Extract Information Narratives [6.85316573653194]
This article combines the theory of narrative representations with data from modern online systems.
A narrative map representation illustrates the events and stories in the narrative as a series of landmarks and routes on the map.
Our findings have implications for intelligence analysts, computational journalists, and misinformation researchers.
arXiv Detail & Related papers (2020-09-09T18:30:44Z) - From A Glance to "Gotcha": Interactive Facial Image Retrieval with Progressive Relevance Feedback [72.29919762941029]
We propose an end-to-end framework to retrieve facial images with relevance feedback progressively provided by the witness.
Without requiring any extra annotations, our model can be applied at the cost of only a little response effort from the witness.
arXiv Detail & Related papers (2020-07-30T18:46:25Z) - Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers [2.5899040911480187]
We introduce a multimodal approach for the semantic segmentation of historical newspapers.
Based on experiments on diachronic Swiss and Luxembourgish newspapers, we investigate the predictive power of visual and textual features.
Results show consistent improvement of multimodal models in comparison to a strong visual baseline.
arXiv Detail & Related papers (2020-02-14T17:56:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.