Digital Collections Explorer: An Open-Source, Multimodal Viewer for Searching Digital Collections
- URL: http://arxiv.org/abs/2507.00961v1
- Date: Tue, 01 Jul 2025 17:10:34 GMT
- Title: Digital Collections Explorer: An Open-Source, Multimodal Viewer for Searching Digital Collections
- Authors: Ying-Hsiang Huang, Benjamin Charles Germain Lee
- Abstract summary: Digital Collections Explorer is a web-based, open-source exploratory search platform. Our interface enables natural language queries and reverse image searches over digital collections with visual features. This paper describes the system's architecture, implementation, and application to various cultural heritage collections.
- Score: 0.09208007322096533
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present Digital Collections Explorer, a web-based, open-source exploratory search platform that leverages CLIP (Contrastive Language-Image Pre-training) for enhanced visual discovery of digital collections. Our Digital Collections Explorer can be installed locally and configured to run on a visual collection of interest on disk in just a few steps. Building upon recent advances in multimodal search techniques, our interface enables natural language queries and reverse image searches over digital collections with visual features. This paper describes the system's architecture, implementation, and application to various cultural heritage collections, demonstrating its potential for democratizing access to digital archives, especially those with impoverished metadata. We present case studies with maps, photographs, and PDFs extracted from web archives in order to demonstrate the flexibility of the Digital Collections Explorer, as well as its ease of use. We demonstrate that the Digital Collections Explorer scales to hundreds of thousands of images on a MacBook Pro with an M4 chip. Lastly, we host a public demo of Digital Collections Explorer.
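The core retrieval loop described in the abstract (CLIP embeddings for the collection, with natural language and reverse image queries ranked by cosine similarity) can be sketched compactly. The following is a minimal illustration using the Hugging Face transformers CLIP API, not the project's actual code; the checkpoint name, the "collection/" directory layout, and the top_k value are assumptions for the sketch.

```python
# Minimal sketch of CLIP-based search over a local image collection.
# Assumptions (not from the paper): the openai/clip-vit-base-patch32
# checkpoint, a flat "collection/" directory of JPEGs, and top_k=5.
from pathlib import Path

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def embed_images(paths):
    """Encode images into L2-normalized CLIP embeddings."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def embed_text(query):
    """Encode a natural language query into a normalized embedding."""
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Index the collection once up front; with normalized vectors,
# cosine similarity reduces to a dot product.
paths = sorted(Path("collection").glob("*.jpg"))
index = embed_images(paths)

def search(query_embedding, top_k=5):
    """Rank the whole collection against a (1, d) query embedding."""
    scores = (query_embedding @ index.T).squeeze(0)
    best = scores.topk(min(top_k, len(paths))).indices.tolist()
    return [(str(paths[i]), scores[i].item()) for i in best]

# Natural language query and reverse image search share the same index.
print(search(embed_text("a hand-drawn map of a coastline")))
print(search(embed_images([Path("query.jpg")])))
```

Because both query types map into the same embedding space, text search and reverse image search share one precomputed index; at the scale the abstract reports (hundreds of thousands of images), the brute-force dot-product ranking could be swapped for an approximate nearest-neighbour index.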
Related papers
- Knowledge Graphs for Digitized Manuscripts in Jagiellonian Digital Library Application [8.732274235941974]
Galleries, libraries, archives and museums (GLAM institutions) are actively digitizing their holdings and creating extensive digital collections. These collections are often enriched with metadata describing items but not exactly their contents. We explore an integrated methodology of computer vision (CV), artificial intelligence (AI) and semantic web technologies to enrich metadata and construct knowledge graphs for digitized manuscripts and incunabula.
arXiv Detail & Related papers (2025-05-29T14:49:24Z)
- Explainable Search and Discovery of Visual Cultural Heritage Collections with Multimodal Large Language Models [0.0]
We introduce a method for using state-of-the-art multimodal large language models (LLMs) to enable an open-ended, explainable search and discovery interface for visual collections.
We show how our approach can create novel clustering and recommendation systems that avoid common pitfalls of methods based directly on visual embeddings.
arXiv Detail & Related papers (2024-11-07T12:48:39Z)
- Visual Navigation of Digital Libraries: Retrieval and Classification of Images in the National Library of Norway's Digitised Book Collection [0.3277163122167433]
We present a proof-of-concept image search application for exploring images in the National Library of Norway's pre-1900 books.
We compare Vision Transformer (ViT), Contrastive Language-Image Pre-training (CLIP), and Sigmoid loss for Language-Image Pre-training (SigLIP) embeddings for image retrieval and classification.
arXiv Detail & Related papers (2024-10-19T04:20:23Z)
- Algorithmic Ways of Seeing: Using Object Detection to Facilitate Art Exploration [8.680322662037721]
We show how an object detection pipeline can be integrated into a design process for visual exploration.
We present the design and development of an app that enables exploration of an art museum's collection.
arXiv Detail & Related papers (2024-03-28T06:46:45Z)
- Blind Dates: Examining the Expression of Temporality in Historical Photographs [57.07335632641355]
We investigate the dating of images using OpenCLIP, an open-source implementation of CLIP, a multi-modal language and vision model.
We use the De Boer Scene Detection dataset, containing 39,866 gray-scale historical press photographs from 1950 to 1999.
Our analysis reveals that images featuring buses, cars, cats, dogs, and people are more accurately dated, suggesting the presence of temporal markers.
arXiv Detail & Related papers (2023-10-10T13:51:24Z)
- OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents [122.55393759474181]
We introduce OBELICS, an open web-scale filtered dataset of interleaved image-text documents.
We describe the dataset creation process, present comprehensive filtering rules, and provide an analysis of the dataset's content.
We train vision and language models of 9 and 80 billion parameters named IDEFICS, and obtain competitive performance on different multimodal benchmarks.
arXiv Detail & Related papers (2023-06-21T14:01:01Z)
- EDIS: Entity-Driven Image Search over Multimodal Web Content [95.40238328527931]
We introduce Entity-Driven Image Search (EDIS), a dataset for cross-modal image search in the news domain.
EDIS consists of 1 million web images from actual search engine results and curated datasets, with each image paired with a textual description.
arXiv Detail & Related papers (2023-05-23T02:59:19Z)
- Named Entity and Relation Extraction with Multi-Modal Retrieval [51.660650522630526]
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE.
We propose a novel Multi-modal Retrieval-based framework (MoRe).
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve related knowledge of the input text and image in the knowledge corpus respectively.
arXiv Detail & Related papers (2022-12-03T13:11:32Z)
- Probabilistic Compositional Embeddings for Multimodal Image Retrieval [48.450232527041436]
We investigate a more challenging scenario for composing multiple multimodal queries in image retrieval.
Given an arbitrary number of query images and/or texts, our goal is to retrieve target images containing the semantic concepts specified in multiple multimodal queries.
We propose a novel multimodal probabilistic composer (MPC) to learn an informative embedding that can flexibly encode the semantics of various queries.
arXiv Detail & Related papers (2022-04-12T14:45:37Z)
- Automatic Image Content Extraction: Operationalizing Machine Learning in Humanistic Photographic Studies of Large Visual Archives [81.88384269259706]
We introduce the Automatic Image Content Extraction framework for machine learning-based search and analysis of large image archives.
The proposed framework can be applied in several domains in humanities and social sciences.
arXiv Detail & Related papers (2022-04-05T12:19:24Z)
- Object Retrieval and Localization in Large Art Collections using Deep Multi-Style Feature Fusion and Iterative Voting [10.807131260367298]
We introduce an algorithm that allows users to search for image regions containing specific motifs or objects.
Our region-based voting with GPU-accelerated approximate nearest-neighbour search allows us to find and localize even small motifs within an extensive dataset in a few seconds.
arXiv Detail & Related papers (2021-07-14T18:40:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.