Related papers: EUFCC-340K: A Faceted Hierarchical Dataset for Metadata Annotation in GLAM Collections

URL: http://arxiv.org/abs/2406.02380v1
Date: Tue, 4 Jun 2024 14:57:56 GMT
Title: EUFCC-340K: A Faceted Hierarchical Dataset for Metadata Annotation in GLAM Collections
Authors: Francesc Net, Marc Folia, Pep Casals, Andrew D. Bagdanov, Lluis Gomez
Abstract summary: The EUFCC340K dataset is organized across multiple facets: Materials, Object Types, Disciplines, and Subjects, following a hierarchical structure based on the Art & Architecture Thesaurus (AAT). Our experiments to evaluate model robustness and generalization capabilities in two different test scenarios demonstrate the utility of the dataset in improving multi-label classification tools.
Score: 6.723689308768857
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we address the challenges of automatic metadata annotation in the domain of Galleries, Libraries, Archives, and Museums (GLAMs) by introducing a novel dataset, EUFCC340K, collected from the Europeana portal. Comprising over 340,000 images, the EUFCC340K dataset is organized across multiple facets: Materials, Object Types, Disciplines, and Subjects, following a hierarchical structure based on the Art & Architecture Thesaurus (AAT). We developed several baseline models, incorporating multiple heads on a ConvNeXT backbone for multi-label image tagging on these facets, and fine-tuning a CLIP model with our image-text pairs. Our experiments to evaluate model robustness and generalization capabilities in two different test scenarios demonstrate the utility of the dataset in improving multi-label classification tools that have the potential to alleviate cataloging tasks in the cultural heritage sector.

Related papers

A Dataset for Named Entity Recognition and Relation Extraction from Art-historical Image Descriptions [0.379152625956354]
FRAME is a manually annotated dataset of art-historical image descriptions for Named Entity Recognition (NER) and Relation Extraction (RE). Descriptions were collected from museum catalogs, auction listings, open-access platforms, and scholarly databases. The dataset is released as UIMA XMI Common Analysis Structure (CAS) files with accompanying images and metadata, and can be used to benchmark and fine-tune NER and RE systems.
arXiv Detail & Related papers (2026-02-22T11:29:03Z)
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding [53.69841526266547]
Fine-tuning a pre-trained Vision-Language Model with new datasets often falls short in optimizing the vision encoder. We introduce QID, a novel, streamlined, architecture-preserving approach that integrates query embeddings into the vision encoder.
arXiv Detail & Related papers (2025-04-03T18:47:16Z)
Introducing Three New Benchmark Datasets for Hierarchical Text Classification [0.0]
We introduce three new HTC benchmark datasets in the domain of research publications. We propose an approach which combines their classifications to improve the reliability and robustness of the dataset. We evaluate the three created datasets with a clustering-based analysis and show that our proposed approach results in a higher quality dataset.
arXiv Detail & Related papers (2024-11-28T13:06:48Z)
EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections [0.0]
EUFCC-CIR is a dataset designed for Composed Image Retrieval (CIR) within Galleries, Libraries, Archives, and Museums (GLAM) collections. Our dataset is built on top of the EUFCC-340K image labeling dataset and contains over 180K annotated CIR triplets.
arXiv Detail & Related papers (2024-10-02T13:26:53Z)
Hierarchical Multi-Label Classification with Missing Information for Benthic Habitat Imagery [1.6492989697868894]
We show the capacity to conduct HML training in scenarios where there exist multiple levels of missing annotation information. We find that, when using smaller one-hot image label datasets typical of local or regional scale benthic science projects, models pre-trained with self-supervision on a larger collection of in-domain benthic data outperform models pre-trained on ImageNet.
arXiv Detail & Related papers (2024-09-10T16:15:01Z)
Mixed-Query Transformer: A Unified Image Segmentation Architecture [57.32212654642384]
Existing unified image segmentation models either employ a unified architecture across multiple tasks but use separate weights tailored to each dataset, or apply a single set of weights to multiple datasets but are limited to a single task. We introduce the Mixed-Query Transformer (MQ-Former), a unified architecture for multi-task and multi-dataset image segmentation using a single set of weights.
arXiv Detail & Related papers (2024-04-06T01:54:17Z)
Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction [61.998789448260005]
We propose to identify the typical structure of documents within a collection. We abstract over arbitrary header paraphrases, and ground each topic to respective document locations. We develop an unsupervised graph-based method which leverages both inter- and intra-document similarities.
arXiv Detail & Related papers (2024-02-21T16:22:21Z)
A Multi-Modal Multilingual Benchmark for Document Image Classification [21.7518357653137]
We introduce two newly curated multilingual datasets, WIKI-DOC and MULTIEUR-DOCLEX. We study popular visually-rich document understanding or Document AI models in a previously untested setting in document image classification. Experimental results show limitations of multilingual Document AI models on cross-lingual transfer across typologically distant languages.
arXiv Detail & Related papers (2023-10-25T04:35:06Z)
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents [122.55393759474181]
We introduce OBELICS, an open web-scale filtered dataset of interleaved image-text documents. We describe the dataset creation process, present comprehensive filtering rules, and provide an analysis of the dataset's content. We train vision and language models of 9 and 80 billion parameters named IDEFICS, and obtain competitive performance on different multimodal benchmarks.
arXiv Detail & Related papers (2023-06-21T14:01:01Z)
Multi-Modal Classifiers for Open-Vocabulary Object Detection [104.77331131447541]
The goal of this paper is open-vocabulary object detection (OVOD). We adopt a standard two-stage object detector architecture. We explore three ways of specifying novel categories: via language descriptions, image exemplars, or a combination of the two.
arXiv Detail & Related papers (2023-06-08T18:31:56Z)
AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation [42.35572014527354]
The AToMiC dataset is designed to advance research in image/text cross-modal retrieval. We leverage hierarchical structures and diverse domains of texts, styles, and types of images, as well as large-scale image-document associations embedded in Wikipedia. AToMiC offers a testbed for scalable, diverse, and reproducible multimedia retrieval research.
arXiv Detail & Related papers (2023-04-04T17:11:34Z)
VRDU: A Benchmark for Visually-rich Document Understanding [22.040372755535767]
We identify the desiderata for a more comprehensive benchmark and propose one we call Visually Rich Document Understanding (VRDU). VRDU contains two datasets that represent several challenges: rich schema including diverse data types as well as hierarchical entities, complex templates including tables and multi-column layouts, and diversity of different layouts (templates) within a single document type. We design few-shot and conventional experiment settings along with a carefully designed matching algorithm to evaluate extraction results.
arXiv Detail & Related papers (2022-11-15T03:17:07Z)
Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS). It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes. In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image. We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
arXiv Detail & Related papers (2021-12-03T13:31:59Z)
Simple multi-dataset detection [83.9604523643406]
We present a simple method for training a unified detector on multiple large-scale datasets. We show how to automatically integrate dataset-specific outputs into a common semantic taxonomy. Our approach does not require manual taxonomy reconciliation.
arXiv Detail & Related papers (2021-02-25T18:55:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.