A Dataset for Named Entity Recognition and Relation Extraction from Art-historical Image Descriptions
- URL: http://arxiv.org/abs/2602.19133v1
- Date: Sun, 22 Feb 2026 11:29:03 GMT
- Title: A Dataset for Named Entity Recognition and Relation Extraction from Art-historical Image Descriptions
- Authors: Stefanie Schneider, Miriam Göldl, Julian Stalter, Ricarda Vollmer,
- Abstract summary: FRAME is a manually annotated dataset of art-historical image descriptions for Named Entity Recognition (NER) and Relation Extraction (RE). Descriptions were collected from museum catalogs, auction listings, open-access platforms, and scholarly databases. The dataset is released as UIMA XMI Common Analysis Structure (CAS) files with accompanying images and metadata, and can be used to benchmark and fine-tune NER and RE systems.
- Score: 0.379152625956354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces FRAME (Fine-grained Recognition of Art-historical Metadata and Entities), a manually annotated dataset of art-historical image descriptions for Named Entity Recognition (NER) and Relation Extraction (RE). Descriptions were collected from museum catalogs, auction listings, open-access platforms, and scholarly databases, then filtered to ensure that each text focuses on a single artwork and contains explicit statements about its material, composition, or iconography. FRAME provides stand-off annotations in three layers: a metadata layer for object-level properties, a content layer for depicted subjects and motifs, and a co-reference layer linking repeated mentions. Across layers, entity spans are labeled with 37 types and connected by typed RE links between mentions. Entity types are aligned with Wikidata to support Named Entity Linking (NEL) and downstream knowledge-graph construction. The dataset is released as UIMA XMI Common Analysis Structure (CAS) files with accompanying images and bibliographic metadata, and can be used to benchmark and fine-tune NER and RE systems, including zero- and few-shot setups with Large Language Models (LLMs).
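The stand-off layout described in the abstract (entity spans stored as character offsets, typed links between mentions, and a co-reference layer) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample sentence is invented, and the labels (`Material`, `Person`, `Object`, `holds`) and class names are hypothetical rather than taken from FRAME's 37-type inventory or its UIMA type system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A stand-off entity mention: character offsets into the text plus a label."""
    begin: int
    end: int
    label: str  # hypothetical label, not from FRAME's actual type inventory
    layer: str  # "metadata" or "content"


@dataclass(frozen=True)
class Relation:
    """A typed link between two mentions."""
    head: Span
    tail: Span
    label: str  # hypothetical relation type


def covered_text(text: str, span: Span) -> str:
    """Recover the surface string of a span; the text itself is never modified."""
    return text[span.begin:span.end]


def find_span(text: str, phrase: str, label: str, layer: str) -> Span:
    """Helper for this sketch only: annotate the first occurrence of a phrase."""
    begin = text.index(phrase)
    return Span(begin, begin + len(phrase), label, layer)


# Invented description, used only to demonstrate the structure.
text = "Oil on canvas. A woman holds a lute; she looks at the viewer."

material = find_span(text, "Oil on canvas", "Material", "metadata")
woman = find_span(text, "woman", "Person", "content")
lute = find_span(text, "lute", "Object", "content")
she = find_span(text, "she", "Person", "content")

# Relation layer: typed links between mentions.
relations = [Relation(woman, lute, "holds")]

# Co-reference layer: each chain groups mentions of the same referent.
coreference = [[woman, she]]

# Flatten relations into (head, type, tail) triples, e.g. for a knowledge graph.
triples = [
    (covered_text(text, r.head), r.label, covered_text(text, r.tail))
    for r in relations
]
print(triples)
```

Because annotations only point into the text by offset, several layers can overlap on the same passage without altering it, which is what allows the metadata, content, and co-reference layers to coexist.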
Related papers
- A Sketch+Text Composed Image Retrieval Dataset for Thangka [14.600552992453977]
Composed Image Retrieval (CIR) enables image retrieval by combining multiple query modalities. CIRThan is a sketch+text Composed Image Retrieval dataset for Thangka imagery.
arXiv Detail & Related papers (2026-02-09T09:14:29Z)
- Hypergraph based Understanding for Document Semantic Entity Recognition [65.84258776834524]
We build a novel hypergraph attention document semantic entity recognition framework, HGA, which uses hypergraph attention to focus on entity boundaries and entity categories at the same time.
Our results on FUNSD, CORD, XFUNDIE show that our method can effectively improve the performance of semantic entity recognition tasks.
arXiv Detail & Related papers (2024-07-09T14:35:49Z)
- EUFCC-340K: A Faceted Hierarchical Dataset for Metadata Annotation in GLAM Collections [6.723689308768857]
The EUFCC-340K dataset is organized across multiple facets: Materials, Object Types, Disciplines, and Subjects, following a hierarchical structure based on the Art & Architecture Thesaurus (AAT).
Our experiments to evaluate model robustness and generalization capabilities in two different test scenarios demonstrate the utility of the dataset in improving multi-label classification tools.
arXiv Detail & Related papers (2024-06-04T14:57:56Z)
- Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval [53.89454443114146]
We study the zero-shot Composed Image Retrieval (ZS-CIR) task, which is to retrieve the target image given a reference image and a description without training on the triplet datasets.
Previous works generate pseudo-word tokens by projecting the reference image features to the text embedding space.
We propose a Knowledge-Enhanced Dual-stream zero-shot composed image retrieval framework (KEDs).
KEDs implicitly models the attributes of the reference images by incorporating a database.
arXiv Detail & Related papers (2024-03-24T04:23:56Z)
- DocumentNet: Bridging the Data Gap in Document Pre-Training [78.01647768018485]
We propose a method to collect massive-scale and weakly labeled data from the web to benefit the training of VDER models.
The collected dataset, named DocumentNet, does not depend on specific document types or entity sets.
Experiments on a set of broadly adopted VDER tasks show significant improvements when DocumentNet is incorporated into the pre-training.
arXiv Detail & Related papers (2023-06-15T08:21:15Z)
- Is Medieval Distant Viewing Possible? : Extending and Enriching Annotation of Legacy Image Collections using Visual Analytics [3.89394670917253]
We describe working with two pre-annotated sets of medieval manuscript images that exhibit conflicting and overlapping metadata.
We aim to create a more uniform set of labels to serve as a "bridge" in the combined dataset.
Visual interfaces provide experts with an overview of relationships in the data going beyond the sum total of the metadata.
arXiv Detail & Related papers (2022-08-20T10:59:33Z)
- Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph [96.95815946327079]
It is difficult to learn the association between named entities and visual cues due to the long-tail distribution of named entities.
We propose a novel approach that constructs a multi-modal knowledge graph to associate the visual objects with named entities.
arXiv Detail & Related papers (2021-07-26T05:50:41Z)
- Document-level Relation Extraction as Semantic Segmentation [38.614931876015625]
Document-level relation extraction aims to extract relations among multiple entity pairs from a document.
This paper approaches the problem by predicting an entity-level relation matrix to capture local and global information.
We propose a Document U-shaped Network for document-level relation extraction.
arXiv Detail & Related papers (2021-06-07T13:44:44Z)
- Learning to Infer Unseen Single-/Multi-Attribute-Object Compositions with Graph Networks [47.43595942156663]
In this paper, we propose an attribute-object semantic association graph model to learn the complex relations. With nodes representing attributes and objects, the graph can be constructed flexibly, which realizes both single- and multi-attribute-object composition recognition.
arXiv Detail & Related papers (2020-10-27T14:57:35Z)
- PhraseCut: Language-based Image Segmentation in the Wild [62.643450401286]
We consider the problem of segmenting image regions given a natural language phrase.
Our dataset is collected on top of the Visual Genome dataset.
Our experiments show that the scale and diversity of concepts in our dataset pose significant challenges to the existing state-of-the-art.
arXiv Detail & Related papers (2020-08-03T20:58:53Z)
- DART: Open-Domain Structured Data Record to Text Generation [91.23798751437835]
We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs).
We propose a procedure of extracting semantic triples from tables that encode their structures by exploiting the semantic dependencies among table headers and the table title.
Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and dialogue-act-based meaning representation tasks.
arXiv Detail & Related papers (2020-07-06T16:35:30Z)
- Hierarchical Image Classification using Entailment Cone Embeddings [68.82490011036263]
We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier.
We empirically show that the availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
arXiv Detail & Related papers (2020-04-02T10:22:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.