Related papers: Position Paper: Metadata Enrichment Model: Integrating Neural Networks and Semantic Knowledge Graphs for Cultural Heritage Applications

URL: http://arxiv.org/abs/2505.23543v1
Date: Thu, 29 May 2025 15:22:18 GMT
Title: Position Paper: Metadata Enrichment Model: Integrating Neural Networks and Semantic Knowledge Graphs for Cultural Heritage Applications
Authors: Jan Ignatowicz, Krzysztof Kutt, Grzegorz J. Nalepa
Abstract summary: We present the Metadata Enrichment Model (MEM), a conceptual framework designed to enrich metadata for digitized collections. MEM combines fine-tuned computer vision models, large language models and structured knowledge graphs. We release a dataset of digitized incunabula from the Jagiellonian Digital Library.
Score: 8.732274235941974
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The digitization of cultural heritage collections has opened new directions for research, yet the lack of enriched metadata poses a substantial challenge to accessibility, interoperability, and cross-institutional collaboration. In recent years, neural network models such as YOLOv11 and Detectron2 have revolutionized visual data analysis, but their application to domain-specific cultural artifacts, such as manuscripts and incunabula, remains limited by the absence of methodologies that address structural feature extraction and semantic interoperability. In this position paper, we argue that the integration of neural networks with semantic technologies represents a paradigm shift in cultural heritage digitization processes. We present the Metadata Enrichment Model (MEM), a conceptual framework designed to enrich metadata for digitized collections by combining fine-tuned computer vision models, large language models (LLMs), and structured knowledge graphs. The key innovation of MEM is the Multilayer Vision Mechanism (MVM), an iterative process that improves visual analysis by dynamically detecting nested features, such as text within seals or images within stamps. To demonstrate MEM's potential, we apply it to a dataset of digitized incunabula from the Jagiellonian Digital Library and release a manually annotated dataset of 105 manuscript pages. We examine the practical challenges of using MEM in real-world GLAM institutions, including the need for domain-specific fine-tuning, the alignment of enriched metadata with Linked Data standards, and computational costs. We present MEM as a flexible and extensible methodology. This paper contributes to the discussion on how artificial intelligence and semantic web technologies can advance cultural heritage research and be applied in practice.
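The abstract describes the Multilayer Vision Mechanism only at a high level. The following is a minimal sketch of the general idea of iterative nested detection, not the authors' implementation: a detector is run on a page, then re-run inside each detected region until nothing new is found or a depth limit is reached. The names `Region`, `detect_nested`, and `toy_detector`, the box format, and the depth limit are all assumptions made for illustration; `toy_detector` is a hard-coded stand-in for a fine-tuned vision model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumed box format: (x, y, width, height) in page pixel coordinates.
Box = Tuple[int, int, int, int]


@dataclass
class Region:
    """A detected feature plus any features detected inside it."""
    label: str
    box: Box
    children: List["Region"] = field(default_factory=list)


# A detector maps a box to the (label, box) pairs it finds inside that box.
Detector = Callable[[Box], List[Tuple[str, Box]]]


def detect_nested(box: Box, detector: Detector, max_depth: int = 3) -> List[Region]:
    """Run the detector on `box`, then recurse into every detection."""
    if max_depth == 0:
        return []
    regions = []
    for label, child_box in detector(box):
        region = Region(label, child_box)
        # Look for features nested inside this detection.
        region.children = detect_nested(child_box, detector, max_depth - 1)
        regions.append(region)
    return regions


def toy_detector(box: Box) -> List[Tuple[str, Box]]:
    """Hypothetical stand-in for a vision model: a fixed lookup table."""
    layout = {
        (0, 0, 1000, 1400): [
            ("seal", (100, 100, 200, 200)),
            ("stamp", (600, 900, 150, 150)),
        ],
        (100, 100, 200, 200): [("text", (130, 180, 140, 40))],
        (600, 900, 150, 150): [("image", (620, 920, 100, 100))],
    }
    return layout.get(box, [])


page = (0, 0, 1000, 1400)
tree = detect_nested(page, toy_detector)
```

In this toy layout, `tree` holds a seal containing a text region and a stamp containing an image region, mirroring the nested examples named in the abstract. The depth limit is a design choice of this sketch to guarantee termination; the paper may bound the iteration differently.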

Related papers

Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs -- Evolution, Limitations, and Cognitive Enhancement [25.08967298618286]
Multimodal Large Language Models (MLLMs) are transforming chart information fusion. This survey aims to equip researchers and practitioners with a structured understanding of how MLLMs are transforming chart information fusion.
arXiv Detail & Related papers (2026-02-08T12:59:50Z)
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding [61.36285696607487]
Document understanding is critical for applications from financial analysis to scientific discovery. Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs), face key limitations. Retrieval-Augmented Generation (RAG) helps ground models in external data, but documents' multimodal nature, combining text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
arXiv Detail & Related papers (2025-10-17T02:33:16Z)
Combining feature-based approaches with graph neural networks and symbolic regression for synergistic performance and interpretability [0.0]
MatterVial is an innovative hybrid framework for feature-based machine learning in materials science. Our approach combines the chemical transparency of traditional feature-based models with the predictive power of deep learning architectures. An integrated interpretability module, employing surrogate models and symbolic regression, decodes the latent GNN-derived descriptors into explicit, physically meaningful formulas.
arXiv Detail & Related papers (2025-09-02T16:45:02Z)
MGCR-Net: Multimodal Graph-Conditioned Vision-Language Reconstruction Network for Remote Sensing Change Detection [55.702662643521265]
We propose the multimodal graph-conditioned vision-language reconstruction network (MGCR-Net) to explore the semantic interaction capabilities of multimodal data. Experimental results on four public datasets demonstrate that MGCR achieves superior performance compared to mainstream CD methods.
arXiv Detail & Related papers (2025-08-03T02:50:08Z)
Attentional Graph Meta-Learning for Indoor Localization Using Extremely Sparse Fingerprints [17.159049478569173]
Fingerprint-based indoor localization is often labor-intensive due to the need for dense grids and repeated measurements across time and space. Existing benchmark methods primarily rely on the measured fingerprints, while neglecting valuable spatial and environmental characteristics. We propose a systematic integration of an Attentional Graph Neural Network (AGNN) model, capable of learning spatial adjacency relationships and aggregating information from neighboring fingerprints.
arXiv Detail & Related papers (2025-04-07T08:37:18Z)
REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation [58.91579272882073]
This paper introduces a novel benchmark dataset, called REO-Instruct, to unify regression and generation tasks specifically for the Earth Observation domain. We develop REO-VLM, a groundbreaking model that seamlessly integrates regression capabilities with traditional generative functions.
arXiv Detail & Related papers (2024-12-21T11:17:15Z)
Enhancing Document AI Data Generation Through Graph-Based Synthetic Layouts [0.8245350546263803]
We propose a novel approach to synthetic document layout generation using Graph Neural Networks (GNNs). By representing document elements as nodes in a graph, GNNs are trained to generate realistic and diverse document layouts. Our experimental results show that graph-augmented document layouts outperform existing augmentation techniques.
arXiv Detail & Related papers (2024-11-27T21:15:02Z)
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data. We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation. Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
Deep Learning and Machine Learning -- Object Detection and Semantic Segmentation: From Theory to Applications [17.571124565519263]
In-depth exploration of object detection and semantic segmentation is provided. State-of-the-art advancements in machine learning and deep learning are reviewed. Analysis of big data processing is presented.
arXiv Detail & Related papers (2024-10-21T02:10:49Z)
Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies. Our findings are synthesized in Flex (Fly lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors. We demonstrate the effectiveness of this approach on a quadrotor fly-to-target task, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
A Few-Shot Approach for Relation Extraction Domain Adaptation using Large Language Models [1.3927943269211591]
This paper experiments with leveraging in-context learning capabilities of Large Language Models to perform data annotation. We show that by using a few-shot learning strategy with structured prompts and only minimal expert annotation the presented approach can potentially support domain adaptation of a science KG generation model.
arXiv Detail & Related papers (2024-08-05T11:06:36Z)
Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML). This work introduces a formal framework for this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature across various domains in ML with consistent notation, which is missing from the current literature. The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z)
Images in Discrete Choice Modeling: Addressing Data Isomorphism in Multi-Modality Inputs [77.54052164713394]
This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning. We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework.
arXiv Detail & Related papers (2023-12-22T14:33:54Z)
Graph Foundation Models: Concepts, Opportunities and Challenges [66.37994863159861]
Foundation models have emerged as critical components in a variety of artificial intelligence applications. The capabilities of foundation models in generalization and adaptation motivate graph machine learning researchers to discuss the potential of developing a new graph learning paradigm. This article introduces the concept of Graph Foundation Models (GFMs), and offers an exhaustive explanation of their key characteristics and underlying technologies.
arXiv Detail & Related papers (2023-10-18T09:31:21Z)
Heri-Graphs: A Workflow of Creating Datasets for Multi-modal Machine Learning on Graphs of Heritage Values and Attributes with Social Media [7.318997639507268]
Values (why to conserve) and Attributes (what to conserve) are essential concepts of cultural heritage. Recent studies have used social media to map the values and attributes that the public conveys about cultural heritage. This study presents a methodological workflow for constructing such multi-modal datasets using posts and images on Flickr.
arXiv Detail & Related papers (2022-05-16T09:45:45Z)
Neural Networks for Semantic Gaze Analysis in XR Settings [0.0]
We present a novel approach which minimizes time and information necessary to annotate volumes of interest. We train convolutional neural networks (CNNs) on synthetic data sets derived from virtual models using image augmentation techniques. We evaluate our method in real and virtual environments, showing that the method can compete with state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-18T18:05:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.