Retrieval-Augmented Generation for Natural Language Art Provenance Searches in the Getty Provenance Index
- URL: http://arxiv.org/abs/2508.19093v1
- Date: Tue, 26 Aug 2025 14:58:09 GMT
- Title: Retrieval-Augmented Generation for Natural Language Art Provenance Searches in the Getty Provenance Index
- Authors: Mathew Henrickson
- Abstract summary: This research presents a Retrieval-Augmented Generation framework for art studies, focusing on the Getty Provenance Index. Provenance research establishes the ownership history of artworks, which is essential for verifying authenticity, supporting restitution and legal claims, and understanding the cultural and historical context of art objects. Our method enables natural-language and multilingual searches through semantic retrieval and contextual summarization, reducing dependence on metadata structures.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This research presents a Retrieval-Augmented Generation (RAG) framework for art provenance studies, focusing on the Getty Provenance Index. Provenance research establishes the ownership history of artworks, which is essential for verifying authenticity, supporting restitution and legal claims, and understanding the cultural and historical context of art objects. The process is complicated by fragmented, multilingual archival data that hinders efficient retrieval. Current search portals require precise metadata, limiting exploratory searches. Our method enables natural-language and multilingual searches through semantic retrieval and contextual summarization, reducing dependence on metadata structures. We assess RAG's capability to retrieve and summarize auction records using a 10,000-record sample from the Getty Provenance Index - German Sales. The results show this approach provides a scalable solution for navigating art market archives, offering a practical tool for historians and cultural heritage professionals conducting historically sensitive research.
Related papers
- Cultural Analytics for Good: Building Inclusive Evaluation Frameworks for Historical IR [6.217528366941651]
This work bridges the fields of information retrieval and cultural analytics to support equitable access to historical knowledge. Using the British Library BL19 digital collection, we construct a benchmark for studying changes in language, terminology and retrieval in 19th-century fiction and non-fiction.
arXiv Detail & Related papers (2026-01-17T01:54:55Z)
- On Path to Multimodal Historical Reasoning: HistBench and HistAgent [68.02249599465337]
HistBench is a new benchmark of 414 high-quality questions designed to evaluate AI's capacity for historical reasoning. Tasks span a wide range of historical problems, from factual retrieval based on primary sources to interpretive analysis of manuscripts and images. We present HistAgent, a history-specific agent equipped with carefully designed tools for OCR, translation, archival search, and image understanding in History.
arXiv Detail & Related papers (2025-05-26T17:22:20Z)
- Named Entity Recognition in Historical Italian: The Case of Giacomo Leopardi's Zibaldone [4.795582035438343]
There is an urgent need for computational techniques able to adapt to the challenges of historical texts. The rise of large language models (LLMs) has revolutionized natural language processing. No thorough evaluation has been proposed for Italian texts.
arXiv Detail & Related papers (2025-05-26T15:16:48Z)
- Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts [65.90535970515266]
TimeTravel is a benchmark of 10,250 expert-verified samples spanning 266 distinct cultures across 10 major historical regions. TimeTravel is designed for AI-driven analysis of manuscripts, artworks, inscriptions, and archaeological discoveries. We evaluate contemporary AI models on TimeTravel, highlighting their strengths and identifying areas for improvement.
arXiv Detail & Related papers (2025-02-20T18:59:51Z)
- CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval [103.116634967815]
We introduce CodeXEmbed, a family of large-scale code embedding models ranging from 400M to 7B parameters. Our novel training pipeline unifies multiple programming languages and transforms various code-related tasks into a common retrieval framework. Our 7B model sets a new state-of-the-art (SOTA) in code retrieval, outperforming the previous leading model, Voyage-Code, by over 20% on the CoIR benchmark.
arXiv Detail & Related papers (2024-11-19T16:54:45Z)
- The Role of Generative Systems in Historical Photography Management: A Case Study on Catalan Archives [0.24578723416255752]
The use of image analysis in automated photography management is an increasing trend in heritage institutions.
The primary objective of this research is to study the quantitative contribution of generative systems in the description of historical sources.
This is done by contextualizing the task of captioning historical photographs from the Catalan archives as a case study.
arXiv Detail & Related papers (2024-09-05T21:08:25Z)
- PHD: Pixel-Based Language Modeling of Historical Documents [55.75201940642297]
We propose a novel method for generating synthetic scans to resemble real historical documents.
We pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period.
We successfully apply our model to a historical QA task, highlighting its usefulness in this domain.
arXiv Detail & Related papers (2023-10-22T08:45:48Z)
- ScrollTimes: Tracing the Provenance of Paintings as a Window into History [35.605930297790465]
The study of cultural artifact provenance, tracing ownership and preservation, holds significant importance in archaeology and art history.
In collaboration with art historians, we examined the handscroll, a traditional Chinese painting form that provides a rich source of historical data.
We present a three-tiered methodology encompassing artifact, contextual, and provenance levels, designed to create a "Biography" for a handscroll.
arXiv Detail & Related papers (2023-06-15T03:38:09Z)
- A Survey on Retrieval-Augmented Text Generation [53.04991859796971]
Retrieval-augmented text generation has remarkable advantages and has achieved state-of-the-art performance in many NLP tasks.
This survey first highlights the generic paradigm of retrieval-augmented generation and then reviews notable approaches according to different tasks.
arXiv Detail & Related papers (2022-02-02T16:18:41Z)
- A New Neural Search and Insights Platform for Navigating and Organizing AI Research [56.65232007953311]
We introduce a new platform, AI Research Navigator, that combines classical keyword search with neural retrieval to discover and organize relevant literature.
We give an overview of the overall architecture of the system and of the components for document analysis, question answering, search, analytics, expert search, and recommendations.
arXiv Detail & Related papers (2020-10-30T19:12:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.