Cultural Analytics for Good: Building Inclusive Evaluation Frameworks for Historical IR
- URL: http://arxiv.org/abs/2601.11874v1
- Date: Sat, 17 Jan 2026 01:54:55 GMT
- Title: Cultural Analytics for Good: Building Inclusive Evaluation Frameworks for Historical IR
- Authors: Suchana Datta, Dwaipayan Roy, Derek Greene, Gerardine Meaney, Karen Wade, Philipp Mayr
- Abstract summary: This work bridges the fields of information retrieval and cultural analytics to support equitable access to historical knowledge. Using the British Library BL19 digital collection, we construct a benchmark for studying changes in language, terminology and retrieval in the 19th-century fiction and non-fiction.
- Score: 6.217528366941651
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work bridges the fields of information retrieval and cultural analytics to support equitable access to historical knowledge. Using the British Library BL19 digital collection (more than 35,000 works from 1700-1899), we construct a benchmark for studying changes in language, terminology and retrieval in the 19th-century fiction and non-fiction. Our approach combines expert-driven query design, paragraph-level relevance annotation, and Large Language Model (LLM) assistance to create a scalable evaluation framework grounded in human expertise. We focus on knowledge transfer from fiction to non-fiction, investigating how narrative understanding and semantic richness in fiction can improve retrieval for scholarly and factual materials. This interdisciplinary framework not only improves retrieval accuracy but also fosters interpretability, transparency, and cultural inclusivity in digital archives. Our work provides both practical evaluation resources and a methodological paradigm for developing retrieval systems that support richer, historically aware engagement with digital archives, ultimately working towards more emancipatory knowledge infrastructures.
Related papers
- Knowledge Graphs Generation from Cultural Heritage Texts: Combining LLMs and Ontological Engineering for Scholarly Debates [0.3033221007650832]
This paper introduces ATR4CH, a systematic five-step methodology for Large Language Model-based Knowledge Extraction. We validate the methodology through a case study on authenticity assessment debates. ATR4CH enables Cultural Heritage institutions to systematically convert textual knowledge into queryable Knowledge Graphs.
arXiv Detail & Related papers (2025-11-13T14:29:51Z)
- Retrieval-Augmented Generation for Natural Language Art Provenance Searches in the Getty Provenance Index [0.0]
This research presents a Retrieval-Augmented Generation framework for art studies, focusing on the Getty Provenance Index. Provenance research establishes the ownership history of artworks, which is essential for verifying authenticity, supporting restitution and legal claims, and understanding the cultural and historical context of art objects. Our method enables natural-language and multilingual searches through semantic retrieval and contextual summarization, reducing dependence on metadata structures.
arXiv Detail & Related papers (2025-08-26T14:58:09Z)
- Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering [51.7493726399073]
We present a discourse-aware hierarchical framework to enhance long document question answering. The framework involves three key innovations: specialized discourse parsing for lengthy documents, LLM-based enhancement of discourse relation nodes, and structure-guided hierarchical retrieval.
arXiv Detail & Related papers (2025-05-26T14:45:12Z)
- Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts [65.90535970515266]
TimeTravel is a benchmark of 10,250 expert-verified samples spanning 266 distinct cultures across 10 major historical regions. TimeTravel is designed for AI-driven analysis of manuscripts, artworks, inscriptions, and archaeological discoveries. We evaluate contemporary AI models on TimeTravel, highlighting their strengths and identifying areas for improvement.
arXiv Detail & Related papers (2025-02-20T18:59:51Z)
- GalleryGPT: Analyzing Paintings with Large Multimodal Models [64.98398357569765]
Artwork analysis is an important and fundamental skill for art appreciation, which can enrich personal aesthetic sensibility and facilitate critical thinking.
Previous works for automatically analyzing artworks mainly focus on classification, retrieval, and other simple tasks, which is far from the goal of AI.
We introduce a superior large multimodal model for painting analysis composing, dubbed GalleryGPT, which is slightly modified and fine-tuned based on LLaVA architecture.
arXiv Detail & Related papers (2024-08-01T11:52:56Z)
- Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
- Curatr: A Platform for Semantic Analysis and Curation of Historical Literary Texts [5.075506385456811]
This paper presents Curatr, an online platform for the exploration and curation of literature with machine learning-supported semantic search.
The platform combines neural word embeddings with expert domain knowledge to enable the generation of thematic lexicons.
arXiv Detail & Related papers (2023-06-13T15:15:31Z)
- Embedding Knowledge for Document Summarization: A Survey [66.76415502727802]
Previous works proved that knowledge-embedded document summarizers excel at generating superior digests.
We propose novel taxonomies to recapitulate knowledge and knowledge embeddings under the document summarization view.
arXiv Detail & Related papers (2022-04-24T04:36:07Z)
- A New Neural Search and Insights Platform for Navigating and Organizing AI Research [56.65232007953311]
We introduce a new platform, AI Research Navigator, that combines classical keyword search with neural retrieval to discover and organize relevant literature.
We give an overview of the overall architecture of the system and of the components for document analysis, question answering, search, analytics, expert search, and recommendations.
arXiv Detail & Related papers (2020-10-30T19:12:25Z)
- Ontologies in CLARIAH: Towards Interoperability in History, Language and Media [0.05277024349608833]
One of the most important goals of digital humanities is to provide researchers with data and tools for new research questions.
The FAIR principles provide a framework as these state that data needs to be: Findable, as they are often scattered among various sources; Accessible, since some might be offline or behind paywalls; Interoperable, thus using standard knowledge representation formats and shared.
We describe the tools developed and integrated in the Dutch national project CLARIAH to address these issues.
arXiv Detail & Related papers (2020-04-06T17:38:47Z)
- Explaining Relationships Between Scientific Documents [55.23390424044378]
We address the task of explaining relationships between two scientific documents using natural language text.
In this paper we establish a dataset of 622K examples from 154K documents.
arXiv Detail & Related papers (2020-02-02T03:54:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.