The AI-Augmented Research Process: A Historian's Perspective
- URL: http://arxiv.org/abs/2508.01779v1
- Date: Sun, 03 Aug 2025 14:34:36 GMT
- Title: The AI-Augmented Research Process: A Historian's Perspective
- Authors: Christian Henriot
- Abstract summary: This paper presents a detailed case study of how artificial intelligence, especially large language models, can be integrated into historical research workflows. The workflow is divided into nine steps, covering the full research cycle from question formulation to dissemination and reproducibility.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper presents a detailed case study of how artificial intelligence, especially large language models, can be integrated into historical research workflows. The workflow is divided into nine steps, covering the full research cycle from question formulation to dissemination and reproducibility, and includes two framing phases that address setup and documentation. Each research step is mapped across three operational domains: 1. LLM, referring to tasks delegated to language models; 2. Mind, referring to conceptual and interpretive contributions by the historian; and 3. Computational, referring to conventional programming-based methods such as Python, R, and Cytoscape. The study emphasizes that LLMs are not replacements for domain expertise but can support and expand the capacity of historians to process, verify, and interpret large corpora of texts. At the same time, it highlights the necessity of rigorous quality control, cross-checking of outputs, and maintaining scholarly standards. Drawing on an in-depth study of three Shanghai merchants, the paper also proposes a structured workflow, based on a real case study, that articulates the cognitive labor of the historian with both computational tools and generative AI. This paper makes both a methodological and an epistemological contribution by showing how AI can be responsibly incorporated into historical research through transparent and reproducible workflows. It is intended as a practical guide and critical reflection for historians facing the increasingly complex landscape of AI-enhanced scholarship.
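The step-to-domain mapping described in the abstract can be sketched as a simple data structure. Note that the paper does not list its nine step names in the abstract, so the step names below are illustrative placeholders, not the author's actual workflow; only the three domain tags (LLM, Mind, Computational) come from the source.

```python
# Illustrative sketch of mapping research steps to the paper's three
# operational domains. Domain tags follow the abstract: "llm" (tasks
# delegated to language models), "mind" (the historian's conceptual and
# interpretive work), "computational" (conventional programming-based
# methods). Step names are hypothetical placeholders.
DOMAINS = {"llm", "mind", "computational"}

workflow = {
    "question_formulation": {"mind"},
    "corpus_processing": {"llm", "computational"},
    "output_verification": {"mind", "llm"},
    "dissemination": {"mind", "computational"},
}

def unmapped_steps(steps: dict) -> list:
    """Return step names that lack a domain tag or use an unknown tag."""
    return [name for name, tags in steps.items()
            if not tags or not tags <= DOMAINS]
```

A structure like this makes the division of labor auditable: any step with no domain assignment, or a tag outside the three domains, is flagged before the workflow is documented.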
Related papers
- KnowCoder-V2: Deep Knowledge Analysis [64.63893361811968]
We propose a Knowledgeable Deep Research (KDR) framework that empowers deep research with deep knowledge analysis capability. It introduces an independent knowledge organization phase to preprocess large-scale, domain-relevant data into systematic knowledge offline. It then extends deep research with an additional kind of reasoning step that performs complex knowledge computation in an online manner.
arXiv Detail & Related papers (2025-06-07T18:01:25Z) - DISRetrieval: Harnessing Discourse Structure for Long Document Retrieval [51.89673002051528]
DISRetrieval is a novel hierarchical retrieval framework that leverages linguistic discourse structure to enhance long document understanding. Our studies confirm that discourse structure significantly enhances retrieval effectiveness across different document lengths and query types.
arXiv Detail & Related papers (2025-05-26T14:45:12Z) - A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems [93.8285345915925]
Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems. We categorize existing methods along two dimensions: (1) Regimes, which define the stage at which reasoning is achieved; and (2) Architectures, which determine the components involved in the reasoning process.
arXiv Detail & Related papers (2025-04-12T01:27:49Z) - A Comprehensive Survey on Long Context Language Modeling [118.5540791080351]
Long Context Language Models (LCLMs) process and analyze extensive inputs in an effective and efficient way. Our survey is structured around three key aspects: how to obtain effective and efficient LCLMs, how to train and deploy LCLMs efficiently, and how to evaluate and analyze LCLMs comprehensively.
arXiv Detail & Related papers (2025-03-20T17:06:28Z) - Retrieval Augmented Generation for Topic Modeling in Organizational Research: An Introduction with Empirical Demonstration [0.0]
This paper introduces Agentic Retrieval-Augmented Generation (Agentic RAG) as a method for topic modeling with LLMs. It integrates three key components: (1) retrieval, enabling automated access to external data beyond an LLM's pre-trained knowledge; (2) generation, leveraging LLM capabilities for text synthesis; and (3) agent-driven learning, iteratively refining retrieval and query formulation processes. Our findings demonstrate that the approach is more efficient and interpretable, and at the same time achieves higher reliability and validity than the standard machine learning approach.
arXiv Detail & Related papers (2025-02-28T11:25:11Z) - Survey on Vision-Language-Action Models [0.2636873872510828]
This work does not represent original research, but highlights how AI can help automate literature reviews.<n>Future research will focus on developing a structured framework for AI-assisted literature reviews.
arXiv Detail & Related papers (2025-02-07T11:56:46Z) - AAAR-1.0: Assessing AI's Potential to Assist Research [34.88341605349765]
We introduce AAAR-1.0, a benchmark dataset designed to evaluate large language models' (LLMs) performance in three fundamental, expertise-intensive research tasks. AAAR-1.0 differs from prior benchmarks in two key ways: first, it is explicitly research-oriented, with tasks requiring deep domain expertise; second, it is researcher-oriented, mirroring the primary activities that researchers engage in on a daily basis.
arXiv Detail & Related papers (2024-10-29T17:58:29Z) - ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents [21.17856299966841]
This study introduces ResearchArena, a benchmark designed to evaluate large language models (LLMs) in conducting academic surveys. To support these opportunities, we construct an environment of 12M full-text academic papers and 7.9K survey papers.
arXiv Detail & Related papers (2024-06-13T03:26:30Z) - Artificial intelligence to automate the systematic review of scientific literature [0.0]
We present a survey of AI techniques proposed in the last 15 years to help researchers conduct systematic analyses of scientific literature.
We describe the tasks currently supported, the types of algorithms applied, and available tools proposed in 34 primary studies.
arXiv Detail & Related papers (2024-01-13T19:12:49Z) - Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation.
We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques.
We empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs.
arXiv Detail & Related papers (2023-12-29T14:25:22Z) - Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z) - Explaining Relationships Between Scientific Documents [55.23390424044378]
We address the task of explaining relationships between two scientific documents using natural language text.
In this paper we establish a dataset of 622K examples from 154K documents.
arXiv Detail & Related papers (2020-02-02T03:54:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.