Kongzi: A Historical Large Language Model with Fact Enhancement
- URL: http://arxiv.org/abs/2504.09488v1
- Date: Sun, 13 Apr 2025 09:01:05 GMT
- Title: Kongzi: A Historical Large Language Model with Fact Enhancement
- Authors: Jiashu Yang, Ningning Wang, Yian Zhao, Chaoran Feng, Junjia Du, Hao Pang, Zhirui Fang, Xuxin Cheng
- Abstract summary: Kongzi is a large language model specifically designed for historical analysis. Through the integration of curated, high-quality historical data and a novel fact-reinforcement learning strategy, Kongzi demonstrates strong factual alignment and sophisticated reasoning depth.
- Score: 4.687722574822698
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The capabilities of the latest large language models (LLMs) have been extended from pure natural language understanding to complex reasoning tasks. However, current reasoning models often exhibit factual inaccuracies in longer reasoning chains, which poses challenges for historical reasoning and limits the potential of LLMs in complex, knowledge-intensive tasks. Historical studies require not only the accurate presentation of factual information but also the ability to establish cross-temporal correlations and derive coherent conclusions from fragmentary and often ambiguous sources. To address these challenges, we propose Kongzi, a large language model specifically designed for historical analysis. Through the integration of curated, high-quality historical data and a novel fact-reinforcement learning strategy, Kongzi demonstrates strong factual alignment and sophisticated reasoning depth. Extensive experiments on tasks such as historical question answering and narrative generation demonstrate that Kongzi outperforms existing models in both factual accuracy and reasoning depth. By effectively addressing the unique challenges inherent in historical texts, Kongzi sets a new standard for the development of accurate and reliable LLMs in professional domains.
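The listing does not include Kongzi's training recipe; as a rough, hypothetical sketch of what a fact-reinforcement reward signal could look like, the snippet below blends a task reward with a factual-consistency score checked against a curated fact store. The FactStore class, the string-matching checker, and the blending weight are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch of a fact-reinforced reward: blend a task reward with
# the fraction of answer claims a curated fact store supports. All names
# here are illustrative; Kongzi's actual training recipe is not shown.
from dataclasses import dataclass

@dataclass
class FactStore:
    facts: set  # normalized factual statements from curated histories

    def supports(self, claim: str) -> bool:
        return claim.strip().lower() in self.facts

def fact_reinforced_reward(claims, task_reward, store, alpha=0.5):
    """Weighted blend of task reward and factual-consistency score."""
    if not claims:
        return task_reward
    factual = sum(store.supports(c) for c in claims) / len(claims)
    return (1 - alpha) * task_reward + alpha * factual

store = FactStore(facts={"the han dynasty fell in 220 ce"})
print(fact_reinforced_reward(["The Han dynasty fell in 220 CE"], 0.8, store))
```

In a real RL fine-tuning loop this scalar would replace or augment the preference reward, with a learned factuality verifier standing in for the exact-match lookup.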
Related papers
- A Generative Adaptive Replay Continual Learning Model for Temporal Knowledge Graph Reasoning [24.377657990045503]
We propose a Deep Generative Adaptive Replay (DGAR) method, which can generate and adaptively replay historical entity distribution representations. Experimental results demonstrate that DGAR significantly outperforms baselines in reasoning and mitigating forgetting.
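As a toy, runnable illustration of the generative-replay idea, the sketch below uses a one-dimensional Gaussian as a stand-in for DGAR's learned generator of historical distribution representations; every component here is an assumption for illustration.

```python
# Toy generative replay for continual learning: a Gaussian fit to past data
# stands in for DGAR's learned generator; pseudo-samples from it are mixed
# into each new training batch to mitigate forgetting.
import random
import statistics

class GaussianReplay:
    def __init__(self):
        self.seen = []

    def update(self, batch):
        self.seen.extend(batch)

    def sample(self, n):
        mu = statistics.mean(self.seen)
        sigma = statistics.pstdev(self.seen) or 1.0
        return [random.gauss(mu, sigma) for _ in range(n)]

gen = GaussianReplay()
for t, batch in enumerate([[1.0, 1.2, 0.9], [5.0, 5.1, 4.8]]):
    replay = gen.sample(len(batch)) if gen.seen else []
    mixed = batch + replay  # the model would train on this mix (omitted)
    gen.update(batch)
    print(f"step {t}: trained on {len(mixed)} samples")
```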
arXiv Detail & Related papers (2025-06-04T15:44:50Z)
- On Path to Multimodal Historical Reasoning: HistBench and HistAgent [68.02249599465337]
HistBench is a new benchmark of 414 high-quality questions designed to evaluate AI's capacity for historical reasoning. Tasks span a wide range of historical problems, from factual retrieval based on primary sources to interpretive analysis of manuscripts and images. We present HistAgent, a history-specific agent equipped with carefully designed tools for OCR, translation, archival search, and image understanding in History.
arXiv Detail & Related papers (2025-05-26T17:22:20Z)
- Modern Models, Medieval Texts: A POS Tagging Study of Old Occitan [0.1979158763744267]
Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing. This study examines the performance of open-source LLMs in part-of-speech (POS) tagging for Old Occitan.
arXiv Detail & Related papers (2025-03-10T20:16:01Z)
- LongFaith: Enhancing Long-Context Reasoning in LLMs with Faithful Synthetic Data [19.79929012055293]
LongFaith is a novel pipeline for synthesizing faithful long-context reasoning instruction datasets. By integrating ground truth and citation-based reasoning prompts, we eliminate distractions and improve the accuracy of reasoning chains.
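A minimal sketch of a citation-grounded reasoning prompt in the spirit of LongFaith; the template wording below is an assumption, not the paper's actual format.

```python
# Illustrative citation-grounded prompt: every claim in the answer must cite
# a numbered passage, discouraging unsupported reasoning steps.
def citation_prompt(question, passages):
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer step by step, and after every claim cite the supporting "
        "passage number in brackets, e.g. [2]. Use only cited passages."
    )

print(citation_prompt("Who founded the Tang dynasty?",
                      ["Li Yuan founded the Tang dynasty in 618.",
                       "The Sui dynasty preceded the Tang."]))
```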
arXiv Detail & Related papers (2025-02-18T06:40:23Z)
- Failure Modes of LLMs for Causal Reasoning on Narratives [51.19592551510628]
We investigate the causal reasoning abilities of large language models (LLMs) through the representative problem of inferring causal relationships from narratives.
We find that even state-of-the-art language models rely on unreliable shortcuts, both in terms of the narrative presentation and their parametric knowledge.
arXiv Detail & Related papers (2024-10-31T12:48:58Z)
- DetectiveQA: Evaluating Long-Context Reasoning on Detective Novels [86.93099925711388]
We propose DetectiveQA, a dataset specifically designed for narrative reasoning within long contexts.
We leverage detective novels, averaging over 100k tokens, to create a dataset containing 1200 human-annotated questions in both Chinese and English.
arXiv Detail & Related papers (2024-09-04T06:28:22Z)
- Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA [71.04146366608904]
Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-long context windows.
We propose Loong, a novel long-context benchmark that aligns with realistic scenarios through extended multi-document question answering (QA).
Loong introduces four types of tasks with a range of context lengths: Spotlight Locating, Comparison, Clustering, and Chain of Reasoning.
arXiv Detail & Related papers (2024-06-25T09:42:56Z)
- Large Language Models are Limited in Out-of-Context Knowledge Reasoning [65.72847298578071]
Large Language Models (LLMs) possess extensive knowledge and strong capabilities in performing in-context reasoning.
This paper focuses on a significant aspect of out-of-context reasoning: Out-of-Context Knowledge Reasoning (OCKR), which combines multiple pieces of knowledge to infer new knowledge.
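A toy example of the composition OCKR tests, with two facts stored separately that must be chained to produce an answer neither contains alone:

```python
# Out-of-context knowledge reasoning in miniature: composing two separately
# stored facts yields knowledge that appears in neither fact on its own.
born_in = {"Alan Turing": "London"}   # fact 1: person -> city of birth
located_in = {"London": "England"}    # fact 2: city -> country

def birth_country(person):
    city = born_in.get(person)
    return located_in.get(city) if city else None

print(birth_country("Alan Turing"))  # England: requires chaining both facts
```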
arXiv Detail & Related papers (2024-06-11T15:58:59Z)
- NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens [63.7488938083696]
NovelQA is a benchmark designed to test the capabilities of Large Language Models with extended texts.
This paper presents the design and construction of NovelQA, highlighting its manual annotation and diverse question types.
Our evaluation of Long-context LLMs on NovelQA reveals significant insights into the models' performance.
arXiv Detail & Related papers (2024-03-18T17:32:32Z)
- Multilingual Event Extraction from Historical Newspaper Adverts [42.987470570997694]
This paper focuses on the under-explored task of event extraction from a novel domain of historical texts.
We introduce a new multilingual dataset in English, French, and Dutch composed of newspaper ads from the early modern colonial period.
We find that even with scarce annotated data, it is possible to achieve surprisingly good results by formulating the problem as an extractive QA task.
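One hedged sketch of that extractive-QA formulation, using a generic Hugging Face question-answering pipeline; the paper's actual model and question templates may differ, and the advert text below is invented.

```python
# Casting event-argument extraction as extractive QA: each event slot
# becomes a question whose answer is a span of the historical advert.
from transformers import pipeline

qa = pipeline("question-answering")  # generic extractive QA model

ad = ("Ran away from his master on the 10th of May, an apprentice named "
      "John Brown, about 20 years of age.")
for question in ("Who ran away?", "When did the escape happen?"):
    print(question, "->", qa(question=question, context=ad)["answer"])
```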
arXiv Detail & Related papers (2023-05-18T12:40:41Z)
- ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses a significant challenge for modeling long-range, complex numerical reasoning paths in real-world conversations.
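Answers in this setting are often expressed as small numerical programs; below is a hedged sketch of an evaluator for such a reasoning chain, where the operation names follow a FinQA-style convention but the exact DSL is the dataset's, not reproduced here.

```python
# Evaluating a small numerical reasoning chain: each step applies one
# operation, and "#k" refers to the result of step k.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def run_program(steps):
    results = []

    def resolve(x):
        return results[int(x[1:])] if isinstance(x, str) and x.startswith("#") else x

    for op, a, b in steps:
        results.append(OPS[op](resolve(a), resolve(b)))
    return results[-1]

# Percentage change: (605 - 710) / 710
print(run_program([("subtract", 605, 710), ("divide", "#0", 710)]))
```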
arXiv Detail & Related papers (2022-10-07T23:48:50Z)
- Restoring and Mining the Records of the Joseon Dynasty via Neural Language Modeling and Machine Translation [20.497110880878544]
We present a multi-task learning approach to restore and translate historical documents based on a self-attention mechanism.
Our approach significantly improves translation accuracy over baselines trained without multi-task learning.
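A minimal PyTorch sketch of such a multi-task objective; the weighted-sum formulation and tensor shapes are assumptions for illustration, not the paper's exact setup.

```python
# Multi-task objective: a weighted sum of restoration and translation
# cross-entropy losses over a shared model's two output heads (assumed).
import torch
import torch.nn.functional as F

def multitask_loss(restore_logits, restore_targets,
                   translate_logits, translate_targets, lam=0.5):
    l_restore = F.cross_entropy(restore_logits.flatten(0, 1),
                                restore_targets.flatten())
    l_translate = F.cross_entropy(translate_logits.flatten(0, 1),
                                  translate_targets.flatten())
    return l_restore + lam * l_translate

# Dummy shapes: batch=2, seq=5, vocab=100
def logits(): return torch.randn(2, 5, 100)
def targets(): return torch.randint(0, 100, (2, 5))
print(multitask_loss(logits(), targets(), logits(), targets()))
```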
arXiv Detail & Related papers (2021-04-13T06:40:25Z)
- A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation [98.25464306634758]
We propose to utilize commonsense knowledge from external knowledge bases to generate reasonable stories.
We employ multi-task learning, combining the generation objective with a discriminative objective that distinguishes true from fake stories.
Our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.
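One common way to build the fake stories for such a discriminative objective is to perturb sentence order; the toy sketch below does exactly that, though the paper's actual negative-construction procedure may differ.

```python
# Building a "fake" story negative by shuffling sentence order, which
# breaks logical and temporal coherence while keeping the same content.
import random

def make_fake(sentences, seed=0):
    rng = random.Random(seed)
    fake = sentences[:]
    while fake == sentences and len(fake) > 1:
        rng.shuffle(fake)
    return fake

story = ["He found a wallet.", "He returned it to its owner.",
         "The owner thanked him warmly."]
print(make_fake(story))  # shuffled order no longer reads as a coherent story
```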
arXiv Detail & Related papers (2020-01-15T05:42:27Z)