Related papers: ChiMDQA: Towards Comprehensive Chinese Document QA with Fine-grained Evaluation

URL: http://arxiv.org/abs/2511.03656v1
Date: Wed, 05 Nov 2025 17:13:14 GMT
Title: ChiMDQA: Towards Comprehensive Chinese Document QA with Fine-grained Evaluation
Authors: Jing Gao, Shutiao Luo, Yumeng Liu, Yuanming Li, Hongji Zeng,
Abstract summary: ChiMDQA encompasses long-form documents from six distinct fields, consisting of 6,068 rigorously curated, high-quality question-answer pairs. The dataset guarantees both diversity and high quality, rendering it applicable to various NLP tasks such as document comprehension, knowledge extraction, and intelligent QA systems.
Score: 12.784082281917003
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: With the rapid advancement of natural language processing (NLP) technologies, the demand for high-quality Chinese document question-answering datasets is steadily growing. To address this need, we present the Chinese Multi-Document Question Answering Dataset (ChiMDQA), specifically designed for downstream business scenarios across prevalent domains including academic, education, finance, law, medical treatment, and news. ChiMDQA encompasses long-form documents from six distinct fields, consisting of 6,068 rigorously curated, high-quality question-answer (QA) pairs further classified into ten fine-grained categories. Through meticulous document screening and a systematic question-design methodology, the dataset guarantees both diversity and high quality, rendering it applicable to various NLP tasks such as document comprehension, knowledge extraction, and intelligent QA systems. Additionally, this paper offers a comprehensive overview of the dataset's design objectives, construction methodologies, and fine-grained evaluation system, supplying a substantial foundation for future research and practical applications in Chinese QA. The code and data are available at: https://anonymous.4open.science/r/Foxit-CHiMDQA/.

Related papers

OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive [50.468138755368805]
The opioid crisis represents a significant moment in public health. Data and documents disclosed in the UCSF-JHU Opioid Industry Documents Archive (OIDA). In this paper, we tackle this challenge by organizing the original dataset according to document attributes.
arXiv Detail & Related papers (2025-11-13T03:27:32Z)
AdaDocVQA: Adaptive Framework for Long Document Visual Question Answering in Low-Resource Settings [8.22650587342049]
Document Visual Question Answering (Document VQA) faces significant challenges when processing long documents in low-resource environments. This paper presents AdaDocVQA, a unified adaptive framework addressing these challenges through three core innovations. Experiments on Japanese document VQA benchmarks demonstrate substantial improvements with 83.04% accuracy on Yes/No questions.
arXiv Detail & Related papers (2025-08-19T08:12:45Z)
MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks [56.350173737493215]
Environmental, Social, and Governance (ESG) reports are essential for evaluating sustainability practices, ensuring regulatory compliance, and promoting financial transparency. MMESGBench is a first-of-its-kind benchmark dataset to evaluate multimodal understanding and complex reasoning across structurally diverse and multi-source ESG documents. MMESGBench comprises 933 validated QA pairs derived from 45 ESG documents, spanning seven distinct document types and three major ESG source categories.
arXiv Detail & Related papers (2025-07-25T03:58:07Z)
PeerQA: A Scientific Question Answering Dataset from Peer Reviews [51.95579001315713]
We present PeerQA, a real-world, scientific, document-level Question Answering dataset. The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP. We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks.
arXiv Detail & Related papers (2025-02-19T12:24:46Z)
SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers [20.273439120429025]
SciDQA is a new dataset for reading comprehension that challenges LLMs for a deep understanding of scientific articles. Unlike other scientific QA datasets, SciDQA sources questions from peer reviews by domain experts and answers by paper authors. Questions in SciDQA necessitate reasoning across figures, tables, equations, appendices, and supplementary materials.
arXiv Detail & Related papers (2024-11-08T05:28:22Z)
Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents [61.41316121093604]
We present InsCoQA, a novel benchmark for evaluating large language models (LLMs) in the context of conversational question answering (CQA). Sourced from extensive, encyclopedia-style instructional content, InsCoQA assesses models on their ability to retrieve, interpret, and accurately summarize procedural guidance from multiple documents. We also propose InsEval, an LLM-assisted evaluator that measures the integrity and accuracy of generated responses and procedural instructions.
arXiv Detail & Related papers (2024-10-01T09:10:00Z)
Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models [0.0]
Open Domain Question Answering (ODQA) within natural language processing involves building systems that answer factual questions using large-scale knowledge corpora. High-quality datasets are used to train models on realistic scenarios. Standardized metrics facilitate comparisons between different ODQA systems.
arXiv Detail & Related papers (2024-06-19T05:43:02Z)
HiQA: A Hierarchical Contextual Augmentation RAG for Multi-Documents QA [13.000411428297813]
We present HiQA, an advanced multi-document question-answering (MDQA) framework that integrates cascading metadata into content and a multi-route retrieval mechanism. We also release a benchmark called MasQA to support evaluation and research in MDQA.
arXiv Detail & Related papers (2024-02-01T02:24:15Z)
Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language. We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs. We develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z)
Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering [62.88322725956294]
We review the latest research trends in OpenQA, with particular attention to systems that incorporate neural MRC techniques. We introduce a modern OpenQA architecture named "Retriever-Reader" and analyze the various systems that follow this architecture. We then discuss key challenges to developing OpenQA systems and offer an analysis of benchmarks that are commonly used.
arXiv Detail & Related papers (2021-01-04T04:47:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.