Related papers: PeerQA: A Scientific Question Answering Dataset from Peer Reviews

PeerQA: A Scientific Question Answering Dataset from Peer Reviews

URL: http://arxiv.org/abs/2502.13668v1
Date: Wed, 19 Feb 2025 12:24:46 GMT
Title: PeerQA: A Scientific Question Answering Dataset from Peer Reviews
Authors: Tim Baumgärtner, Ted Briscoe, Iryna Gurevych,
Abstract summary: We present PeerQA, a real-world, scientific, document-level Question Answering dataset.<n>The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP.<n>We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks.
Score: 51.95579001315713
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: We present PeerQA, a real-world, scientific, document-level Question Answering (QA) dataset. PeerQA questions have been sourced from peer reviews, which contain questions that reviewers raised while thoroughly examining the scientific article. Answers have been annotated by the original authors of each paper. The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP, as well as a subset of other scientific communities like Geoscience and Public Health. PeerQA supports three critical tasks for developing practical QA systems: Evidence retrieval, unanswerable question classification, and answer generation. We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks. Our experiments and analyses reveal the need for decontextualization in document-level retrieval, where we find that even simple decontextualization approaches consistently improve retrieval performance across architectures. On answer generation, PeerQA serves as a challenging benchmark for long-context modeling, as the papers have an average size of 12k tokens. Our code and data is available at https://github.com/UKPLab/peerqa.

Related papers

Inferential Question Answering [67.54465021408724]
We introduce Inferential QA -- a new task that challenges models to infer answers from answer-supporting passages which provide only clues.<n>To study this problem, we construct QUIT (QUestions requiring Inference from Texts) dataset, comprising 7,401 questions and 2.4M passages.<n>We show that methods effective on traditional QA tasks struggle in inferential QA: retrievers underperform, rerankers offer limited gains, and fine-tuning provides inconsistent improvements.
arXiv Detail & Related papers (2026-02-01T14:02:43Z)
Ai2 Scholar QA: Organized Literature Synthesis with Attribution [40.82558671117267]
Ai2 Scholar QA is a free online scientific question answering application. We make our entire pipeline public: as a customizable open-source Python package and interactive web app. In an evaluation on a recent scientific QA benchmark, we find that Ai2 Scholar QA outperforms competing systems.
arXiv Detail & Related papers (2025-04-15T04:48:18Z)
SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers [20.273439120429025]
SciDQA is a new dataset for reading comprehension that challenges LLMs for a deep understanding of scientific articles. Unlike other scientific QA datasets, SciDQA sources questions from peer reviews by domain experts and answers by paper authors. Questions in SciDQA necessitate reasoning across figures, tables, equations, appendices, and supplementary materials.
arXiv Detail & Related papers (2024-11-08T05:28:22Z)
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers [43.18330795060871]
SPIQA is a dataset specifically designed to interpret complex figures and tables within the context of scientific research articles.<n>We employ automatic and manual curation to create the dataset.<n> SPIQA comprises 270K questions divided into training, validation, and three different evaluation splits.
arXiv Detail & Related papers (2024-07-12T16:37:59Z)
QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation [67.27999343730224]
We introduce an iterative bootstrapping framework for QA data augmentation (named QASnowball) QASnowball can iteratively generate large-scale high-quality QA data based on a seed set of supervised examples. We conduct experiments in the high-resource English scenario and the medium-resource Chinese scenario, and the experimental results show that the data generated by QASnowball can facilitate QA models.
arXiv Detail & Related papers (2023-09-19T05:20:36Z)
Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering [6.224211330728391]
Researchers produce thousands of scholarly documents containing valuable technical knowledge. Document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge. We present a three-stage document QA approach: text extraction from PDF; evidence retrieval from extracted texts to form well-posed contexts; and QA to extract knowledge from contexts to return high-quality answers.
arXiv Detail & Related papers (2022-10-04T23:33:52Z)
Improving Unsupervised Question Answering via Summarization-Informed Question Generation [47.96911338198302]
Question Generation (QG) is the task of generating a plausible question for a passage, answer> pair. We make use of freely available news summary data, transforming declarative sentences into appropriate questions using dependency parsing, named entity recognition and semantic role labeling. The resulting questions are then combined with the original news articles to train an end-to-end neural QG model.
arXiv Detail & Related papers (2021-09-16T13:08:43Z)
QAConv: Question Answering on Informative Conversations [85.2923607672282]
We focus on informative conversations including business emails, panel discussions, and work channels. In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions.
arXiv Detail & Related papers (2021-05-14T15:53:05Z)
A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers. Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text. We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question. Most open QA systems have considered only retrieving information from unstructured text. We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.