Ai2 Scholar QA: Organized Literature Synthesis with Attribution
- URL: http://arxiv.org/abs/2504.10861v1
- Date: Tue, 15 Apr 2025 04:48:18 GMT
- Title: Ai2 Scholar QA: Organized Literature Synthesis with Attribution
- Authors: Amanpreet Singh, Joseph Chee Chang, Chloe Anastasiades, Dany Haddad, Aakanksha Naik, Amber Tanaka, Angele Zamarron, Cecile Nguyen, Jena D. Hwang, Jason Dunkleberger, Matt Latzke, Smita Rao, Jaron Lochner, Rob Evans, Rodney Kinney, Daniel S. Weld, Doug Downey, Sergey Feldman,
- Abstract summary: Ai2 Scholar QA is a free online scientific question answering application.<n>We make our entire pipeline public: as a customizable open-source Python package and interactive web app.<n>In an evaluation on a recent scientific QA benchmark, we find that Ai2 Scholar QA outperforms competing systems.
- Score: 40.82558671117267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-augmented generation is increasingly effective in answering scientific questions from literature, but many state-of-the-art systems are expensive and closed-source. We introduce Ai2 Scholar QA, a free online scientific question answering application. To facilitate research, we make our entire pipeline public: as a customizable open-source Python package and interactive web app, along with paper indexes accessible through public APIs and downloadable datasets. We describe our system in detail and present experiments analyzing its key design decisions. In an evaluation on a recent scientific QA benchmark, we find that Ai2 Scholar QA outperforms competing systems.
Related papers
- PeerQA: A Scientific Question Answering Dataset from Peer Reviews [51.95579001315713]
We present PeerQA, a real-world, scientific, document-level Question Answering dataset.<n>The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP.<n>We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks.
arXiv Detail & Related papers (2025-02-19T12:24:46Z) - SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers [20.273439120429025]
SciDQA is a new dataset for reading comprehension that challenges LLMs for a deep understanding of scientific articles.
Unlike other scientific QA datasets, SciDQA sources questions from peer reviews by domain experts and answers by paper authors.
Questions in SciDQA necessitate reasoning across figures, tables, equations, appendices, and supplementary materials.
arXiv Detail & Related papers (2024-11-08T05:28:22Z) - EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems [103.91826112815384]
citation-based QA systems are suffering from two shortcomings.
They usually rely only on web as a source of extracted knowledge and adding other external knowledge sources can hamper the efficiency of the system.
We propose our enhanced web and efficient knowledge graph (KG) retrieval solution (EWEK-QA) to enrich the content of the extracted knowledge fed to the system.
arXiv Detail & Related papers (2024-06-14T19:40:38Z) - SciQAG: A Framework for Auto-Generated Science Question Answering Dataset with Fine-grained Evaluation [11.129800893611646]
SciQAG is a framework for automatically generating high-quality science question-answer pairs from a large corpus of scientific literature based on large language models (LLMs)
We construct a large-scale, high-quality, open-ended science QA dataset containing 188,042 QA pairs extracted from 22,743 scientific papers across 24 scientific domains.
We also introduce SciQAG-24D, a new benchmark task designed to evaluate the science question-answering ability of LLMs.
arXiv Detail & Related papers (2024-05-16T09:42:37Z) - PaperQA: Retrieval-Augmented Generative Agent for Scientific Research [41.9628176602676]
We present PaperQA, a RAG agent for answering questions over the scientific literature.
PaperQA is an agent that performs information retrieval across full-text scientific articles, assesses the relevance of sources and passages, and uses RAG to provide answers.
We also introduce LitQA, a more complex benchmark that requires retrieval and synthesis of information from full-text scientific papers across the literature.
arXiv Detail & Related papers (2023-12-08T18:50:20Z) - The Semantic Scholar Open Data Platform [92.2948743167744]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z) - XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-based
Textual Knowledge Source [2.348805691644086]
This paper presents XLMRQA, the first Vietnamese QA system using a supervised transformer-based reader on the Wikipedia-based textual knowledge source.
From the results obtained on the three systems, we analyze the influence of question types on the performance of the QA systems.
arXiv Detail & Related papers (2022-04-14T14:54:33Z) - Achieving Human Parity on Visual Question Answering [67.22500027651509]
The Visual Question Answering (VQA) task utilizes both visual image and language analysis to answer a textual question with respect to an image.
This paper describes our recent research of AliceMind-MMU that obtains similar or even slightly better results than human beings does on VQA.
This is achieved by systematically improving the VQA pipeline including: (1) pre-training with comprehensive visual and textual feature representation; (2) effective cross-modal interaction with learning to attend; and (3) A novel knowledge mining framework with specialized expert modules for the complex VQA task.
arXiv Detail & Related papers (2021-11-17T04:25:11Z) - Retrieving and Reading: A Comprehensive Survey on Open-domain Question
Answering [62.88322725956294]
We review the latest research trends in OpenQA, with particular attention to systems that incorporate neural MRC techniques.
We introduce modern OpenQA architecture named Retriever-Reader'' and analyze the various systems that follow this architecture.
We then discuss key challenges to developing OpenQA systems and offer an analysis of benchmarks that are commonly used.
arXiv Detail & Related papers (2021-01-04T04:47:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.