CORE-GPT: Combining Open Access research and large language models for
credible, trustworthy question answering
- URL: http://arxiv.org/abs/2307.04683v1
- Date: Thu, 6 Jul 2023 13:41:36 GMT
- Title: CORE-GPT: Combining Open Access research and large language models for
credible, trustworthy question answering
- Authors: David Pride, Matteo Cancellieri and Petr Knoth
- Abstract summary: We present CORE-GPT, a novel question-answering platform that combines GPT-based language models and more than 32 million full-text open access scientific articles from CORE.
We first demonstrate that GPT3.5 and GPT4 cannot be relied upon to provide references or citations for generated text.
We then introduce CORE-GPT which delivers evidence-based answers to questions, along with citations and links to the cited papers.
- Score: 0.6537685198688536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present CORE-GPT, a novel question-answering platform that
combines GPT-based language models and more than 32 million full-text open
access scientific articles from CORE. We first demonstrate that GPT3.5 and GPT4
cannot be relied upon to provide references or citations for generated text. We
then introduce CORE-GPT which delivers evidence-based answers to questions,
along with citations and links to the cited papers, greatly increasing the
trustworthiness of the answers and reducing the risk of hallucinations.
CORE-GPT's performance was evaluated on a dataset of 100 questions covering the
top 20 scientific domains in CORE, resulting in 100 answers and links to 500
relevant articles. The quality of the provided answers and the relevance of the
links were assessed by two annotators. Our results demonstrate that CORE-GPT
can produce comprehensive and trustworthy answers across the majority of
scientific domains, complete with links to genuine, relevant scientific
articles.
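The platform described above follows a retrieve-then-generate pattern: relevant full-text passages are found first, and the language model is then asked to answer only from them and to cite them. A minimal, self-contained sketch of that pattern is shown below; the tiny corpus, the token-overlap scorer, and the `call_llm` stub are illustrative assumptions, not the CORE-GPT implementation or the CORE API.

```python
# Sketch of a retrieve-then-generate QA flow with citations (illustrative only).
from collections import Counter

# Stand-in corpus; a real system would index millions of open-access articles.
corpus = [
    {"id": "core:1", "url": "https://core.ac.uk/works/1", "text": "..."},
    {"id": "core:2", "url": "https://core.ac.uk/works/2", "text": "..."},
]

def call_llm(prompt: str) -> str:
    # Placeholder; a real system would call GPT-3.5/GPT-4 here.
    return "Answer grounded in the retrieved passages."

def score(question: str, passage: str) -> float:
    """Crude token-overlap score standing in for a real retriever."""
    q, p = Counter(question.lower().split()), Counter(passage.lower().split())
    return sum((q & p).values())

def answer(question: str, k: int = 5) -> dict:
    # Rank passages, build a grounded prompt, and return answer plus citations/links.
    top = sorted(corpus, key=lambda d: score(question, d["text"]), reverse=True)[:k]
    context = "\n\n".join(f"[{d['id']}] {d['text']}" for d in top)
    prompt = (
        "Answer the question using ONLY the passages below and cite passage ids.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return {
        "answer": call_llm(prompt),
        "citations": [{"id": d["id"], "url": d["url"]} for d in top],
    }
```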
Related papers
- PeerQA: A Scientific Question Answering Dataset from Peer Reviews [51.95579001315713]
We present PeerQA, a real-world, scientific, document-level Question Answering dataset.
The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP.
We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks.
arXiv Detail & Related papers (2025-02-19T12:24:46Z) - OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs [151.79792315631965]
We introduce OpenScholar, a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses.
On ScholarQABench, OpenScholar-8B outperforms GPT-4o by 5% and PaperQA2 by 7% in correctness, despite being a smaller, open model.
OpenScholar's datastore, retriever, and self-feedback inference loop also improve off-the-shelf LMs.
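The self-feedback inference loop mentioned here can be pictured as a draft/critique/revise cycle. The sketch below is an assumption about its general shape, not OpenScholar's actual code; `retrieve`, `generate`, and the prompts are hypothetical stand-ins.

```python
# Illustrative self-feedback loop: draft, critique, retrieve more evidence, revise.

def retrieve(query: str, k: int = 8) -> list[str]:
    return []  # placeholder for a retriever over an open-access datastore

def generate(prompt: str) -> str:
    return ""  # placeholder for the language model

def self_feedback_answer(question: str, rounds: int = 3) -> str:
    passages = retrieve(question)
    answer = generate(f"Answer with citations.\nPassages: {passages}\nQ: {question}")
    for _ in range(rounds):
        feedback = generate(f"Critique this answer for unsupported claims:\n{answer}")
        if "no issues" in feedback.lower():
            break  # the critique found nothing to fix
        passages += retrieve(feedback)  # fetch the evidence the critique asks for
        answer = generate(
            "Revise the answer using the feedback and passages.\n"
            f"Feedback: {feedback}\nPassages: {passages}\nQ: {question}"
        )
    return answer
```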
arXiv Detail & Related papers (2024-11-21T15:07:42Z) - ChatGPT Application In Summarizing An Evolution Of Deep Learning
Techniques In Imaging: A Qualitative Study [0.0]
ChatGPT 3.5 exhibits the capacity to condense the content of up to 3000 tokens into a single page.
We selected seven scientific articles and employed the publicly available ChatGPT service to generate summaries of these articles.
The summaries showed a slight reduction in technical depth compared to the original articles.
arXiv Detail & Related papers (2023-11-26T23:22:37Z) - Can large language models provide useful feedback on research papers? A
large-scale empirical analysis [38.905758846360435]
High-quality peer reviews are increasingly difficult to obtain.
With the breakthrough of large language models (LLMs) such as GPT-4, there is growing interest in using LLMs to generate scientific feedback.
We created an automated pipeline using GPT-4 to provide comments on the full PDFs of scientific papers.
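A pipeline of this kind can be approximated in a few lines: extract the paper text from the PDF, then prompt the model for structured review comments. The sketch below is a rough outline under stated assumptions, not the authors' code; the pypdf text extraction is standard, while `ask_gpt4` is a hypothetical placeholder for a real GPT-4 API call.

```python
# Illustrative PDF-to-feedback pipeline.
from pypdf import PdfReader

def pdf_to_text(path: str) -> str:
    # Concatenate the extracted text of every page in the PDF.
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def ask_gpt4(prompt: str) -> str:
    raise NotImplementedError("plug in a real GPT-4 chat-completion call here")

def review_paper(path: str) -> str:
    paper = pdf_to_text(path)[:60_000]  # naive truncation to fit a context window
    prompt = (
        "You are a peer reviewer. Summarize the paper, then list its main "
        "strengths, weaknesses, and concrete suggestions for improvement.\n\n" + paper
    )
    return ask_gpt4(prompt)
```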
arXiv Detail & Related papers (2023-10-03T04:14:17Z) - ChatGPT Hallucinates when Attributing Answers [27.63520311803786]
We investigate how different prompts impact answers and evidence.
We find that ChatGPT provides correct or partially correct answers in about half of the cases.
But its suggested references only exist 14% of the time.
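One way to perform that kind of existence check (an illustrative assumption, not necessarily the paper's exact protocol) is to search a bibliographic index such as Crossref for the suggested title and compare it with the top hit:

```python
# Check whether a model-suggested reference can be found in Crossref.
import requests

def reference_exists(title: str) -> bool:
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 1},
        timeout=10,
    )
    items = resp.json().get("message", {}).get("items", [])
    if not items:
        return False
    found = " ".join(items[0].get("title", [""])).lower()
    # Crude title comparison; a real evaluation would also match authors, venue, year.
    return title.lower() in found or found in title.lower()

suggested = "A fabricated survey of everything"
print(reference_exists(suggested))  # likely False for a hallucinated citation
```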
arXiv Detail & Related papers (2023-09-17T23:49:12Z) - Scientific Opinion Summarization: Paper Meta-review Generation Dataset, Methods, and Evaluation [55.00687185394986]
We propose the task of scientific opinion summarization, where research paper reviews are synthesized into meta-reviews.
We introduce the ORSUM dataset covering 15,062 paper meta-reviews and 57,536 paper reviews from 47 conferences.
Our experiments show that (1) human-written summaries do not always satisfy all necessary criteria, such as depth of discussion and identification of consensus and controversy for the specific domain, and (2) the combination of task decomposition and iterative self-refinement shows strong potential for improving the generated opinion summaries.
arXiv Detail & Related papers (2023-05-24T02:33:35Z) - WebCPM: Interactive Web Search for Chinese Long-form Question Answering [104.676752359777]
Long-form question answering (LFQA) aims at answering complex, open-ended questions with detailed, paragraph-length responses.
We introduce WebCPM, the first Chinese LFQA dataset.
We collect 5,500 high-quality question-answer pairs, together with 14,315 supporting facts and 121,330 web search actions.
arXiv Detail & Related papers (2023-05-11T14:47:29Z) - ChatGPT cites the most-cited articles and journals, relying solely on
Google Scholar's citation counts. As a result, AI may amplify the Matthew
Effect in environmental science [0.0]
ChatGPT tends to cite highly-cited publications in environmental science.
Google Scholar citation counts are a significant predictor of whether a study is mentioned in GPT-generated content.
arXiv Detail & Related papers (2023-04-13T19:29:49Z) - News Summarization and Evaluation in the Era of GPT-3 [73.48220043216087]
We study how GPT-3 compares against fine-tuned models trained on large summarization datasets.
We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these also do not suffer from common dataset-specific issues such as poor factuality.
arXiv Detail & Related papers (2022-09-26T01:04:52Z) - Science Checker: Extractive-Boolean Question Answering For Scientific
Fact Checking [0.0]
We propose a multi-task approach for verifying scientific questions based on joint reasoning from facts and evidence in research articles.
With our light and fast proposed architecture, we achieved an average error rate of 4% and an F1-score of 95.6%.
arXiv Detail & Related papers (2022-04-26T12:35:23Z) - Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z) - Can We Automate Scientific Reviewing? [89.50052670307434]
We discuss the possibility of using state-of-the-art natural language processing (NLP) models to generate first-pass peer reviews for scientific papers.
We collect a dataset of papers in the machine learning domain, annotate them with different aspects of content covered in each review, and train targeted summarization models that take in papers to generate reviews.
Comprehensive experimental results show that system-generated reviews tend to touch upon more aspects of the paper than human-written reviews.
arXiv Detail & Related papers (2021-01-30T07:16:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.