Patience is all you need! An agentic system for performing scientific literature review
- URL: http://arxiv.org/abs/2504.08752v1
- Date: Fri, 28 Mar 2025 08:08:46 GMT
- Title: Patience is all you need! An agentic system for performing scientific literature review
- Authors: David Brett, Anniek Myatt
- Abstract summary: Large language models (LLMs) have grown in their usage to provide support for question answering across numerous disciplines. We have built an LLM-based system that performs such search and distillation of information encapsulated in scientific literature. We evaluate our keyword-based search and information distillation system against a set of biology-related questions from previously released literature benchmarks.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) have grown in their usage to provide support for question answering across numerous disciplines. On their own, the models have already shown promise for answering basic questions, but they fail quickly where expert domain knowledge is required or the question is nuanced. Scientific research often involves searching for relevant literature, distilling pertinent information from that literature and analysing how the findings support or contradict one another. The information is often encapsulated in the full text body of research articles, rather than just in the abstracts. Statements within these articles frequently require the wider article context to be fully understood. We have built an LLM-based system that performs such search and distillation of information encapsulated in scientific literature, and we evaluate our keyword-based search and information distillation system against a set of biology-related questions from previously released literature benchmarks. We demonstrate that sparse retrieval methods achieve results close to the state of the art without the need for dense retrieval and its associated infrastructure and complexity overhead. We also show how to increase the coverage of relevant documents for literature review generation.
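As a rough illustration of the keyword-based (sparse) retrieval the abstract refers to, the sketch below ranks a few toy passages against a query using a self-contained Okapi BM25 scorer. The corpus, tokenizer, and parameter values (k1, b) are illustrative assumptions only, not the authors' implementation.

```python
# Minimal BM25 sparse retrieval sketch (illustrative only; not the paper's system).
import math
import re
from collections import Counter

def tokenize(text):
    # Naive lowercase alphanumeric tokenizer; a real system would add stemming, stopwords, etc.
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    doc_tokens = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in doc_tokens) / len(doc_tokens)
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for tokens in doc_tokens:
        df.update(set(tokens))
    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)            # term frequency within this document
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            )
        scores.append(score)
    return scores

if __name__ == "__main__":
    passages = [  # Toy stand-ins for full-text passages from retrieved articles.
        "CRISPR-Cas9 enables targeted genome editing in mammalian cells.",
        "Transformer language models are pre-trained on large text corpora.",
        "Gene knockouts reveal the function of individual proteins in vivo.",
    ]
    query = "genome editing with CRISPR"
    for score, passage in sorted(zip(bm25_scores(query, passages), passages), reverse=True):
        print(f"{score:.3f}  {passage}")
```

In practice the same scoring would be applied over chunked full-text passages rather than whole abstracts, since, as the abstract notes, the relevant statements often sit in the article body rather than the abstract.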
Related papers
- SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers [20.273439120429025]
SciDQA is a new reading comprehension dataset that challenges LLMs to demonstrate a deep understanding of scientific articles.
Unlike other scientific QA datasets, SciDQA sources questions from peer reviews by domain experts and answers by paper authors.
Questions in SciDQA necessitate reasoning across figures, tables, equations, appendices, and supplementary materials.
arXiv Detail & Related papers (2024-11-08T05:28:22Z)
- LitSearch: A Retrieval Benchmark for Scientific Literature Search [48.593157851171526]
We introduce LitSearch, a retrieval benchmark comprising 597 realistic literature search queries about recent ML and NLP papers.
All LitSearch questions were manually examined or edited by experts to ensure high quality.
We find a significant performance gap between BM25 and state-of-the-art dense retrievers, with a 24.8% absolute difference in recall@5 (see the recall@k sketch after this list).
arXiv Detail & Related papers (2024-07-10T18:00:03Z)
- Efficient Systematic Reviews: Literature Filtering with Transformers & Transfer Learning [0.0]
The growing volume of published research comes with an increasing burden in terms of the time required to identify the important articles for a given topic.
We develop a method for building a general-purpose filtering system that matches a research question, posed as a natural language description of the required content, against candidate articles.
Our results demonstrate that transformer models, pre-trained on biomedical literature and then fine-tuned for the specific task, offer a promising solution to this problem.
arXiv Detail & Related papers (2024-05-30T02:55:49Z)
- A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [55.33653554387953]
The rapid development of Pattern Analysis and Machine Intelligence (PAMI) has led to numerous literature reviews aimed at collecting and synthesizing fragmented information.
This paper presents a thorough analysis of these literature reviews within the PAMI field.
We try to address three core research questions: (1) What are the prevalent structural and statistical characteristics of PAMI literature reviews; (2) What strategies can researchers employ to efficiently navigate the growing corpus of reviews; and (3) What are the advantages and limitations of AI-generated reviews compared to human-authored ones.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
- Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System [47.13932723910289]
We introduce an open-source multi-modal automated academic paper interpretation system (MMAPIS) with a three-stage process.
It employs a hybrid modality preprocessing and alignment module to extract plain text, tables, and figures from documents separately.
It then aligns this information based on the section names they belong to, ensuring that data with identical section names are categorized under the same section.
It utilizes the extracted section names to divide the article into shorter text segments, facilitating specific summarizations both within and between sections via LLMs.
arXiv Detail & Related papers (2024-01-17T11:50:53Z)
- PaperQA: Retrieval-Augmented Generative Agent for Scientific Research [41.9628176602676]
We present PaperQA, a RAG agent for answering questions over the scientific literature.
PaperQA is an agent that performs information retrieval across full-text scientific articles, assesses the relevance of sources and passages, and uses RAG to provide answers.
We also introduce LitQA, a more complex benchmark that requires retrieval and synthesis of information from full-text scientific papers across the literature.
arXiv Detail & Related papers (2023-12-08T18:50:20Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- MORTY: Structured Summarization for Targeted Information Extraction from Scholarly Articles [0.0]
We present MORTY, an information extraction technique that creates structured summaries of text from scholarly articles.
Our approach condenses the article's full text into property-value pairs, presented as a segmented text snippet called a structured summary.
We also present a sizable scholarly dataset combining structured summaries retrieved from a scholarly knowledge graph and corresponding publicly available scientific articles.
arXiv Detail & Related papers (2022-12-11T06:49:29Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- A New Neural Search and Insights Platform for Navigating and Organizing AI Research [56.65232007953311]
We introduce a new platform, AI Research Navigator, that combines classical keyword search with neural retrieval to discover and organize relevant literature.
We give an overview of the overall architecture of the system and of the components for document analysis, question answering, search, analytics, expert search, and recommendations.
arXiv Detail & Related papers (2020-10-30T19:12:25Z)
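Several of the benchmarks above (e.g. LitSearch) report retrieval quality as recall@k. The sketch below shows how that metric is typically computed over a set of queries: the fraction of queries whose gold (relevant) document appears among the top-k retrieved results. The data format, with per-query gold IDs and ranked result lists, is an assumption for illustration, not the format used by any of the papers listed.

```python
# Recall@k sketch: fraction of queries whose relevant document appears in the top-k results.
def recall_at_k(gold, retrieved, k=5):
    """gold: {query_id: set of relevant doc ids}
    retrieved: {query_id: ranked list of doc ids}"""
    hits = 0
    for qid, relevant in gold.items():
        top_k = retrieved.get(qid, [])[:k]
        if relevant & set(top_k):   # at least one relevant doc retrieved in the top k
            hits += 1
    return hits / len(gold)

if __name__ == "__main__":
    # Hypothetical query/paper IDs for illustration only.
    gold = {"q1": {"paper_A"}, "q2": {"paper_B"}, "q3": {"paper_C"}}
    retrieved = {
        "q1": ["paper_A", "paper_X", "paper_Y"],   # hit at rank 1
        "q2": ["paper_X", "paper_Y", "paper_Z"],   # miss
        "q3": ["paper_X", "paper_C", "paper_Y"],   # hit at rank 2
    }
    print(f"recall@5 = {recall_at_k(gold, retrieved, k=5):.2f}")  # 2 of 3 queries -> 0.67
```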
This list is automatically generated from the titles and abstracts of the papers on this site.