Benchmarking Retrieval-Augmented Generation for Chemistry
- URL: http://arxiv.org/abs/2505.07671v1
- Date: Mon, 12 May 2025 15:34:45 GMT
- Title: Benchmarking Retrieval-Augmented Generation for Chemistry
- Authors: Xianrui Zhong, Bowen Jin, Siru Ouyang, Yanzhen Shen, Qiao Jin, Yin Fang, Zhiyong Lu, Jiawei Han,
- Abstract summary: Retrieval-augmented generation is a framework for enhancing large language models with external knowledge.<n>ChemRAG-Bench is a benchmark designed to assess the effectiveness of RAG across a diverse set of chemistry-related tasks.<n>ChemRAG-Toolkit is a modular toolkit that supports five retrieval algorithms and eight LLMs.
- Score: 28.592844362931853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-augmented generation (RAG) has emerged as a powerful framework for enhancing large language models (LLMs) with external knowledge, particularly in scientific domains that demand specialized and dynamic information. Despite its promise, the application of RAG in the chemistry domain remains underexplored, primarily due to the lack of high-quality, domain-specific corpora and well-curated evaluation benchmarks. In this work, we introduce ChemRAG-Bench, a comprehensive benchmark designed to systematically assess the effectiveness of RAG across a diverse set of chemistry-related tasks. The accompanying chemistry corpus integrates heterogeneous knowledge sources, including scientific literature, the PubChem database, PubMed abstracts, textbooks, and Wikipedia entries. In addition, we present ChemRAG-Toolkit, a modular and extensible RAG toolkit that supports five retrieval algorithms and eight LLMs. Using ChemRAG-Toolkit, we demonstrate that RAG yields a substantial performance gain -- achieving an average relative improvement of 17.4% over direct inference methods. We further conduct in-depth analyses on retriever architectures, corpus selection, and the number of retrieved passages, culminating in practical recommendations to guide future research and deployment of RAG systems in the chemistry domain. The code and data is available at https://chemrag.github.io.
Related papers
- Chunk Twice, Embed Once: A Systematic Study of Segmentation and Representation Trade-offs in Chemistry-Aware Retrieval-Augmented Generation [0.0]
Retrieval-Augmented Generation systems are increasingly vital for navigating the ever-expanding body of scientific literature.<n>This study presents the first large-scale, systematic evaluation of chunking strategies and embedding models tailored to chemistry-focused RAG systems.
arXiv Detail & Related papers (2025-06-13T07:44:53Z) - KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs [66.35046942874737]
KG-Infused RAG is a framework that integrates KGs into RAG systems to implement spreading activation.<n> KG-Infused RAG retrieves KG facts, expands the query accordingly, and enhances generation by combining corpus passages with structured facts.
arXiv Detail & Related papers (2025-06-11T09:20:02Z) - CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning [12.745398618084474]
Large language models (LLMs) have recently demonstrated promising capabilities in chemistry tasks.<n>We propose an LLM-based agent that integrates 137 external chemical tools created ranging from basic information retrieval to complex reaction predictions.
arXiv Detail & Related papers (2025-06-09T08:41:39Z) - ChemRxivQuest: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv Preprints [0.0]
ChemRxivQuest is a curated dataset of 970 high-quality question-answer (QA) pairs from 155 ChemRxiv preprints across 17 subfields of chemistry.<n>Each QA pair is explicitly linked to its source text segment to ensure traceability and contextual accuracy.<n>ChemRxivQuest was constructed using an automated pipeline that combines optical character recognition (OCR), GPT-4o-based QA generation, and a fuzzy matching technique for answer verification.
arXiv Detail & Related papers (2025-05-08T13:26:33Z) - HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights [72.82973609312178]
HiPerRAG is a workflow to index and retrieve knowledge from more than 3.6 million scientific articles.<n>At its core are Oreo, a high- throughput model for multimodal document parsing, and ColTrast, a query-aware encoder fine-tuning algorithm.<n>HiPerRAG delivers robust performance on existing scientific question answering benchmarks and two new benchmarks introduced in this work.
arXiv Detail & Related papers (2025-05-07T22:50:23Z) - ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning [64.2106664137118]
ChemAgent is a novel framework designed to improve the performance of large language models (LLMs)<n>It is developed by decomposing chemical tasks into sub-tasks and compiling these sub-tasks into a structured collection that can be referenced for future queries.<n>When presented with a new problem, ChemAgent retrieves and refines pertinent information from the library, which we call memory.
arXiv Detail & Related papers (2025-01-11T17:10:30Z) - ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance & Efficiency on a Specific Domain [0.8974531206817746]
This paper introduces a novel benchmark, the Chemical Text Embedding Benchmark (ChemTEB)<n>ChemTEB addresses the unique linguistic and semantic complexities of chemical literature and data.<n>We illuminate the strengths and weaknesses of current methodologies in processing and understanding chemical information.
arXiv Detail & Related papers (2024-11-30T16:45:31Z) - Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study [46.55831783809377]
Retrieval-augmented generation (RAG) is increasingly recognized as an effective approach to mitigating the hallucination of large language models (LLMs)<n>We develop PruningRAG, a plug-and-play RAG framework that uses multi-granularity pruning strategies to more effectively incorporate relevant context and mitigate the negative impact of misleading information.
arXiv Detail & Related papers (2024-09-03T03:31:37Z) - RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation [61.14660526363607]
We propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for both the retrieval and generation modules.
RAGChecker has significantly better correlations with human judgments than other evaluation metrics.
The metrics of RAGChecker can guide researchers and practitioners in developing more effective RAG systems.
arXiv Detail & Related papers (2024-08-15T10:20:54Z) - FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research [70.6584488911715]
retrieval-augmented generation (RAG) has attracted considerable research attention.<n>Existing RAG toolkits are often heavy and inflexibly, failing to meet the customization needs of researchers.<n>Our toolkit has implemented 16 advanced RAG methods and gathered and organized 38 benchmark datasets.
arXiv Detail & Related papers (2024-05-22T12:12:40Z) - ChemMiner: A Large Language Model Agent System for Chemical Literature Data Mining [56.15126714863963]
ChemMiner is an end-to-end framework for extracting chemical data from literature.<n>ChemMiner incorporates three specialized agents: a text analysis agent for coreference mapping, a multimodal agent for non-textual information extraction, and a synthesis analysis agent for data generation.<n> Experimental results demonstrate reaction identification rates comparable to human chemists while significantly reducing processing time, with high accuracy, recall, and F1 scores.
arXiv Detail & Related papers (2024-02-20T13:21:46Z) - Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [55.30328162764292]
Chemist-X is a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis.<n>The agent uses retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment executions.<n>Results of our automatic wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the lope, prove Chemist-X's ability in self-driving laboratories.
arXiv Detail & Related papers (2023-11-16T01:21:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.