Benchmarking Agentic Systems in Automated Scientific Information Extraction with ChemX
- URL: http://arxiv.org/abs/2510.00795v1
- Date: Wed, 01 Oct 2025 11:50:11 GMT
- Title: Benchmarking Agentic Systems in Automated Scientific Information Extraction with ChemX
- Authors: Anastasia Vepreva, Julia Razlivina, Maria Eremeeva, Nina Gubina, Anastasia Orlova, Aleksei Dmitrenko, Ksenya Kapranova, Susan Jyakhwo, Nikita Vasilev, Arsen Sarkisyan, Ivan Yu. Chernyshov, Vladimir Vinogradov, Andrei Dmitrenko,
- Abstract summary: ChemX is a collection of 10 manually curated and domain-expert-validated datasets focusing on nanomaterials and small molecules.<n>These datasets are designed to rigorously evaluate and enhance automated extraction methodologies in chemistry.<n>We conduct an extensive benchmarking study comparing existing state-of-the-art agentic systems such as ChatGPT Agent and chemical-specific data extraction agents.
- Score: 2.1779971965303617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emergence of agent-based systems represents a significant advancement in artificial intelligence, with growing applications in automated data extraction. However, chemical information extraction remains a formidable challenge due to the inherent heterogeneity of chemical data. Current agent-based approaches, both general-purpose and domain-specific, exhibit limited performance in this domain. To address this gap, we present ChemX, a comprehensive collection of 10 manually curated and domain-expert-validated datasets focusing on nanomaterials and small molecules. These datasets are designed to rigorously evaluate and enhance automated extraction methodologies in chemistry. To demonstrate their utility, we conduct an extensive benchmarking study comparing existing state-of-the-art agentic systems such as ChatGPT Agent and chemical-specific data extraction agents. Additionally, we introduce our own single-agent approach that enables precise control over document preprocessing prior to extraction. We further evaluate the performance of modern baselines, such as GPT-5 and GPT-5 Thinking, to compare their capabilities with agentic approaches. Our empirical findings reveal persistent challenges in chemical information extraction, particularly in processing domain-specific terminology, complex tabular and schematic representations, and context-dependent ambiguities. The ChemX benchmark serves as a critical resource for advancing automated information extraction in chemistry, challenging the generalization capabilities of existing methods, and providing valuable insights into effective evaluation strategies.
Related papers
- AgentCAT: An LLM Agent for Extracting and Analyzing Catalytic Reaction Data from Chemical Engineering Literature [55.66036140125613]
This paper presents a large language model (LLM) agent named AgentCAT, which extracts and analyzes catalytic reaction data from chemical engineering papers.<n>AgentCAT serves as an alternative to overcome the long-standing data bottleneck in chemical engineering field.
arXiv Detail & Related papers (2026-02-10T04:30:11Z) - A Multi-Agent System Enables Versatile Information Extraction from the Chemical Literature [8.306442315850878]
We develop a multimodal large language model (MLLM)-based multi-agent system for robust and automated chemical information extraction.<n>Our system achieved an F1 score of 80.8% on a benchmark dataset of sophisticated multimodal chemical reaction graphics from the literature.
arXiv Detail & Related papers (2025-07-27T11:16:57Z) - ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences.<n>This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input.<n>Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z) - ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance & Efficiency on a Specific Domain [0.8974531206817746]
This paper introduces a novel benchmark, the Chemical Text Embedding Benchmark (ChemTEB)<n>ChemTEB addresses the unique linguistic and semantic complexities of chemical literature and data.<n>We illuminate the strengths and weaknesses of current methodologies in processing and understanding chemical information.
arXiv Detail & Related papers (2024-11-30T16:45:31Z) - A Gaussian Process Model for Ordinal Data with Applications to Chemoinformatics [0.0]
We present conditional Gaussian process models to predict ordinal outcomes from chemical experiments.
A novel aspect of our model is that the kernel contains a scaling parameter, that controls the strength of the correlation between elements of the chemical space.
We present a genetic algorithm for the facilitation of chemical discovery and identification of important features to the compound's efficacy.
arXiv Detail & Related papers (2024-05-16T11:18:32Z) - Intelligent Chemical Purification Technique Based on Machine Learning [5.023197681500998]
We present an innovative of artificial intelligence with column chromatography, aiming to resolve inefficiencies and standardize data collection in chemical separation and purification domain.
By developing an automated platform for precise data acquisition and employing advanced machine learning algorithms, we constructed predictive models to forecast key separation parameters.
A novel metric, separation probability ($S_p$), quantifies the likelihood of effective compound separation, validated through experimental verification.
arXiv Detail & Related papers (2024-04-14T01:44:58Z) - ChemMiner: A Large Language Model Agent System for Chemical Literature Data Mining [56.15126714863963]
ChemMiner is an end-to-end framework for extracting chemical data from literature.<n>ChemMiner incorporates three specialized agents: a text analysis agent for coreference mapping, a multimodal agent for non-textual information extraction, and a synthesis analysis agent for data generation.<n> Experimental results demonstrate reaction identification rates comparable to human chemists while significantly reducing processing time, with high accuracy, recall, and F1 scores.
arXiv Detail & Related papers (2024-02-20T13:21:46Z) - Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [55.30328162764292]
Chemist-X is a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis.<n>The agent uses retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment executions.<n>Results of our automatic wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the lope, prove Chemist-X's ability in self-driving laboratories.
arXiv Detail & Related papers (2023-11-16T01:21:33Z) - ChemVise: Maximizing Out-of-Distribution Chemical Detection with the
Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z) - Improving Molecular Representation Learning with Metric
Learning-enhanced Optimal Transport [49.237577649802034]
We develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems.
MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances.
arXiv Detail & Related papers (2022-02-13T04:56:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.