Related papers: Are large language models superhuman chemists?

URL: http://arxiv.org/abs/2404.01475v2
Date: Fri, 01 Nov 2024 07:05:33 GMT
Title: Are large language models superhuman chemists?
Authors: Adrian Mirza, Nawaf Alampara, Sreekanth Kunchapu, Martiño Ríos-García, Benedict Emoekabu, Aswanth Krishnan, Tanya Gupta, Mara Schilling-Wilhelmi, Macjonathan Okereke, Anagha Aneesh, Amir Mohammad Elahi, Mehrdad Asgari, Juliane Eberhardt, Hani M. Elbeheiry, María Victoria Gil, Maximilian Greiner, Caroline T. Holick, Christina Glaubitz, Tim Hoffmann, Abdelrahman Ibrahim, Lea C. Klepsch, Yannik Köster, Fabian Alexander Kreth, Jakob Meyer, Santiago Miret, Jan Matthias Peschel, Michael Ringleb, Nicole Roesner, Johanna Schreiber, Ulrich S. Schubert, Leanne M. Stafast, Dinga Wonanke, Michael Pieler, Philippe Schwaller, Kevin Maik Jablonka
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have gained widespread interest due to their ability to process human language and perform tasks on which they have not been explicitly trained. However, we possess only a limited systematic understanding of the chemical capabilities of LLMs, which would be required to improve models and mitigate potential harm. Here, we introduce "ChemBench," an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs against the expertise of chemists. We curated more than 2,700 question-answer pairs, evaluated leading open- and closed-source LLMs, and found that the best models outperformed the best human chemists in our study on average. However, the models struggle with some basic tasks and provide overconfident predictions. These findings reveal LLMs' impressive chemical capabilities while emphasizing the need for further research to improve their safety and usefulness. They also suggest adapting chemistry education and show the value of benchmarking frameworks for evaluating LLMs in specific domains.

Related papers

ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge
This work focuses on the specific field of chemistry and develops a Chemical Reasoner LLM, ChemDFM-R. We first construct a comprehensive dataset of atomized knowledge points to enhance the model's understanding of the fundamental principles and logical structure of chemistry. Experiments on diverse chemical benchmarks demonstrate that ChemDFM-R achieves cutting-edge performance while providing interpretable, rationale-driven outputs.
arXiv (2025-07-29T16:40:49Z)
ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data
We present ChemActor, a fully fine-tuned large language model (LLM) that acts as a chemical executor, converting between unstructured experimental procedures and structured action sequences. This framework integrates a data selection module, which selects data based on distribution divergence, with a general-purpose LLM to generate machine-executable actions from a single molecule input. Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv (2025-06-30T05:11:19Z)
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
ChemAgent is a novel framework designed to improve the performance of large language models (LLMs). It is developed by decomposing chemical tasks into sub-tasks and compiling these sub-tasks into a structured collection that can be referenced for future queries. When presented with a new problem, ChemAgent retrieves and refines pertinent information from the library, which we call memory.
arXiv (2025-01-11T17:10:30Z)
From Generalist to Specialist: A Survey of Large Language Models for Chemistry
Large Language Models (LLMs) have significantly transformed our daily life and established a new paradigm in natural language processing (NLP). The predominant pretraining of LLMs on extensive web-based texts remains insufficient for advanced scientific discovery, particularly in chemistry. Although several studies have reviewed Pretrained Language Models (PLMs) in chemistry, there is a conspicuous absence of a systematic survey specifically focused on chemistry-oriented LLMs.
arXiv (2024-12-28T03:40:25Z)
ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models
Existing benchmarks in this domain fail to adequately meet the specific requirements of chemical research professionals. ChemEval identifies 4 crucial progressive levels in chemistry, assessing 12 dimensions of LLMs across 42 distinct chemical tasks. Results show that while general LLMs excel in literature understanding and instruction following, they fall short in tasks demanding advanced chemical knowledge.
arXiv (2024-09-21T02:50:43Z)
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
We introduce ChemVLM, an open-source chemical multimodal large language model for chemical applications. ChemVLM is trained on a carefully curated bilingual dataset that enhances its ability to understand both textual and visual chemical information. We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks.
arXiv (2024-08-14T01:16:40Z)
ChemLLM: A Chemical Large Language Model
Large language models (LLMs) have made impressive progress in chemistry applications. However, the community lacks an LLM specifically designed for chemistry. Here, we introduce ChemLLM, a comprehensive framework that features the first LLM dedicated to chemistry.
arXiv (2024-02-10T01:11:59Z)
Large Language Model Distilling Medication Recommendation Model
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs). Our research aims to transform existing medication recommendation methodologies using LLMs. To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv (2024-02-05T08:25:22Z)
ChemDFM: A Large Language Foundation Model for Chemistry
A more generic and efficient solution would be an AI model that could address many tasks and support free-form dialogue in the broad field of chemistry. We develop ChemDFM, a pioneering LLM for chemistry trained on 34B tokens from chemical literature and textbooks, and fine-tuned using 2.7M instructions. We have open-sourced the inference codes, evaluation datasets, and model weights of ChemDFM on Huggingface.
arXiv (2024-01-26T12:45:55Z)
Structured Chemistry Reasoning with Large Language Models
Large Language Models (LLMs) excel in diverse areas, yet struggle with complex scientific reasoning, especially in chemistry. We introduce StructChem, a simple yet effective prompting strategy that offers the desired guidance and substantially boosts the LLMs' chemical reasoning capability. In tests across four chemistry areas (quantum chemistry, mechanics, physical chemistry, and kinetics), StructChem substantially enhances GPT-4's performance, with up to 30% peak improvement.
arXiv (2023-11-16T08:20:36Z)
Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective
Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks. In this work, we propose MolReGPT, an In-context Few-Shot Molecule Learning paradigm for molecule-caption translation. We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
arXiv (2023-06-11T08:16:25Z)
What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks
Large Language Models (LLMs) with strong abilities in natural language processing tasks have emerged. We aim to evaluate the capabilities of LLMs in a wide range of tasks across the chemistry domain.
arXiv (2023-05-27T14:17:33Z)
ChemCrow: Augmenting large-language models with chemistry tools
Large-language models (LLMs) have shown strong performance in tasks across domains, but struggle with chemistry-related problems. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. Our agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts, and guided the discovery of a novel chromophore.
arXiv (2023-04-11T17:41:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.