What can Large Language Models do in chemistry? A comprehensive
benchmark on eight tasks
- URL: http://arxiv.org/abs/2305.18365v3
- Date: Thu, 28 Dec 2023 04:29:36 GMT
- Title: What can Large Language Models do in chemistry? A comprehensive
benchmark on eight tasks
- Authors: Taicheng Guo, Kehan Guo, Bozhao Nan, Zhenwen Liang, Zhichun Guo,
Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang
- Abstract summary: Large Language Models (LLMs) with strong abilities in natural language processing tasks have emerged.
We aim to evaluate capabilities of LLMs in a wide range of tasks across the chemistry domain.
- Score: 41.9830989458936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) with strong abilities in natural language
processing tasks have emerged and have been applied in various kinds of areas
such as science, finance and software engineering. However, the capability of
LLMs to advance the field of chemistry remains unclear. In this paper, rather
than pursuing state-of-the-art performance, we aim to evaluate capabilities of
LLMs in a wide range of tasks across the chemistry domain. We identify three
key chemistry-related capabilities including understanding, reasoning and
explaining to explore in LLMs and establish a benchmark containing eight
chemistry tasks. Our analysis draws on widely recognized datasets facilitating
a broad exploration of the capacities of LLMs within the context of practical
chemistry. Five LLMs (GPT-4, GPT-3.5, Davinci-003, Llama and Galactica) are
evaluated for each chemistry task in zero-shot and few-shot in-context learning
settings with carefully selected demonstration examples and specially crafted
prompts. Our investigation found that GPT-4 outperformed other models and LLMs
exhibit different competitive levels in eight chemistry tasks. In addition to
the key findings from the comprehensive benchmark analysis, our work provides
insights into the limitation of current LLMs and the impact of in-context
learning settings on LLMs' performance across various chemistry tasks. The code
and datasets used in this study are available at
https://github.com/ChemFoundationModels/ChemLLMBench.
Related papers
- ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models [62.37850540570268]
Existing benchmarks in this domain fail to adequately meet the specific requirements of chemical research professionals.
ChemEval identifies 4 crucial progressive levels in chemistry, assessing 12 dimensions of LLMs across 42 distinct chemical tasks.
Results show that while general LLMs excel in literature understanding and instruction following, they fall short in tasks demanding advanced chemical knowledge.
arXiv Detail & Related papers (2024-09-21T02:50:43Z) - ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area [50.15254966969718]
We introduce textbfChemVLM, an open-source chemical multimodal large language model for chemical applications.
ChemVLM is trained on a carefully curated bilingual dataset that enhances its ability to understand both textual and visual chemical information.
We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks.
arXiv Detail & Related papers (2024-08-14T01:16:40Z) - Are large language models superhuman chemists? [4.87961182129702]
Large language models (LLMs) have gained widespread interest due to their ability to process human language and perform tasks on which they have not been explicitly trained.
Here, we introduce "ChemBench," an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs.
We curated more than 2,700 question-answer pairs, evaluated leading open- and closed-source LLMs, and found that the best models outperformed the best human chemists.
arXiv Detail & Related papers (2024-04-01T20:56:25Z) - Benchmarking Large Language Models for Molecule Prediction Tasks [7.067145619709089]
Large Language Models (LLMs) stand at the forefront of a number of Natural Language Processing (NLP) tasks.
This paper explores a fundamental question: Can LLMs effectively handle molecule prediction tasks?
We identify several classification and regression prediction tasks across six standard molecule datasets.
We compare their performance with existing Machine Learning (ML) models, which include text-based models and those specifically designed for analysing the geometric structure of molecules.
arXiv Detail & Related papers (2024-03-08T05:59:56Z) - LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset [13.063678216852473]
We show that large language models (LLMs) can achieve very strong results on a comprehensive set of chemistry tasks.
We propose SMolInstruct, a large-scale, comprehensive, and high-quality dataset for instruction tuning.
Using SMolInstruct, we fine-tune a set of open-source LLMs, among which, we find that Mistral serves as the best base model for chemistry tasks.
arXiv Detail & Related papers (2024-02-14T18:42:25Z) - ChemLLM: A Chemical Large Language Model [49.308528569982805]
Large language models (LLMs) have made impressive progress in chemistry applications.
However, the community lacks an LLM specifically designed for chemistry.
Here, we introduce ChemLLM, a comprehensive framework that features the first LLM dedicated to chemistry.
arXiv Detail & Related papers (2024-02-10T01:11:59Z) - From Words to Molecules: A Survey of Large Language Models in Chemistry [8.129759559674968]
This paper explores the nuanced methodologies employed in integrating Large Language Models (LLMs) into the field of chemistry.
We categorize chemical LLMs into three distinct groups based on the domain and modality of their input data, and discuss approaches for integrating these inputs for LLMs.
Finally, we identify promising research directions, including further integration with chemical knowledge, advancements in continual learning, and improvements in model interpretability.
arXiv Detail & Related papers (2024-02-02T14:30:48Z) - Structured Chemistry Reasoning with Large Language Models [70.13959639460015]
Large Language Models (LLMs) excel in diverse areas, yet struggle with complex scientific reasoning, especially in chemistry.
We introduce StructChem, a simple yet effective prompting strategy that offers the desired guidance and substantially boosts the LLMs' chemical reasoning capability.
Tests across four chemistry areas -- quantum chemistry, mechanics, physical chemistry, and kinetics -- StructChem substantially enhances GPT-4's performance, with up to 30% peak improvement.
arXiv Detail & Related papers (2023-11-16T08:20:36Z) - ElitePLM: An Empirical Study on General Language Ability Evaluation of
Pretrained Language Models [78.08792285698853]
We present a large-scale empirical study on general language ability evaluation of pretrained language models (ElitePLM)
Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs in downstream tasks is usually sensitive to the data size and distribution; and (3) PLMs have excellent transferability between similar tasks.
arXiv Detail & Related papers (2022-05-03T14:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.