Can Large Language Models Empower Molecular Property Prediction?
- URL: http://arxiv.org/abs/2307.07443v1
- Date: Fri, 14 Jul 2023 16:06:42 GMT
- Title: Can Large Language Models Empower Molecular Property Prediction?
- Authors: Chen Qian, Huayi Tang, Zhirui Yang, Hong Liang, Yong Liu
- Abstract summary: Molecular property prediction has gained significant attention due to its transformative potential in scientific disciplines.
Recently, the rapid development of Large Language Models (LLMs) has revolutionized the field of NLP.
In this work, we advance towards this objective through two perspectives: zero/few-shot molecular classification, and using the new explanations generated by LLMs as representations of molecules.
- Score: 16.5246941211725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Molecular property prediction has gained significant attention due to its
transformative potential in multiple scientific disciplines. Conventionally, a
molecule can be represented either as graph-structured data or as a SMILES
string. Recently, the rapid development of Large Language Models (LLMs) has
revolutionized the field of NLP. Although it is natural to utilize LLMs to
assist in understanding molecules represented by SMILES, the exploration of how
LLMs will impact molecular property prediction is still in its early stage. In
this work, we advance towards this objective through two perspectives:
zero/few-shot molecular classification, and using the new explanations
generated by LLMs as representations of molecules. To be specific, we first
prompt LLMs to do in-context molecular classification and evaluate their
performance. After that, we employ LLMs to generate semantically enriched
explanations for the original SMILES strings and then leverage them to
fine-tune a small-scale LM for multiple downstream tasks. The experimental results
highlight the superiority of text explanations as molecular representations
across multiple benchmark datasets, and confirm the immense potential of LLMs
in molecular property prediction tasks. Code is available at
https://github.com/ChnQ/LLM4Mol.
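As a rough illustration of the two perspectives above, the sketch below prompts an LLM for in-context classification and then turns LLM-generated explanations into features for a small LM. The prompt wording, model names, and placeholder data are assumptions for illustration, not the authors' code; see the linked repository for the actual pipeline.

```python
# Hedged sketch of the paper's two perspectives; prompts, model choices, and
# placeholder data below are illustrative assumptions, not the authors' code.
import torch
from openai import OpenAI
from transformers import AutoModelForSequenceClassification, AutoTokenizer

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_llm(prompt: str) -> str:
    """Send one prompt to a chat LLM (the model choice here is an assumption)."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def few_shot_classify(smiles: str, shots: list[tuple[str, int]]) -> str:
    """Perspective 1: zero/few-shot in-context molecular classification."""
    examples = "\n".join(f"SMILES: {s}\nLabel: {y}" for s, y in shots)
    return ask_llm(
        "Classify each molecule as active (1) or inactive (0).\n"
        f"{examples}\nSMILES: {smiles}\nLabel:"
    )

def explain(smiles: str) -> str:
    """Perspective 2, step 1: a semantically enriched explanation of a SMILES."""
    return ask_llm(f"Describe the structure and likely properties of {smiles}.")

# Perspective 2, step 2: fine-tune a small-scale LM on the explanations.
train_smiles = ["CCO", "c1ccccc1"]   # placeholder molecules
train_labels = torch.tensor([1, 0])  # placeholder binary labels
tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed small LM
lm = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
batch = tok([explain(s) for s in train_smiles],
            padding=True, truncation=True, return_tensors="pt")
loss = lm(**batch, labels=train_labels).loss  # one supervised training step
loss.backward()
```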
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
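For intuition, here is a minimal sketch of the random-masking idea described above; the crude regex tokenizer and 15% mask rate are assumptions, not the paper's exact scheme.

```python
# Minimal sketch of random SMILES-subsequence masking; the regex tokenizer
# and 15% mask rate are assumptions, not the paper's exact scheme.
import random
import re

# Match bracketed atoms, common two-letter atoms, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")

def mask_smiles(smiles: str, rate: float = 0.15, mask: str = "[MASK]") -> str:
    """Replace a random subset of SMILES tokens with a mask token."""
    tokens = SMILES_TOKEN.findall(smiles)
    return "".join(mask if random.random() < rate else t for t in tokens)

print(mask_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin, ~15% of tokens masked
```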
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
- MolCap-Arena: A Comprehensive Captioning Benchmark on Language-Enhanced Molecular Property Prediction [44.27112553103388]
We present Molecule Caption Arena: the first comprehensive benchmark of large language model (LLM)-augmented molecular property prediction.
We evaluate over twenty LLMs, including both general-purpose and domain-specific molecule captioners, across diverse prediction tasks.
Our findings confirm the ability of LLM-extracted knowledge to enhance state-of-the-art molecular representations.
arXiv Detail & Related papers (2024-11-01T17:03:16Z)
- FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM).
FARM is a foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs.
We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks.
arXiv Detail & Related papers (2024-10-02T23:04:58Z)
- MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension [34.586861881519134]
Large Language Models (LLMs), with their strong task-handling capabilities, have shown remarkable advances across a spectrum of fields.
This study seeks to enhance the ability of LLMs to comprehend molecules by equipping them with a multi-modal external module, namely MolX.
In particular, instead of directly using a SMILES string to represent a molecule, we utilize specific encoders to extract fine-grained features from both the SMILES string and the 2D molecular graph representation.
arXiv Detail & Related papers (2024-06-10T20:25:18Z)
- Benchmarking Large Language Models for Molecule Prediction Tasks [7.067145619709089]
Large Language Models (LLMs) stand at the forefront of a number of Natural Language Processing (NLP) tasks.
This paper explores a fundamental question: Can LLMs effectively handle molecule prediction tasks?
We identify several classification and regression prediction tasks across six standard molecule datasets.
We compare their performance with existing Machine Learning (ML) models, which include text-based models and those specifically designed for analysing the geometric structure of molecules.
arXiv Detail & Related papers (2024-03-08T05:59:56Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be explained to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- Can Large Language Models Understand Molecules? [0.0699049312989311]
We investigate how well GPT and LLaMA embed SMILES strings for downstream tasks, compared with models pre-trained on SMILES.
We find that SMILES embeddings generated using LLaMA outperform those from GPT in both molecular property and DDI prediction tasks.
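As a sketch of the comparison described above, SMILES embeddings can be pooled from a decoder-only LLM's hidden states; the checkpoint name and mean-pooling strategy below are assumptions, not the paper's exact setup.

```python
# Hedged sketch of embedding a SMILES string with a decoder-only LLM; the
# checkpoint and mean-pooling strategy are assumptions, not the paper's setup.
import torch
from transformers import AutoModel, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint (gated; requires access)
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

def embed_smiles(smiles: str) -> torch.Tensor:
    """Mean-pool the final hidden states into a single molecule vector."""
    inputs = tok(smiles, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

vec = embed_smiles("c1ccccc1O")  # phenol; feed vec to a downstream classifier
```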
arXiv Detail & Related papers (2024-01-05T18:31:34Z)
- Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [53.300288393173204]
Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks.
In this work, we propose MolReGPT, an in-context few-shot molecule learning paradigm for molecule-caption translation.
We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
arXiv Detail & Related papers (2023-06-11T08:16:25Z)
- MolXPT: Wrapping Molecules with Text for Generative Pre-training [141.0924452870112]
MolXPT is a unified language model of text and molecules pre-trained on SMILES wrapped by text.
MolXPT outperforms strong baselines of molecular property prediction on MoleculeNet.
arXiv Detail & Related papers (2023-05-18T03:58:19Z)