Interactive Molecular Discovery with Natural Language
- URL: http://arxiv.org/abs/2306.11976v1
- Date: Wed, 21 Jun 2023 02:05:48 GMT
- Title: Interactive Molecular Discovery with Natural Language
- Authors: Zheni Zeng, Bangchen Yin, Shipeng Wang, Jiarui Liu, Cheng Yang,
Haishen Yao, Xingzhi Sun, Maosong Sun, Guotong Xie, Zhiyuan Liu
- Abstract summary: We propose conversational molecular design, a novel task that uses natural language to describe and edit target molecules.
To better accomplish this task, we design ChatMol, a knowledgeable and versatile generative pre-trained model, enhanced by injecting experimental property information.
- Score: 69.89287960545903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language is expected to be a key medium for various human-machine
interactions in the era of large language models. When it comes to the
biochemistry field, a series of tasks around molecules (e.g., property
prediction, molecule mining, etc.) are of great significance while having a
high technical threshold. Bridging the molecule expressions in natural language
and chemical language can not only hugely improve the interpretability and
reduce the operation difficulty of these tasks, but also fuse the chemical
knowledge scattered in complementary materials for a deeper comprehension of
molecules. Based on these benefits, we propose the conversational molecular
design, a novel task adopting natural language for describing and editing
target molecules. To better accomplish this task, we design ChatMol, a
knowledgeable and versatile generative pre-trained model, enhanced by injecting
experimental property information, molecular spatial knowledge, and the
associations between natural and chemical languages into it. Several typical
solutions including large language models (e.g., ChatGPT) are evaluated,
proving the challenge of conversational molecular design and the effectiveness
of our knowledge enhancement method. Case observations and analysis are
conducted to provide directions for further exploration of natural-language
interaction in molecular discovery.
Related papers
- MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model [19.458584012046646]
We propose a novel physicochemical knowledge-guided molecular meta language framework MolMetaLM.
We design a molecule-specialized meta language paradigm, formatted as multiple <S,P,O> knowledge triples sharing the same S (i.e., molecule).
By introducing different molecular knowledge and noises, the meta language paradigm generates tens of thousands of pretraining tasks.
arXiv Detail & Related papers (2024-11-23T09:27:38Z)
- InstructBioMol: Advancing Biomolecule Understanding and Design Following Human Instructions [32.38318676313486]
InstructBioMol is designed to bridge natural language and biomolecules.
It can integrate multimodal biomolecules as input, and enable researchers to articulate design goals in natural language.
It can generate drug molecules with a 10% improvement in binding affinity and design enzymes that achieve an ESP Score of 70.4.
arXiv Detail & Related papers (2024-10-10T13:45:56Z)
- Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [49.64512917330373]
We introduce a multi-constraint molecular generation large language model, TSMMG, akin to a student.
To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers'.
We experimentally show that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language.
arXiv Detail & Related papers (2024-03-20T02:15:55Z)
- Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey [75.47055414002571]
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through cross modeling of biomolecules and natural language.
arXiv Detail & Related papers (2024-03-03T14:59:47Z)
- Language models in molecular discovery [2.874893537471256]
"Scientific language models" operate on small molecules, proteins or polymers.
In chemistry, language models contribute to accelerating the molecule discovery cycle.
We highlight valuable open-source software assets, thus lowering the entry barrier to the field of scientific language modeling.
arXiv Detail & Related papers (2023-09-28T08:19:54Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing [107.49804059269212]
We present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions.
In experiments, MoleculeSTM achieves state-of-the-art generalization to novel biochemical concepts.
arXiv Detail & Related papers (2022-12-21T06:18:31Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Exploring Chemical Space using Natural Language Processing Methodologies for Drug Discovery [0.5389800405902634]
Text-based representations of chemicals and proteins can be thought of as unstructured languages codified by humans to describe domain-specific knowledge.
This review outlines the impact made by these advances on drug discovery and aims to further the dialogue between medicinal chemists and computer scientists.
arXiv Detail & Related papers (2020-02-10T21:02:05Z)