Language models in molecular discovery
- URL: http://arxiv.org/abs/2309.16235v1
- Date: Thu, 28 Sep 2023 08:19:54 GMT
- Title: Language models in molecular discovery
- Authors: Nikita Janakarajan, Tim Erdmann, Sarath Swaminathan, Teodoro Laino,
Jannis Born
- Abstract summary: "Scientific language models" operate on small molecules, proteins or polymers.
In chemistry, language models contribute to accelerating the molecule discovery cycle.
We highlight valuable open-source software assets, thus lowering the entry barrier to the field of scientific language modeling.
- Score: 2.874893537471256
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The success of language models, especially transformer-based architectures,
has trickled into other domains giving rise to "scientific language models"
that operate on small molecules, proteins or polymers. In chemistry, language
models contribute to accelerating the molecule discovery cycle as evidenced by
promising recent findings in early-stage drug discovery. Here, we review the
role of language models in molecular discovery, underlining their strength in
de novo drug design, property prediction and reaction chemistry. We highlight
valuable open-source software assets, thus lowering the entry barrier to the
field of scientific language modeling. Last, we sketch a vision for future
molecular design that combines a chatbot interface with access to computational
chemistry tools. Our contribution serves as a valuable resource for
researchers, chemists, and AI enthusiasts interested in understanding how
language models can and will be used to accelerate chemical discovery.
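As a minimal, self-contained sketch of what "operating on small molecules" typically means in practice, the snippet below tokenizes a SMILES string with a regular expression of the kind commonly used in the chemical language modeling literature; the pattern, function name, and example molecule are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not from the paper): treat a molecule as a "sentence" by
# tokenizing its SMILES string, the usual first step before feeding molecules
# to a transformer-based language model.
import re

# Common SMILES tokenization pattern: bracket atoms, two-letter halogens,
# organic-subset atoms, ring-closure digits, bonds, branches, etc.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOPSFIbcnops]|%[0-9]{2}|[0-9]"
    r"|\(|\)|\.|=|#|\+|-|/|\\|:|~|@|\?|>|\*|\$)"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into language-model tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    assert "".join(tokens) == smiles, "tokenizer dropped characters"
    return tokens

if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    print(tokenize_smiles(aspirin))
    # ['C', 'C', '(', '=', 'O', ')', 'O', 'c', '1', 'c', 'c', ...]
```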
Related papers
- FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM), a foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs.
We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks.
arXiv Detail & Related papers (2024-10-02T23:04:58Z)
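FARM is described as bridging SMILES strings, natural language, and molecular graphs. Purely as an illustration of the two molecular views being bridged (this is not FARM code), the sketch below parses a SMILES string into an atom/bond graph with RDKit, an assumed third-party dependency not mentioned in the summaries above.

```python
# Illustrative sketch only: the SMILES view and the graph view of a molecule,
# obtained with RDKit (an assumed dependency, not part of FARM itself).
from rdkit import Chem

def smiles_to_graph(smiles: str):
    """Return (node_labels, edge_list) for the molecule encoded by `smiles`."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    nodes = [atom.GetSymbol() for atom in mol.GetAtoms()]
    edges = [
        (bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondTypeAsDouble())
        for bond in mol.GetBonds()
    ]
    return nodes, edges

nodes, edges = smiles_to_graph("c1ccccc1O")  # phenol
print(nodes)   # ['C', 'C', 'C', 'C', 'C', 'C', 'O']
print(edges)   # e.g. (0, 1, 1.5) for an aromatic bond
```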
- MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction [14.353313239109337]
MolTRES is a novel chemical language representation learning framework.
It incorporates generator-discriminator training, allowing the model to learn from more challenging examples.
Our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
arXiv Detail & Related papers (2024-07-09T01:14:28Z)
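The MolTRES summary mentions generator-discriminator training but gives no details, so the following is only a generic, ELECTRA-style replaced-token-detection sketch under that assumption: a stand-in "generator" corrupts a few SMILES tokens, and the per-token labels a discriminator would be trained on are recorded. The vocabulary, masking rate, and random generator are hypothetical.

```python
# Generic replaced-token-detection data construction (an assumption, not
# MolTRES's actual recipe): corrupt some tokens with a stand-in generator and
# record the original-vs-replaced labels a discriminator would be trained on.
import random

VOCAB = ["C", "c", "N", "n", "O", "o", "S", "F", "(", ")", "=", "#", "1", "2"]

def make_rtd_example(tokens, mask_prob=0.15, seed=0):
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # A trained masked language model would propose harder,
            # more plausible replacements than uniform sampling does.
            corrupted.append(rng.choice([t for t in VOCAB if t != tok]))
            labels.append(1)   # 1 = token was replaced
        else:
            corrupted.append(tok)
            labels.append(0)   # 0 = token is original
    return corrupted, labels

tokens = list("CC(=O)Oc1ccccc1")           # character-level SMILES tokens
corrupted, labels = make_rtd_example(tokens)
print(corrupted)
print(labels)                               # per-token discriminator targets
```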
- Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey [75.47055414002571]
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through cross modeling of biomolecules and natural language.
arXiv Detail & Related papers (2024-03-03T14:59:47Z)
- Transformers and Large Language Models for Chemistry and Drug Discovery [0.4769602527256662]
Language modeling has made impressive progress in recent years, driven mainly by the invention of the Transformer architecture.
Transformers tackle important bottlenecks in the drug discovery process, such as retrosynthetic planning and chemical space exploration.
A new trend leverages recent developments in large language models, giving rise to a wave of models capable of solving generic tasks in chemistry.
arXiv Detail & Related papers (2023-10-09T18:40:04Z)
- GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text [25.979382232281786]
We introduce GIT-Mol, a multi-modal large language model that integrates Graph, Image, and Text information.
We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity.
arXiv Detail & Related papers (2023-08-14T03:12:29Z)
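GIT-Mol reports a 20.2% boost in molecule generation validity. The snippet below shows how such a validity score is conventionally computed, namely the fraction of generated SMILES that RDKit can parse into a molecule; this is a common convention and an assumed dependency, not GIT-Mol's own evaluation code.

```python
# Conventional "validity" metric for generated molecules: the share of
# sampled SMILES strings that parse into a chemically meaningful molecule.
from rdkit import Chem
from rdkit import RDLogger

RDLogger.DisableLog("rdApp.*")  # silence parse warnings for invalid samples

def validity(smiles_list):
    valid = [s for s in smiles_list if Chem.MolFromSmiles(s) is not None]
    return len(valid) / max(len(smiles_list), 1)

samples = ["CCO", "c1ccccc1", "C(C)(C)(C)(C)C", "not_a_molecule"]
print(f"validity = {validity(samples):.2f}")  # 0.50: last two fail to parse
```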
- Interactive Molecular Discovery with Natural Language [69.89287960545903]
We propose conversational molecular design, a novel task that uses natural language to describe and edit target molecules.
To better accomplish this task, we design ChatMol, a knowledgeable and versatile generative pre-trained model, enhanced by injecting experimental property information.
arXiv Detail & Related papers (2023-06-21T02:05:48Z)
- Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [53.300288393173204]
Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks.
In this work, we propose an In-context Few-Shot Molecule Learning paradigm for molecule-caption translation.
We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
arXiv Detail & Related papers (2023-06-11T08:16:25Z)
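MolReGPT's in-context few-shot paradigm is summarized above without prompt details, so the following is only a hypothetical sketch of assembling a few-shot molecule-captioning prompt for a general-purpose LLM; the template and the example (SMILES, caption) pairs are invented for illustration.

```python
# Hypothetical few-shot prompt for molecule captioning, in the spirit of
# in-context learning for molecule-caption translation. The template and
# example pairs are illustrative; MolReGPT's actual prompt is not shown above.
FEW_SHOT_EXAMPLES = [
    ("CCO", "Ethanol, a simple two-carbon alcohol."),
    ("c1ccccc1", "Benzene, an aromatic six-membered carbon ring."),
]

def build_caption_prompt(query_smiles: str) -> str:
    lines = ["Describe the molecule given by each SMILES string.", ""]
    for smiles, caption in FEW_SHOT_EXAMPLES:
        lines += [f"SMILES: {smiles}", f"Caption: {caption}", ""]
    lines += [f"SMILES: {query_smiles}", "Caption:"]
    return "\n".join(lines)

prompt = build_caption_prompt("CC(=O)Oc1ccccc1C(=O)O")
print(prompt)  # this string would be sent to a general-purpose LLM
```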
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Keeping it Simple: Language Models can learn Complex Molecular Distributions [0.0]
We introduce several challenging generative modeling tasks by compiling especially complex distributions of molecules.
The results demonstrate that language models are powerful generative models, capable of adeptly learning complex molecular distributions.
arXiv Detail & Related papers (2021-12-06T13:40:58Z)
- Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
arXiv Detail & Related papers (2020-12-02T02:09:31Z)
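MoleculeKit is described as combining deep learning with traditional machine learning over graph and sequence representations. As a hedged illustration of the traditional end of that spectrum (not MoleculeKit code), the sketch below featurizes SMILES with RDKit Morgan fingerprints and fits a scikit-learn random forest on a tiny made-up dataset; both libraries and all data values are assumptions.

```python
# Sketch of the "traditional machine learning" route (not MoleculeKit code):
# RDKit Morgan fingerprints fed to a scikit-learn random forest.
# The molecules and property values below are a made-up toy set.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor

def fingerprint(smiles: str, n_bits: int = 2048) -> np.ndarray:
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)  # radius 2
    return np.array(fp)

train_smiles = ["CCO", "CCC", "CCN", "c1ccccc1", "CC(=O)O"]
train_y = [0.1, 0.2, 0.3, 0.9, 0.4]          # hypothetical property values

X = np.stack([fingerprint(s) for s in train_smiles])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, train_y)

print(model.predict(np.stack([fingerprint("CCCO")])))  # predict for a new SMILES
```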
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.