Language models in molecular discovery
- URL: http://arxiv.org/abs/2309.16235v1
- Date: Thu, 28 Sep 2023 08:19:54 GMT
- Title: Language models in molecular discovery
- Authors: Nikita Janakarajan, Tim Erdmann, Sarath Swaminathan, Teodoro Laino,
  Jannis Born
- Abstract summary: "Scientific language models" operate on small molecules, proteins or polymers.
In chemistry, language models contribute to accelerating the molecule discovery cycle.
We highlight valuable open-source software assets, thus lowering the entry barrier to the field of scientific language modeling.
- Score: 2.874893537471256
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The success of language models, especially transformer-based architectures,
has trickled into other domains giving rise to "scientific language models"
that operate on small molecules, proteins or polymers. In chemistry, language
models contribute to accelerating the molecule discovery cycle as evidenced by
promising recent findings in early-stage drug discovery. Here, we review the
role of language models in molecular discovery, underlining their strength in
de novo drug design, property prediction and reaction chemistry. We highlight
valuable open-source software assets thus lowering the entry barrier to the
field of scientific language modeling. Last, we sketch a vision for future
molecular design that combines a chatbot interface with access to computational
chemistry tools. Our contribution serves as a valuable resource for
researchers, chemists, and AI enthusiasts interested in understanding how
language models can and will be used to accelerate chemical discovery.
 
      
Related papers
- mCLM: A Function-Infused and Synthesis-Friendly Modular Chemical Language Model [65.69164455183956]
 We propose mCLM, a modular Chemical-Language Model tokenizing molecules into building blocks and learning a bilingual language model of both natural language descriptions of functions and molecule building blocks.
In experiments on 430 FDA-approved drugs, we find mCLM capable of significantly improving 5 out of 6 chemical functions critical to determining drug potentials.
 arXiv  Detail & Related papers  (2025-05-18T22:52:39Z)
- Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model [55.87790704067848]
 Mol-LLaMA is a large molecular language model that grasps the general knowledge centered on molecules.
We introduce a module that integrates complementary information from different molecular encoders.
Our experimental results demonstrate that Mol-LLaMA is capable of comprehending the general features of molecules.
 arXiv  Detail & Related papers  (2025-02-19T05:49:10Z)
- FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
 We introduce Functional Group-Aware Representations for Small Molecules (FARM).
FARM is a foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs.
We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks.
 arXiv  Detail & Related papers  (2024-10-02T23:04:58Z)
- MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction [14.353313239109337]
 MolTRES is a novel chemical language representation learning framework.
It incorporates generator-discriminator training, allowing the model to learn from more challenging examples.
Our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
 arXiv  Detail & Related papers  (2024-07-09T01:14:28Z)
- Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey [75.47055414002571]
 The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through cross modeling of biomolecules and natural language.
 arXiv  Detail & Related papers  (2024-03-03T14:59:47Z)
- Transformers and Large Language Models for Chemistry and Drug Discovery [0.4769602527256662]
 Language modeling has seen impressive progress over the last years, mainly prompted by the invention of the Transformer architecture.
Transformers tackle important bottlenecks in the drug discovery process, such as retrosynthetic planning and chemical space exploration.
A new trend leverages recent developments in large language models, giving rise to a wave of models capable of solving generic tasks in chemistry.
 arXiv  Detail & Related papers  (2023-10-09T18:40:04Z)
- GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text [25.979382232281786]
 We introduce GIT-Mol, a multi-modal large language model that integrates Graph, Image, and Text information.
We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity.
 arXiv  Detail & Related papers  (2023-08-14T03:12:29Z)
- Interactive Molecular Discovery with Natural Language [69.89287960545903]
 We propose conversational molecular design, a novel task adopting natural language for describing and editing target molecules.
To better accomplish this task, we design ChatMol, a knowledgeable and versatile generative pre-trained model, enhanced by injecting experimental property information.
 arXiv  Detail & Related papers  (2023-06-21T02:05:48Z)
- Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [53.300288393173204]
 Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks.
In this work, we propose an In-context Few-Shot Molecule Learning paradigm for molecule-caption translation.
We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
 arXiv  Detail & Related papers  (2023-06-11T08:16:25Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
 We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
 arXiv  Detail & Related papers  (2022-09-12T00:56:57Z)
- Keeping it Simple: Language Models can learn Complex Molecular Distributions [0.0]
 We introduce several challenging generative modeling tasks by compiling especially complex distributions of molecules.
The results demonstrate that language models are powerful generative models, capable of adeptly learning complex molecular distributions.
 arXiv  Detail & Related papers  (2021-12-06T13:40:58Z)
- Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery [53.00288162642151]
 We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
 arXiv  Detail & Related papers  (2020-12-02T02:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     