Exploring Chemical Space using Natural Language Processing Methodologies
for Drug Discovery
- URL: http://arxiv.org/abs/2002.06053v1
- Date: Mon, 10 Feb 2020 21:02:05 GMT
- Title: Exploring Chemical Space using Natural Language Processing Methodologies
for Drug Discovery
- Authors: Hakime Öztürk, Arzucan Özgür, Philippe Schwaller, Teodoro Laino, Elif Ozkirimli
- Abstract summary: Text-based representations of chemicals and proteins can be thought of as unstructured languages codified by humans to describe domain-specific knowledge.
This review outlines the impact made by these advances on drug discovery and aims to further the dialogue between medicinal chemists and computer scientists.
- Score: 0.5389800405902634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-based representations of chemicals and proteins can be thought of as
unstructured languages codified by humans to describe domain-specific
knowledge. Advances in natural language processing (NLP) methodologies in the
processing of spoken languages accelerated the application of NLP to elucidate
hidden knowledge in textual representations of these biochemical entities and
then use it to construct models to predict molecular properties or to design
novel molecules. This review outlines the impact made by these advances on drug
discovery and aims to further the dialogue between medicinal chemists and
computer scientists.
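The review's framing of SMILES and protein strings as "languages" becomes concrete at the tokenization step: a molecule's text representation is split into atom, bond, and ring tokens that can then be embedded and modeled with standard NLP architectures. The sketch below is illustrative only and is not code from the paper; the regex follows a commonly used atom-wise SMILES tokenization pattern, and aspirin is an arbitrary example molecule.

```python
import re

# Atom-wise SMILES tokenization: a widely used regex splits a SMILES string into
# bracketed atoms, two-letter elements, single atoms, bonds, ring indices, and
# branch symbols -- the "words" of the chemical language.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\."
    r"|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles: str) -> list:
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    assert "".join(tokens) == smiles, "tokenizer lost characters"
    return tokens

# Aspirin as a "sentence": each token can be embedded and fed to a sequence
# model, exactly as words are in NLP.
print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
# ['C', 'C', '(', '=', 'O', ')', 'O', 'c', '1', 'c', 'c', 'c', 'c', 'c', '1',
#  'C', '(', '=', 'O', ')', 'O']
```

From such token sequences, a vocabulary can be built and fed to the property-prediction or molecule-generation models the review surveys.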
Related papers
- FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM)
FARM is a foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs.
We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks.
arXiv Detail & Related papers (2024-10-02T23:04:58Z)
- Natural Language Processing Methods for the Study of Protein-Ligand Interactions [8.165512093198934]
Recent advances in Natural Language Processing have ignited interest in developing effective methods for predicting protein-ligand interactions.
In this review, we explain where and how such approaches have been applied in the recent literature and discuss useful mechanisms such as long short-term memory, transformers, and attention.
We conclude with a discussion of the current limitations of NLP methods for the study of PLIs as well as key challenges that need to be addressed in future work.
arXiv Detail & Related papers (2024-09-19T19:14:50Z)
- MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction [14.353313239109337]
MolTRES is a novel chemical language representation learning framework.
It incorporates generator-discriminator training, allowing the model to learn from more challenging examples (a minimal sketch of this training scheme appears after this list).
Our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
arXiv Detail & Related papers (2024-07-09T01:14:28Z)
- Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey [75.47055414002571]
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through cross modeling of biomolecules and natural language.
arXiv Detail & Related papers (2024-03-03T14:59:47Z)
- Empirical Evidence for the Fragment level Understanding on Drug Molecular Structure of LLMs [16.508471997999496]
We investigate whether and how language models understand the chemical spatial structure from 1D sequences.
The results indicate that language models can understand chemical structures from the perspective of molecular fragments.
arXiv Detail & Related papers (2024-01-15T12:53:58Z)
- Interactive Molecular Discovery with Natural Language [69.89287960545903]
We propose conversational molecular design, a novel task adopting natural language for describing and editing target molecules.
To better accomplish this task, we design ChatMol, a knowledgeable and versatile generative pre-trained model, enhanced by injecting experimental property information.
arXiv Detail & Related papers (2023-06-21T02:05:48Z)
- Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing [107.49804059269212]
We present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions.
In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts.
arXiv Detail & Related papers (2022-12-21T06:18:31Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Reinforced Molecular Optimization with Neighborhood-Controlled Grammars [63.84003497770347]
We propose MNCE-RL, a graph convolutional policy network for molecular optimization.
We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation.
We show that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks.
arXiv Detail & Related papers (2020-11-14T05:42:15Z)
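As referenced in the MolTRES entry above, generator-discriminator (replaced-token-detection) pre-training is one way a chemical language model can be pushed to learn from harder examples. The PyTorch sketch below is schematic only: the vocabulary size, model dimensions, masking rate, and module layout are illustrative placeholders, not MolTRES's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn

# Schematic generator-discriminator (replaced-token-detection) pre-training on
# tokenized SMILES. All sizes below are illustrative placeholders.
VOCAB, DIM, MASK_ID, MASK_RATE = 128, 64, 0, 0.15

embed = nn.Embedding(VOCAB, DIM)
generator = nn.TransformerEncoder(nn.TransformerEncoderLayer(DIM, 4, batch_first=True), 2)
discriminator = nn.TransformerEncoder(nn.TransformerEncoderLayer(DIM, 4, batch_first=True), 2)
gen_head = nn.Linear(DIM, VOCAB)   # guesses the original token at masked positions
disc_head = nn.Linear(DIM, 1)      # flags tokens the generator replaced

def training_step(tokens: torch.Tensor) -> torch.Tensor:
    """One replaced-token-detection step on a batch of token ids, shape [B, T]."""
    mask = torch.rand(tokens.shape) < MASK_RATE
    corrupted = tokens.masked_fill(mask, MASK_ID)

    # Generator fills in the masked positions; its guesses may be wrong.
    gen_logits = gen_head(generator(embed(corrupted)))
    replaced = torch.where(mask, gen_logits.argmax(-1), tokens)

    # Discriminator must detect which positions were replaced -- a denser,
    # harder signal than plain masked-token prediction.
    disc_logits = disc_head(discriminator(embed(replaced))).squeeze(-1)
    is_replaced = (replaced != tokens).float()

    gen_loss = nn.functional.cross_entropy(gen_logits[mask], tokens[mask])
    disc_loss = nn.functional.binary_cross_entropy_with_logits(disc_logits, is_replaced)
    return gen_loss + disc_loss

loss = training_step(torch.randint(1, VOCAB, (8, 32)))  # random ids stand in for SMILES tokens
loss.backward()
```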
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.