Transformers and Large Language Models for Chemistry and Drug Discovery
- URL: http://arxiv.org/abs/2310.06083v1
- Date: Mon, 9 Oct 2023 18:40:04 GMT
- Title: Transformers and Large Language Models for Chemistry and Drug Discovery
- Authors: Andres M Bran, Philippe Schwaller
- Abstract summary: Language modeling has seen impressive progress in recent years, mainly prompted by the invention of the Transformer architecture.
Transformers tackle important bottlenecks in the drug discovery process, such as retrosynthetic planning and chemical space exploration.
A new trend leverages recent developments in large language models, giving rise to a wave of models capable of solving generic tasks in chemistry.
- Score: 0.4769602527256662
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Language modeling has seen impressive progress in recent years, mainly prompted by the invention of the Transformer architecture, which has sparked a revolution in many fields of machine learning, with breakthroughs in chemistry and biology. In this chapter, we explore how analogies between chemical and natural language have inspired the use of Transformers to tackle important bottlenecks in the drug discovery process, such as retrosynthetic planning and chemical space exploration. The revolution started with models able to perform particular tasks with a single type of data, like linearised molecular graphs, and then evolved to include other types of data, like spectra from analytical instruments, synthesis actions, and human language. A new trend leverages recent developments in large language models, giving rise to a wave of models capable of solving generic tasks in chemistry, all facilitated by the flexibility of natural language. As we continue to explore and harness these capabilities, we can look forward to a future where machine learning plays an even more integral role in accelerating scientific discovery.
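The "linearised molecular graphs" mentioned in the abstract refer to text encodings such as SMILES, which let Transformers treat molecules as sequences of tokens. Below is a minimal sketch of a regex-based SMILES tokenizer of the kind commonly used to prepare Transformer inputs in chemistry; the pattern and function name are illustrative assumptions, not taken from this paper.

```python
import re

# Illustrative pattern (an assumption, not from the paper): splits a SMILES
# string into chemically meaningful tokens such as bracket atoms ([NH+]),
# two-letter elements (Cl, Br), ring-closure digits, and bond symbols.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p"
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens for a chemical language model."""
    return SMILES_TOKEN_PATTERN.findall(smiles)

# Ethanol: each atom becomes one token.
print(tokenize_smiles("CCO"))  # ['C', 'C', 'O']
```

A Transformer then consumes these tokens exactly as it would words in a sentence, which is what makes tasks like retrosynthesis expressible as sequence-to-sequence translation.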
Related papers
- GraphXForm: Graph transformer for computer-aided molecular design with application to extraction [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds and then fine-tuned.
We evaluate it on two solvent design tasks for liquid-liquid extraction, showing that it outperforms four state-of-the-art molecular design techniques.
arXiv Detail & Related papers (2024-11-03T19:45:15Z)
- MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction [14.353313239109337]
MolTRES is a novel chemical language representation learning framework.
It incorporates generator-discriminator training, allowing the model to learn from more challenging examples.
Our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
arXiv Detail & Related papers (2024-07-09T01:14:28Z)
- Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecule [16.641797535842752]
In this paper, we present the first systematic survey on multimodal frameworks for molecule research.
We begin with the development of molecular deep learning and point out the necessity to involve textual modality.
Furthermore, we delve into the utilization of large language models and prompting techniques for molecular tasks and present significant applications in drug discovery.
arXiv Detail & Related papers (2024-03-07T03:03:13Z)
- Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey [75.47055414002571]
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through cross modeling of biomolecules and natural language.
arXiv Detail & Related papers (2024-03-03T14:59:47Z)
- Large Language Models for Scientific Synthesis, Inference and Explanation [56.41963802804953]
We show how large language models can perform scientific synthesis, inference, and explanation.
We show that the large language model can augment this "knowledge" by synthesizing from the scientific literature.
This approach has the further advantage that the large language model can explain the machine learning system's predictions.
arXiv Detail & Related papers (2023-10-12T02:17:59Z)
- Language models in molecular discovery [2.874893537471256]
"Scientific language models" operate on small molecules, proteins, or polymers.
In chemistry, language models contribute to accelerating the molecule discovery cycle.
We highlight valuable open-source software assets, thus lowering the entry barrier to the field of scientific language modeling.
arXiv Detail & Related papers (2023-09-28T08:19:54Z)
- Interactive Molecular Discovery with Natural Language [69.89287960545903]
We propose conversational molecular design, a novel task adopting natural language for describing and editing target molecules.
To better accomplish this task, we design ChatMol, a knowledgeable and versatile generative pre-trained model, enhanced by injecting experimental property information.
arXiv Detail & Related papers (2023-06-21T02:05:48Z)
- A Computational Inflection for Scientific Discovery [48.176406062568674]
We stand at the foot of a significant inflection in the trajectory of scientific discovery.
As society continues on its fast-paced digital transformation, so does humankind's collective scientific knowledge.
Computer science is poised to ignite a revolution in the scientific process itself.
arXiv Detail & Related papers (2022-05-04T11:36:54Z)
- Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales [97.41394631426678]
Recent research showed the promise of machine learning tools for analyzing acoustic communication in nonhuman species.
We outline the key elements required for the collection and processing of massive bioacoustic data of sperm whales.
The technological capabilities developed are likely to yield cross-applications and advancements in broader communities investigating non-human communication and animal behavioral research.
arXiv Detail & Related papers (2021-04-17T18:39:22Z)
- ChemoVerse: Manifold traversal of latent spaces for novel molecule discovery [0.7742297876120561]
It is essential to identify molecular structures with the desired chemical properties.
Recent advances in generative models using neural networks and machine learning are being widely used to design virtual libraries of drug-like compounds.
arXiv Detail & Related papers (2020-09-29T12:11:40Z)
- Generative chemistry: drug discovery with deep learning generative models [0.0]
This paper reviews the latest advances in generative chemistry which relies on generative modeling to expedite the drug discovery process.
The review focuses on detailed discussions of cutting-edge generative architectures for compound generation, including recurrent neural networks, variational autoencoders, adversarial autoencoders, and generative adversarial networks.
arXiv Detail & Related papers (2020-08-20T14:38:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.