Small Molecule Optimization with Large Language Models
- URL: http://arxiv.org/abs/2407.18897v1
- Date: Fri, 26 Jul 2024 17:51:33 GMT
- Title: Small Molecule Optimization with Large Language Models
- Authors: Philipp Guevorguian, Menua Bedrosian, Tigran Fahradyan, Gayane Chilingaryan, Hrant Khachatrian, Armen Aghajanyan,
- Abstract summary: We present two language models fine-tuned on a novel corpus of 110M molecules with computed properties, totaling 40B tokens.
We introduce a novel optimization algorithm that leverages our language models to optimize molecules for arbitrary properties given limited access to a black box oracle.
- Score: 17.874902319523663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in large language models have opened new possibilities for generative molecular drug design. We present Chemlactica and Chemma, two language models fine-tuned on a novel corpus of 110M molecules with computed properties, totaling 40B tokens. These models demonstrate strong performance in generating molecules with specified properties and predicting new molecular characteristics from limited samples. We introduce a novel optimization algorithm that leverages our language models to optimize molecules for arbitrary properties given limited access to a black box oracle. Our approach combines ideas from genetic algorithms, rejection sampling, and prompt optimization. It achieves state-of-the-art performance on multiple molecular optimization benchmarks, including an 8% improvement on Practical Molecular Optimization compared to previous methods. We publicly release the training corpus, the language models and the optimization algorithm.
Related papers
- Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model [77.50732023411811]
We propose a text-guided multi-property molecular optimization method utilizing transformer-based diffusion language model (TransDLM)
TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions.
Our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset.
arXiv Detail & Related papers (2024-10-17T14:30:27Z) - XMOL: Explainable Multi-property Optimization of Molecules [2.320539066224081]
We propose Explainable Multi-property Optimization of Molecules (XMOL) to optimize multiple molecular properties simultaneously.
Our approach builds on state-of-the-art geometric diffusion models, extending them to multi-property optimization.
We integrate interpretive and explainable techniques throughout the optimization process.
arXiv Detail & Related papers (2024-09-12T06:35:04Z) - In-Context Learning for Few-Shot Molecular Property Prediction [56.67309268480843]
In this paper, we adapt the concepts underpinning in-context learning to develop a new algorithm for few-shot molecular property prediction.
Our approach learns to predict molecular properties from a context of (molecule, property measurement) pairs and rapidly adapts to new properties without fine-tuning.
arXiv Detail & Related papers (2023-10-13T05:12:48Z) - Molecule Design by Latent Space Energy-Based Modeling and Gradual
Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performances on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z) - Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z) - Molecule Optimization via Fragment-based Generative Models [21.888942129750124]
In drug discovery, molecule optimization is an important step in order to modify drug candidates into better ones in terms of desired drug properties.
We present an innovative in silico approach to computationally optimizing molecules and formulate the problem as to generate optimized molecular graphs.
Our generative models follow the key idea of fragment-based drug design, and optimize molecules by modifying their small fragments.
arXiv Detail & Related papers (2020-12-08T05:52:16Z) - Reinforced Molecular Optimization with Neighborhood-Controlled Grammars [63.84003497770347]
We propose MNCE-RL, a graph convolutional policy network for molecular optimization.
We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation.
We show that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks.
arXiv Detail & Related papers (2020-11-14T05:42:15Z) - MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
generative models and reinforcement learning approaches made initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework to use input molecule as an initial guess and sample molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z) - Graph Polish: A Novel Graph Generation Paradigm for Molecular
Optimization [7.1696593196695035]
We present a novel molecular optimization paradigm, Graph Polish, which changes molecular optimization from the traditional "two-language translating" task into a "single-language" task.
We propose an effective and efficient learning framework T&S polish to capture the long-term dependencies in the optimization steps.
arXiv Detail & Related papers (2020-08-14T08:36:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.