Efficient Evolutionary Search Over Chemical Space with Large Language Models
- URL: http://arxiv.org/abs/2406.16976v2
- Date: Tue, 2 Jul 2024 16:12:38 GMT
- Title: Efficient Evolutionary Search Over Chemical Space with Large Language Models
- Authors: Haorui Wang, Marta Skreta, Cher-Tian Ser, Wenhao Gao, Lingkai Kong, Felix Strieth-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang
- Abstract summary: Molecular discovery is computationally challenging because optimization objectives can be non-differentiable.
We introduce chemistry-aware Large Language Models (LLMs) into evolutionary algorithms.
Our algorithm improves both the quality of the final solution and convergence speed.
- Score: 31.31899988523534
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations. In this work, we ameliorate this shortcoming by incorporating chemistry-aware Large Language Models (LLMs) into EAs. Namely, we redesign crossover and mutation operations in EAs using LLMs trained on large corpora of chemical information. We perform extensive empirical studies on both commercial and open-source models on multiple tasks involving property optimization, molecular rediscovery, and structure-based drug design, demonstrating that the joint usage of LLMs with EAs yields superior performance over all baseline models across single- and multi-objective settings. We demonstrate that our algorithm improves both the quality of the final solution and convergence speed, thereby reducing the number of required objective evaluations. Our code is available at http://github.com/zoom-wang112358/MOLLEO
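The abstract describes an evolutionary loop whose crossover and mutation steps are delegated to an LLM. The following is a minimal sketch of that loop shape, not the authors' implementation: the function names (`evolve`, `toy_crossover`, `toy_mutate`, `toy_objective`) and all parameters are hypothetical, and the LLM-backed operators are replaced with simple string operations so the example runs without any model or chemistry library. In a real setting the two operator callables would wrap LLM calls and the objective would be the expensive black-box property evaluation.

```python
import random
from typing import Callable, Dict, List, Tuple

Molecule = str  # assumption: molecules are plain strings (e.g. SMILES)


def evolve(
    population: List[Molecule],
    objective: Callable[[Molecule], float],
    crossover: Callable[[Molecule, Molecule, random.Random], Molecule],
    mutate: Callable[[Molecule, random.Random], Molecule],
    generations: int = 10,
    offspring_per_gen: int = 20,
    seed: int = 0,
) -> Tuple[List[Molecule], int]:
    """Elitist evolutionary loop with pluggable operators.

    Returns the final population (best first) and the number of
    objective evaluations spent, since that is the cost being reduced.
    """
    rng = random.Random(seed)
    pop_size = len(population)
    # Cache scores so each distinct molecule is evaluated only once.
    scores: Dict[Molecule, float] = {m: objective(m) for m in population}

    for _ in range(generations):
        children = []
        for _ in range(offspring_per_gen):
            parent_a, parent_b = rng.sample(population, 2)
            # Placeholder hooks: an LLM-based variant would call a model here.
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        for child in children:
            if child not in scores:
                scores[child] = objective(child)
        # Keep the best pop_size molecules among parents and children.
        pool = set(population) | set(children)
        population = sorted(pool, key=lambda m: (-scores[m], m))[:pop_size]

    return population, len(scores)


# Toy stand-ins (hypothetical), used only to make the sketch executable.
def toy_objective(m: Molecule) -> float:
    return float(m.count("C"))


def toy_crossover(a: Molecule, b: Molecule, rng: random.Random) -> Molecule:
    cut = rng.randint(0, min(len(a), len(b)))
    return a[:cut] + b[cut:]


def toy_mutate(m: Molecule, rng: random.Random) -> Molecule:
    i = rng.randrange(len(m))
    return m[:i] + rng.choice("CNO") + m[i + 1:]


if __name__ == "__main__":
    start = ["NNNNNN", "OOOOOO", "CNOCNO", "NCONCO"]
    final, n_evals = evolve(start, toy_objective, toy_crossover, toy_mutate)
    print(final[0], n_evals)
```

Because the operators are passed in as callables, swapping random edits for model-guided edits changes only the two functions handed to `evolve`; the selection step and the evaluation count stay comparable across variants.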
Related papers
- Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model [77.50732023411811]
We propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM).
TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions.
Our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset.
arXiv Detail & Related papers (2024-10-17T14:30:27Z)
- Many-Shot In-Context Learning for Molecular Inverse Design [56.65345962071059]
Large Language Models (LLMs) have demonstrated great performance in few-shot In-Context Learning (ICL).
We develop a new semi-supervised learning method that overcomes the lack of experimental data available for many-shot ICL.
As we show, the new method greatly improves upon existing ICL methods for molecular design while being accessible and easy to use for scientists.
arXiv Detail & Related papers (2024-07-26T21:10:50Z)
- Large Language Model-Aided Evolutionary Search for Constrained Multiobjective Optimization [15.476478159958416]
We employ a large language model (LLM) to enhance evolutionary search for solving constrained multi-objective optimization problems.
Our aim is to speed up the convergence of the evolutionary population.
arXiv Detail & Related papers (2024-05-09T13:44:04Z)
- DrugAssist: A Large Language Model for Molecule Optimization [29.95488215594247]
DrugAssist is an interactive molecule optimization model that performs optimization through human-machine dialogue.
DrugAssist has achieved leading results in both single and multiple property optimization.
We publicly release a large instruction-based dataset called MolOpt-Instructions for fine-tuning language models on molecule optimization tasks.
arXiv Detail & Related papers (2023-12-28T10:46:56Z)
- Large Language Models as Evolutionary Optimizers [37.92671242584431]
We present the first study on large language models (LLMs) as evolutionary optimizers.
The main advantage of this approach is that it requires minimal domain knowledge and human effort, and no additional training of the model.
We also study the effectiveness of the self-adaptation mechanism in evolutionary search.
arXiv Detail & Related papers (2023-10-29T15:44:52Z)
- Molecule optimization via multi-objective evolutionary in implicit chemical space [8.72872397589296]
MOMO is a multi-objective molecule optimization framework that combines learning of chemical knowledge with multi-objective evolutionary search.
We demonstrate the high performance of MOMO on four multi-objective property and similarity optimization tasks, and illustrate the search capability of MOMO through case studies.
arXiv Detail & Related papers (2022-12-17T09:09:23Z)
- An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various zeroth-order (ZO) optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD).
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z)
- Computer-Aided Multi-Objective Optimization in Small Molecule Discovery [3.032184156362992]
We describe pool-based and de novo generative approaches to multi-objective molecular discovery.
We show how pool-based molecular discovery is a relatively direct extension of multi-objective Bayesian optimization.
We discuss some remaining challenges and opportunities in the field.
arXiv Detail & Related papers (2022-10-13T17:33:07Z)
- Conservative Objective Models for Effective Offline Model-Based Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower-bounds the actual value of the ground-truth objective on out-of-distribution inputs.
Conservative objective models (COMs) are simple to implement and outperform a number of existing methods on a wide range of model-based optimization (MBO) problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for Mean-Field Control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Optimizing Molecules using Efficient Queries from Property Evaluations [66.66290256377376]
We propose QMO, a generic query-based molecule optimization framework.
QMO improves the desired properties of an input molecule based on efficient queries.
We show that QMO outperforms existing methods in the benchmark tasks of optimizing small organic molecules.
arXiv Detail & Related papers (2020-11-03T18:51:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.