DrugAssist: A Large Language Model for Molecule Optimization
- URL: http://arxiv.org/abs/2401.10334v1
- Date: Thu, 28 Dec 2023 10:46:56 GMT
- Title: DrugAssist: A Large Language Model for Molecule Optimization
- Authors: Geyan Ye, Xibao Cai, Houtim Lai, Xing Wang, Junhong Huang, Longyue
Wang, Wei Liu, Xiangxiang Zeng
- Abstract summary: DrugAssist is an interactive molecule optimization model that performs optimization through human-machine dialogue.
DrugAssist has achieved leading results in both single and multiple property optimization.
We publicly release a large instruction-based dataset called MolOpt-Instructions for fine-tuning language models on molecule optimization tasks.
- Score: 29.95488215594247
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently, the impressive performance of large language models (LLMs) on a
wide range of tasks has attracted an increasing number of attempts to apply
LLMs in drug discovery. However, molecule optimization, a critical task in the
drug discovery pipeline, is currently an area that has seen little involvement
from LLMs. Most existing approaches focus solely on capturing the underlying
patterns in chemical structures provided by the data, without taking advantage
of expert feedback. These non-interactive approaches overlook the fact that the
drug discovery process is actually one that requires the integration of expert
experience and iterative refinement. To address this gap, we propose
DrugAssist, an interactive molecule optimization model which performs
optimization through human-machine dialogue, leveraging LLMs' strong
interactivity and generalizability. DrugAssist has achieved leading results in
both single- and multi-property optimization, while also showing strong
potential in transferability and iterative optimization. In addition,
we publicly release a large instruction-based dataset called
MolOpt-Instructions for fine-tuning language models on molecule optimization
tasks. We have made our code and data publicly available at
https://github.com/blazerye/DrugAssist, which we hope will pave the way for
future research in LLMs' application for drug discovery.
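To make the dialogue-driven workflow concrete, below is a minimal sketch of an interactive optimization loop. The `query_model` stub, prompt wording, and feedback mechanism are illustrative placeholders, not DrugAssist's actual interface; the real model and prompts live in the repository linked above.

```python
def query_model(messages: list[dict]) -> str:
    """Placeholder for a chat-style LLM call; swap in the real model."""
    raise NotImplementedError

def optimize_molecule(smiles: str, goal: str, max_turns: int = 5) -> str:
    """Iteratively request improved molecules, feeding expert feedback back in."""
    messages = [{
        "role": "user",
        "content": f"Optimize the molecule {smiles} so that {goal}. "
                   "Reply with a single SMILES string.",
    }]
    candidate = smiles
    for _ in range(max_turns):
        candidate = query_model(messages).strip()
        # A human expert (or an automated property oracle) reviews the
        # candidate; empty feedback means the molecule is accepted.
        feedback = input(f"Candidate: {candidate}. Feedback (empty = accept): ")
        if not feedback:
            break
        messages.append({"role": "assistant", "content": candidate})
        messages.append({"role": "user", "content": feedback})
    return candidate
```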
Related papers
- Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model [77.50732023411811]
We propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM).
TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions.
Our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset.
arXiv Detail & Related papers (2024-10-17T14:30:27Z)
- QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
- Many-Shot In-Context Learning for Molecular Inverse Design [56.65345962071059]
Large Language Models (LLMs) have demonstrated strong performance in few-shot In-Context Learning (ICL).
We develop a new semi-supervised learning method that overcomes the lack of experimental data available for many-shot ICL.
As we show, the new method greatly improves upon existing ICL methods for molecular design while being accessible and easy to use for scientists.
arXiv Detail & Related papers (2024-07-26T21:10:50Z)
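The abstract above does not spell out the semi-supervised procedure, so the following is only a generic sketch of one common pattern, pseudo-labeling: the model labels unlabeled molecules from a few experimental examples, and both kinds of pairs are packed into a many-shot prompt. `llm_label` and the data formats are hypothetical.

```python
def llm_label(prompt: str) -> str:
    """Placeholder for a property-predicting LLM call."""
    raise NotImplementedError

def build_many_shot_prompt(labeled, unlabeled, query_smiles):
    """Generic semi-supervised many-shot ICL: pseudo-label unlabeled molecules,
    then use both real and pseudo-labeled pairs as in-context examples."""
    examples = list(labeled)  # (smiles, value) pairs with experimental labels
    few_shot = "\n".join(f"{s} -> {v}" for s, v in labeled)
    for smiles in unlabeled:
        value = llm_label(f"{few_shot}\n{smiles} ->")  # pseudo-label
        examples.append((smiles, value))
    shots = "\n".join(f"{s} -> {v}" for s, v in examples)
    return f"{shots}\n{query_smiles} ->"
```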
- LICO: Large Language Models for In-Context Molecular Optimization [33.5918976228562]
We introduce LICO, a general-purpose model that extends arbitrary base LLMs for black-box optimization.
We train the model to perform in-context predictions on a diverse set of functions defined over the domain.
Once trained, LICO can generalize to unseen molecule properties simply via in-context prompting.
arXiv Detail & Related papers (2024-06-27T02:43:18Z)
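Concretely, the "in-context prompting" in the LICO summary amounts to conditioning on observed (molecule, objective) pairs and asking for a prediction on a new candidate. The sketch below shows only that prompt shape under assumed formatting; LICO itself is a trained extension of a base LLM, not plain prompting of an off-the-shelf model.

```python
def lico_style_prompt(observations: list[tuple[str, float]], candidate: str) -> str:
    """Format (molecule, objective value) observations as an in-context
    prediction query for a new candidate molecule."""
    lines = [f"x: {x}  f(x): {y}" for x, y in observations]
    lines.append(f"x: {candidate}  f(x):")
    return "\n".join(lines)

# Score several candidates with the model's predictions, then spend the
# expensive real evaluation only on the most promising one.
print(lico_style_prompt([("CCO", 0.42), ("c1ccccc1O", 0.71)], "CCN"))
```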
- Efficient Evolutionary Search Over Chemical Space with Large Language Models [31.31899988523534]
Molecular optimization objectives can be non-differentiable.
We introduce chemistry-aware Large Language Models (LLMs) into evolutionary algorithms.
Our algorithm improves both the quality of the final solution and convergence speed.
arXiv Detail & Related papers (2024-06-23T06:22:49Z)
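A hedged sketch of the idea in the entry above: the LLM serves as a chemistry-aware variation operator inside an otherwise standard evolutionary loop. `llm_propose`, the loop structure, and the selection scheme are assumptions for illustration, not the paper's exact algorithm.

```python
import random

def llm_propose(parents: list[str]) -> str:
    """Placeholder: ask a chemistry-aware LLM to cross over / mutate parents."""
    raise NotImplementedError

def evolve(population: list[str], score, generations: int = 10, top_k: int = 5) -> str:
    """Evolutionary search where the LLM replaces hand-coded variation rules."""
    for _ in range(generations):
        population.sort(key=score, reverse=True)            # rank by objective
        child = llm_propose(random.sample(population[:top_k], 2))
        population.append(child)
    return max(population, key=score)
```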
- Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate the inefficiency of LLMs at inference time, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
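The entry above names feature-level knowledge distillation without detail; a generic version regresses the student's features onto the LLM teacher's hidden representations alongside the task loss. The PyTorch sketch below shows that generic recipe with illustrative dimensions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistilledRecommender(nn.Module):
    """Compact student whose features are regressed onto the LLM teacher's."""
    def __init__(self, in_dim=128, hidden=256, teacher_dim=4096, n_drugs=500):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.project = nn.Linear(hidden, teacher_dim)  # map into teacher space
        self.head = nn.Linear(hidden, n_drugs)         # per-medication logits

    def forward(self, x):
        h = self.encoder(x)
        return self.head(h), self.project(h)

# One training step: task loss plus feature-matching loss against frozen
# teacher features extracted offline from the LLM.
model = DistilledRecommender()
x = torch.randn(8, 128)                        # patient representations
teacher_feats = torch.randn(8, 4096)           # precomputed LLM features
labels = torch.randint(0, 2, (8, 500)).float()
logits, student_feats = model(x)
loss = F.binary_cross_entropy_with_logits(logits, labels) \
     + F.mse_loss(student_feats, teacher_feats)
loss.backward()
```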
- InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery [19.870192393785043]
Large Language Models (LLMs) offer promise in reshaping interactions with complex molecular data.
Our novel contribution, InstructMol, effectively aligns molecular structures with natural language via an instruction-tuning approach.
InstructMol showcases substantial performance improvements in drug discovery-related molecular tasks.
arXiv Detail & Related papers (2023-11-27T16:47:51Z)
- Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers [70.18534453485849]
EvoPrompt is a framework for discrete prompt optimization.
It borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence.
It significantly outperforms human-engineered prompts and existing methods for automatic prompt generation.
arXiv Detail & Related papers (2023-09-15T16:50:09Z)
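In EvoPrompt's framing, prompts are the population and the LLM itself applies the evolutionary operators. A minimal sketch of one generation, with the `llm` call, crossover instruction, and fitness function as placeholders:

```python
import random

def llm(instruction: str) -> str:
    """Placeholder for the LLM used here as a crossover/mutation operator."""
    raise NotImplementedError

def evoprompt_step(population: list[str], fitness) -> list[str]:
    """One generation: LLM-driven variation, then survivor selection."""
    p1, p2 = random.sample(population, 2)
    child = llm(
        "Cross over the two prompts below and mutate the result:\n"
        f"Prompt 1: {p1}\nPrompt 2: {p2}"
    )
    # fitness() scores a prompt on a held-out dev set; keep size fixed.
    scored = sorted(population + [child], key=fitness, reverse=True)
    return scored[: len(population)]
```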
- Improving Small Language Models on PubMedQA via Generative Data Augmentation [4.96649519549027]
Large Language Models (LLMs) have made remarkable advancements in the field of natural language processing.
Small Language Models (SLMs) are known for their efficiency, but they often struggle with limited capacity and training data.
We introduce a novel method aimed at improving SLMs in the medical domain using LLM-based generative data augmentation.
arXiv Detail & Related papers (2023-05-12T23:49:23Z)
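As a rough sketch of LLM-based generative data augmentation in the entry above: prompt the large model to write new domain QA pairs, then add them to the small model's fine-tuning set. `llm_generate`, the prompt wording, and the output format are hypothetical.

```python
def llm_generate(prompt: str) -> str:
    """Placeholder for the large teacher model, used only to build data."""
    raise NotImplementedError

def augment_qa(seed_pairs: list[tuple[str, str]], n_new: int = 100):
    """Generate synthetic PubMedQA-style pairs to enlarge the SLM training set."""
    synthetic = []
    for i in range(n_new):
        q, a = seed_pairs[i % len(seed_pairs)]
        text = llm_generate(
            "Write one new biomedical question with a yes/no/maybe answer, "
            f"in the style of:\nQ: {q}\nA: {a}\nFormat: Q: ... A: ..."
        )
        question, _, answer = text.partition("A:")
        synthetic.append((question.removeprefix("Q:").strip(), answer.strip()))
    return seed_pairs + synthetic
```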
- SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction [127.43571146741984]
Accurate prediction of Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery.
Wet-lab experiments remain the most reliable method, but they are time-consuming and resource-intensive.
Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue.
We present the SSM-DTA framework, which incorporates three simple yet highly effective strategies.
arXiv Detail & Related papers (2022-06-20T14:53:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.