DrugAssist: A Large Language Model for Molecule Optimization
- URL: http://arxiv.org/abs/2401.10334v1
- Date: Thu, 28 Dec 2023 10:46:56 GMT
- Title: DrugAssist: A Large Language Model for Molecule Optimization
- Authors: Geyan Ye, Xibao Cai, Houtim Lai, Xing Wang, Junhong Huang, Longyue
Wang, Wei Liu, Xiangxiang Zeng
- Abstract summary: DrugAssist is an interactive molecule optimization model that performs optimization through human-machine dialogue.
DrugAssist has achieved leading results in both single and multiple property optimization.
We publicly release a large instruction-based dataset called MolOpt-Instructions for fine-tuning language models on molecule optimization tasks.
- Score: 29.95488215594247
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently, the impressive performance of large language models (LLMs) on a
wide range of tasks has attracted an increasing number of attempts to apply
LLMs in drug discovery. However, molecule optimization, a critical task in the
drug discovery pipeline, is currently an area that has seen little involvement
from LLMs. Most existing approaches focus solely on capturing the underlying
patterns in chemical structures provided by the data, without taking advantage
of expert feedback. These non-interactive approaches overlook the fact that the
drug discovery process is actually one that requires the integration of expert
experience and iterative refinement. To address this gap, we propose
DrugAssist, an interactive molecule optimization model which performs
optimization through human-machine dialogue, leveraging LLMs' strong
interactivity and generalizability. DrugAssist has achieved leading results in
both single- and multi-property optimization, while also showing strong
potential in transferability and iterative optimization. In addition,
we publicly release a large instruction-based dataset called
MolOpt-Instructions for fine-tuning language models on molecule optimization
tasks. We have made our code and data publicly available at
https://github.com/blazerye/DrugAssist, which we hope will pave the way for
future research in LLMs' application for drug discovery.
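To make the dialogue-driven workflow concrete, below is a minimal sketch of an interactive optimization loop. The `query_model` stub, prompt wording, and feedback mechanism are illustrative placeholders, not DrugAssist's actual interface; the real model and prompts live in the repository linked above.

```python
def query_model(messages: list[dict]) -> str:
    """Placeholder for a chat-style LLM call; swap in the real model."""
    raise NotImplementedError

def optimize_molecule(smiles: str, goal: str, max_turns: int = 5) -> str:
    """Iteratively request improved molecules, feeding expert feedback back in."""
    messages = [{
        "role": "user",
        "content": f"Optimize the molecule {smiles} so that {goal}. "
                   "Reply with a single SMILES string.",
    }]
    candidate = smiles
    for _ in range(max_turns):
        candidate = query_model(messages).strip()
        # A human expert (or an automated property oracle) reviews the
        # candidate; empty feedback means the molecule is accepted.
        feedback = input(f"Candidate: {candidate}. Feedback (empty = accept): ")
        if not feedback:
            break
        messages.append({"role": "assistant", "content": candidate})
        messages.append({"role": "user", "content": feedback})
    return candidate
```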
Related papers
- Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model [77.50732023411811]
We propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM).
TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions.
Our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset.
arXiv Detail & Related papers (2024-10-17T14:30:27Z)
- QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
- Many-Shot In-Context Learning for Molecular Inverse Design [56.65345962071059]
Large Language Models (LLMs) have demonstrated strong performance in few-shot In-Context Learning (ICL).
We develop a new semi-supervised learning method that overcomes the lack of experimental data available for many-shot ICL.
As we show, the new method greatly improves upon existing ICL methods for molecular design while being accessible and easy to use for scientists.
arXiv Detail & Related papers (2024-07-26T21:10:50Z)
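The abstract above does not spell out the semi-supervised procedure, so the following is only a generic sketch of one common pattern, pseudo-labeling: the model labels unlabeled molecules from a few experimental examples, and both kinds of pairs are packed into a many-shot prompt. `llm_label` and the data formats are hypothetical.

```python
def llm_label(prompt: str) -> str:
    """Placeholder for a property-predicting LLM call."""
    raise NotImplementedError

def build_many_shot_prompt(labeled, unlabeled, query_smiles):
    """Generic semi-supervised many-shot ICL: pseudo-label unlabeled molecules,
    then use both real and pseudo-labeled pairs as in-context examples."""
    examples = list(labeled)  # (smiles, value) pairs with experimental labels
    few_shot = "\n".join(f"{s} -> {v}" for s, v in labeled)
    for smiles in unlabeled:
        value = llm_label(f"{few_shot}\n{smiles} ->")  # pseudo-label
        examples.append((smiles, value))
    shots = "\n".join(f"{s} -> {v}" for s, v in examples)
    return f"{shots}\n{query_smiles} ->"
```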
- LICO: Large Language Models for In-Context Molecular Optimization [33.5918976228562]
We introduce LICO, a general-purpose model that extends arbitrary base LLMs for black-box optimization.
We train the model to perform in-context predictions on a diverse set of functions defined over the domain.
Once trained, LICO can generalize to unseen molecule properties simply via in-context prompting.
arXiv Detail & Related papers (2024-06-27T02:43:18Z)
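Concretely, the "in-context prompting" in the LICO summary amounts to conditioning on observed (molecule, objective) pairs and asking for a prediction on a new candidate. The sketch below shows only that prompt shape under assumed formatting; LICO itself is a trained extension of a base LLM, not plain prompting of an off-the-shelf model.

```python
def lico_style_prompt(observations: list[tuple[str, float]], candidate: str) -> str:
    """Format (molecule, objective value) observations as an in-context
    prediction query for a new candidate molecule."""
    lines = [f"x: {x}  f(x): {y}" for x, y in observations]
    lines.append(f"x: {candidate}  f(x):")
    return "\n".join(lines)

# Score several candidates with the model's predictions, then spend the
# expensive real evaluation only on the most promising one.
print(lico_style_prompt([("CCO", 0.42), ("c1ccccc1O", 0.71)], "CCN"))
```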
- Efficient Evolutionary Search Over Chemical Space with Large Language Models [31.31899988523534]
Molecular optimization objectives can be non-differentiable.
We introduce chemistry-aware Large Language Models (LLMs) into evolutionary algorithms.
Our algorithm improves both the quality of the final solution and convergence speed.
arXiv Detail & Related papers (2024-06-23T06:22:49Z)
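A hedged sketch of the idea in the entry above: the LLM serves as a chemistry-aware variation operator inside an otherwise standard evolutionary loop. `llm_propose`, the loop structure, and the selection scheme are assumptions for illustration, not the paper's exact algorithm.

```python
import random

def llm_propose(parents: list[str]) -> str:
    """Placeholder: ask a chemistry-aware LLM to cross over / mutate parents."""
    raise NotImplementedError

def evolve(population: list[str], score, generations: int = 10, top_k: int = 5) -> str:
    """Evolutionary search where the LLM replaces hand-coded variation rules."""
    for _ in range(generations):
        population.sort(key=score, reverse=True)            # rank by objective
        child = llm_propose(random.sample(population[:top_k], 2))
        population.append(child)
    return max(population, key=score)
```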
- Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate the inefficiency of LLMs at inference time, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
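The entry above names feature-level knowledge distillation without detail; a generic version regresses the student's features onto the LLM teacher's hidden representations alongside the task loss. The PyTorch sketch below shows that generic recipe with illustrative dimensions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistilledRecommender(nn.Module):
    """Compact student whose features are regressed onto the LLM teacher's."""
    def __init__(self, in_dim=128, hidden=256, teacher_dim=4096, n_drugs=500):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.project = nn.Linear(hidden, teacher_dim)  # map into teacher space
        self.head = nn.Linear(hidden, n_drugs)         # per-medication logits

    def forward(self, x):
        h = self.encoder(x)
        return self.head(h), self.project(h)

# One training step: task loss plus feature-matching loss against frozen
# teacher features extracted offline from the LLM.
model = DistilledRecommender()
x = torch.randn(8, 128)                        # patient representations
teacher_feats = torch.randn(8, 4096)           # precomputed LLM features
labels = torch.randint(0, 2, (8, 500)).float()
logits, student_feats = model(x)
loss = F.binary_cross_entropy_with_logits(logits, labels) \
     + F.mse_loss(student_feats, teacher_feats)
loss.backward()
```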
- InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery [19.870192393785043]
Large Language Models (LLMs) offer promise in reshaping interactions with complex molecular data.
Our novel contribution, InstructMol, effectively aligns molecular structures with natural language via an instruction-tuning approach.
InstructMol showcases substantial performance improvements in drug discovery-related molecular tasks.
arXiv Detail & Related papers (2023-11-27T16:47:51Z)
- Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers [70.18534453485849]
EvoPrompt is a framework for discrete prompt optimization.
It borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence.
It significantly outperforms human-engineered prompts and existing methods for automatic prompt generation.
arXiv Detail & Related papers (2023-09-15T16:50:09Z)
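In EvoPrompt's framing, prompts are the population and the LLM itself applies the evolutionary operators. A minimal sketch of one generation, with the `llm` call, crossover instruction, and fitness function as placeholders:

```python
import random

def llm(instruction: str) -> str:
    """Placeholder for the LLM used here as a crossover/mutation operator."""
    raise NotImplementedError

def evoprompt_step(population: list[str], fitness) -> list[str]:
    """One generation: LLM-driven variation, then survivor selection."""
    p1, p2 = random.sample(population, 2)
    child = llm(
        "Cross over the two prompts below and mutate the result:\n"
        f"Prompt 1: {p1}\nPrompt 2: {p2}"
    )
    # fitness() scores a prompt on a held-out dev set; keep size fixed.
    scored = sorted(population + [child], key=fitness, reverse=True)
    return scored[: len(population)]
```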
- Improving Small Language Models on PubMedQA via Generative Data Augmentation [4.96649519549027]
Large Language Models (LLMs) have made remarkable advancements in the field of natural language processing.
Small Language Models (SLMs) are known for their efficiency, but they often struggle with limited capacity and training data.
We introduce a novel method aimed at improving SLMs in the medical domain using LLM-based generative data augmentation.
arXiv Detail & Related papers (2023-05-12T23:49:23Z)
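As a rough sketch of LLM-based generative data augmentation in the entry above: prompt the large model to write new domain QA pairs, then add them to the small model's fine-tuning set. `llm_generate`, the prompt wording, and the output format are hypothetical.

```python
def llm_generate(prompt: str) -> str:
    """Placeholder for the large teacher model, used only to build data."""
    raise NotImplementedError

def augment_qa(seed_pairs: list[tuple[str, str]], n_new: int = 100):
    """Generate synthetic PubMedQA-style pairs to enlarge the SLM training set."""
    synthetic = []
    for i in range(n_new):
        q, a = seed_pairs[i % len(seed_pairs)]
        text = llm_generate(
            "Write one new biomedical question with a yes/no/maybe answer, "
            f"in the style of:\nQ: {q}\nA: {a}\nFormat: Q: ... A: ..."
        )
        question, _, answer = text.partition("A:")
        synthetic.append((question.removeprefix("Q:").strip(), answer.strip()))
    return seed_pairs + synthetic
```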
- SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction [127.43571146741984]
Accurate prediction of Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery.
Wet-lab experiments remain the most reliable method, but they are time-consuming and resource-intensive.
Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue.
We present the SSM-DTA framework, which incorporates three simple yet highly effective strategies.
arXiv Detail & Related papers (2022-06-20T14:53:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.