ScaffoldGPT: A Scaffold-based Large Language Model for Drug Improvement
- URL: http://arxiv.org/abs/2502.06891v1
- Date: Sun, 09 Feb 2025 10:36:33 GMT
- Title: ScaffoldGPT: A Scaffold-based Large Language Model for Drug Improvement
- Authors: Xuefeng Liu, Songhao Jiang, Rick Stevens
- Abstract summary: ScaffoldGPT is a novel Large Language Model (LLM) designed for drug optimization based on molecular scaffolds. Our work comprises three key components: (1) a three-stage drug optimization approach that integrates pretraining, finetuning, and decoding optimization.
- Score: 2.6198448284771443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drug optimization has become increasingly crucial in light of fast-mutating virus strains and drug-resistant cancer cells. Nevertheless, it remains challenging as it necessitates retaining the beneficial properties of the original drug while simultaneously enhancing desired attributes beyond its scope. In this work, we aim to tackle this challenge by introducing ScaffoldGPT, a novel Large Language Model (LLM) designed for drug optimization based on molecular scaffolds. Our work comprises three key components: (1) a three-stage drug optimization approach that integrates pretraining, finetuning, and decoding optimization; (2) a uniquely designed two-phase incremental training approach for pretraining the drug optimization LLM-based generator on molecule scaffolds with enhanced performance; (3) a token-level decoding optimization strategy, TOP-N, that enables controlled, reward-guided generation using pretrained/finetuned LLMs. Finally, through a comprehensive evaluation on COVID and cancer benchmarks, we demonstrate that ScaffoldGPT outperforms competing baselines in drug optimization while excelling at preserving the original functional scaffold and enhancing desired properties.
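The TOP-N idea, as the abstract describes it, re-ranks the model's most likely next tokens with an external reward at each decoding step. A minimal sketch, assuming a toy interface in which per-step token log-probabilities and the reward function are supplied by the caller (names and data shapes are illustrative stand-ins, not the paper's implementation):

```python
def top_n_decode(step_logits, reward_fn, prefix=(), n=3):
    """Reward-guided greedy decoding: at each step, keep the model's
    top-n tokens by log-probability, then emit the candidate whose
    extended sequence scores highest under the external reward."""
    seq = list(prefix)
    for logits in step_logits:  # one {token: logprob} dict per step
        candidates = sorted(logits, key=logits.get, reverse=True)[:n]
        seq.append(max(candidates, key=lambda t: reward_fn(seq + [t])))
    return seq

# toy run: the reward prefers carbon-rich token sequences
steps = [{"C": -0.1, "N": -0.2, "O": -3.0},
         {"O": -0.05, "C": -0.3, "N": -2.0}]
reward = lambda s: s.count("C")
print(top_n_decode(steps, reward, n=2))  # → ['C', 'C']
```

With n=1 this reduces to ordinary greedy decoding; larger n trades model likelihood for reward.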
Related papers
- Regulatory DNA sequence Design with Reinforcement Learning [56.20290878358356]
We propose a generative approach that leverages reinforcement learning to fine-tune a pre-trained autoregressive model.
We evaluate our method on promoter design tasks in two yeast media conditions and enhancer design tasks for three human cell types.
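The reinforcement-learning fine-tuning loop described above can be reduced to a REINFORCE-style update on a categorical policy; a self-contained toy sketch (the three-token policy and 0/1 reward are placeholders for a pretrained autoregressive model and a sequence-level reward):

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(logits, reward_fn, lr=0.5):
    """One REINFORCE update on a categorical policy: sample a token,
    observe its reward, and move the logits along
    reward * grad(log p(token)) = reward * (one_hot - probs)."""
    probs = softmax(logits)
    a = random.choices(range(len(logits)), weights=probs)[0]
    r = reward_fn(a)
    new_logits = [l + lr * r * ((1.0 if i == a else 0.0) - p)
                  for i, (l, p) in enumerate(zip(logits, probs))]
    return new_logits, a

# toy run: only token 0 is rewarded, so its probability climbs
random.seed(0)
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits, _ = reinforce_step(logits, lambda a: 1.0 if a == 0 else 0.0)
```

In practice a baseline is subtracted from the reward to reduce variance; the raw reward suffices for this toy.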
arXiv Detail & Related papers (2025-03-11T02:33:33Z)
- ControllableGPT: A Ground-Up Designed Controllable GPT for Molecule Optimization [6.900025190052277]
We introduce ControllableGPT, a controllable training framework for large language models.
It is inspired by the biological processes of growth and evolution, which involve the expansion, shrinking, and mutation of sequences.
It enables the precise management of specific locations and ranges within a sequence, while maintaining the integrity of any specified positions or subsequences.
arXiv Detail & Related papers (2025-02-15T01:49:35Z)
- DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization [53.27954325490941]
Finetuning a Large Language Model (LLM) is crucial for generating results towards specific objectives. This research introduces a novel reinforcement learning algorithm to finetune a drug optimization LLM-based generative model.
arXiv Detail & Related papers (2025-02-11T04:00:21Z)
- Structure-Based Molecule Optimization via Gradient-Guided Bayesian Update [16.743187639189976]
Structure-based molecule optimization (SBMO) aims to optimize molecules with both continuous coordinates and discrete types against protein targets.
MolJO is the first gradient-based SBMO framework that facilitates joint guidance signals across different modalities.
MolJO achieves state-of-the-art performance on the CrossDocked2020 benchmark.
arXiv Detail & Related papers (2024-11-20T12:48:29Z)
- Accelerated Preference Optimization for Large Language Model Alignment [60.22606527763201]
Reinforcement Learning from Human Feedback (RLHF) has emerged as a pivotal tool for aligning large language models (LLMs) with human preferences.
Direct Preference Optimization (DPO) formulates RLHF as a policy optimization problem without explicitly estimating the reward function.
We propose a general Accelerated Preference Optimization (APO) framework, which unifies many existing preference optimization algorithms.
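DPO's per-pair objective, which APO-style methods build on, is compact enough to state directly: given policy and reference log-likelihoods of a preferred (w) and dispreferred (l) completion, the loss is the negative log-sigmoid of the scaled margin difference:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# when the policy matches the reference, the margin is 0 and the loss is log 2
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))  # → 0.693...
```

Raising the preferred completion's log-likelihood relative to the reference drives the loss below log 2, which is what the gradient step exploits.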
arXiv Detail & Related papers (2024-10-08T18:51:01Z)
- Decomposed Direct Preference Optimization for Structure-Based Drug Design [47.561983733291804]
We propose DecompDPO, a structure-based optimization method to align diffusion models with pharmaceutical needs.
DecompDPO can be effectively used for two main purposes: fine-tuning pretrained diffusion models for molecule generation across various protein families, and molecular optimization given a specific protein subpocket after generation.
arXiv Detail & Related papers (2024-07-19T02:12:25Z)
- LightCPPgen: An Explainable Machine Learning Pipeline for Rational Design of Cell Penetrating Peptides [0.32985979395737786]
We introduce an innovative approach for the de novo design of CPPs, leveraging the strengths of machine learning (ML) and optimization algorithms.
Our strategy, named LightCPPgen, integrates a LightGBM-based predictive model with a genetic algorithm (GA).
The GA solutions specifically target the candidate sequences' penetrability score, while trying to maximize similarity with the original non-penetrating peptide.
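The similarity-constrained GA loop described above can be sketched with a hypothetical scoring interface (the point-mutation operator, identity-based similarity metric, and weight w are illustrative stand-ins, not LightCPPgen's actual operators):

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def mutate(seq):
    """Point-mutate one position (a hypothetical GA operator)."""
    i = random.randrange(len(seq))
    return seq[:i] + random.choice(ALPHABET) + seq[i + 1:]

def similarity(a, b):
    """Fraction of identical positions (a simple stand-in metric)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def ga_optimize(start, score_fn, generations=50, pop=20, w=0.5):
    """Maximize score_fn while penalizing drift from the start sequence,
    mirroring the penetrability-vs-similarity trade-off described above."""
    fitness = lambda s: score_fn(s) + w * similarity(s, start)
    population = [start] * pop
    for _ in range(generations):
        # elitist selection: keep the top half, refill with mutants
        parents = sorted(population, key=fitness, reverse=True)[:pop // 2]
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop - len(parents))]
    return max(population, key=fitness)

# toy run: reward lysine content while staying close to "AAAA"
random.seed(1)
best = ga_optimize("AAAA", lambda s: s.count("K") / len(s))
```

Because the top half of each generation is carried over unchanged, the best fitness seen never decreases.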
arXiv Detail & Related papers (2024-05-31T10:57:25Z)
- Deep Lead Optimization: Leveraging Generative AI for Structural Modification [12.167178956742113]
This review delves into the basic concepts, goals, conventional CADD techniques, and recent advancements in AIDD.
We introduce a unified perspective based on constrained subgraph generation to harmonize the methodologies of de novo design and lead optimization.
arXiv Detail & Related papers (2024-04-30T03:17:42Z)
- DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization [49.85944390503957]
DecompOpt is a structure-based molecular optimization method built on a controllable, decomposed diffusion model.
We show that DecompOpt can efficiently generate molecules with improved properties compared with strong de novo baselines.
arXiv Detail & Related papers (2024-03-07T02:53:40Z)
- Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of LLM-based prompt optimizers. We identify two pivotal factors in model parameter learning: update direction and update method. We develop a capable gradient-inspired LLM-based prompt optimizer called GPO.
arXiv Detail & Related papers (2024-02-27T15:05:32Z)
- Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark [166.40879020706151]
This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during fine-tuning.
Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques.
Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
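The core of BP-free ZO fine-tuning is a finite-difference gradient estimate along random directions, which needs only forward passes. A minimal two-point (SPSA-style) sketch on plain Python lists, purely to illustrate the estimator:

```python
import random

def zo_gradient(f, x, mu=1e-3, samples=10):
    """Two-point zeroth-order gradient estimate: probe f along random
    Gaussian directions u and average (f(x+mu*u) - f(x-mu*u)) / (2*mu) * u.
    Only forward evaluations are needed, so no activations are stored."""
    d = len(x)
    g = [0.0] * d
    for _ in range(samples):
        u = [random.gauss(0.0, 1.0) for _ in range(d)]
        xp = [xi + mu * ui for xi, ui in zip(x, u)]
        xm = [xi - mu * ui for xi, ui in zip(x, u)]
        scale = (f(xp) - f(xm)) / (2.0 * mu * samples)
        for i in range(d):
            g[i] += scale * u[i]
    return g

# toy check: for f(x) = 3*x0 + 5*x1 the true gradient is (3, 5)
random.seed(0)
est = zo_gradient(lambda v: 3 * v[0] + 5 * v[1], [0.0, 0.0], samples=5000)
```

The estimate is unbiased but noisy; its variance, and hence the sample budget, is the price paid for skipping backpropagation.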
arXiv Detail & Related papers (2024-02-18T14:08:48Z)
- Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers [109.52244418498974]
We propose a novel Admeta (A Double exponential Moving average To Adaptive and non-adaptive momentum) framework.
We provide two implementations, AdmetaR and AdmetaS, the former based on RAdam and the latter based on SGDM.
arXiv Detail & Related papers (2023-07-02T18:16:06Z)
- Reinforced Genetic Algorithm for Structure-based Drug Design [38.134929249388406]
Structure-based drug design (SBDD) aims to discover drug candidates by finding molecules that bind to a disease-related protein (the target).
We propose Reinforced Genetic Algorithm (RGA) that uses neural models to prioritize the profitable design steps and suppress random-walk behavior.
arXiv Detail & Related papers (2022-11-28T22:59:46Z)