ScaffoldGPT: A Scaffold-based GPT Model for Drug Optimization
- URL: http://arxiv.org/abs/2502.06891v2
- Date: Fri, 11 Apr 2025 07:15:26 GMT
- Title: ScaffoldGPT: A Scaffold-based GPT Model for Drug Optimization
- Authors: Xuefeng Liu, Songhao Jiang, Ian Foster, Jinbo Xu, Rick Stevens
- Abstract summary: ScaffoldGPT is a Generative Pretrained Transformer (GPT) designed for drug optimization based on molecular scaffolds. Our work comprises three key components: (1) A three-stage drug optimization approach that integrates pretraining, finetuning, and decoding optimization. We demonstrate via a comprehensive evaluation on COVID and cancer benchmarks that ScaffoldGPT outperforms the competing baselines in drug optimization benchmarks.
- Score: 3.240904428766923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drug optimization has become increasingly crucial in light of fast-mutating virus strains and drug-resistant cancer cells. Nevertheless, it remains challenging as it necessitates retaining the beneficial properties of the original drug while simultaneously enhancing desired attributes beyond its scope. In this work, we aim to tackle this challenge by introducing ScaffoldGPT, a novel Generative Pretrained Transformer (GPT) designed for drug optimization based on molecular scaffolds. Our work comprises three key components: (1) A three-stage drug optimization approach that integrates pretraining, finetuning, and decoding optimization. (2) A uniquely designed two-phase incremental training approach for pre-training the drug optimization GPT on molecular scaffolds with enhanced performance. (3) A token-level decoding optimization strategy, TOP-N, that enables controlled, reward-guided generation using a pretrained/finetuned GPT. We demonstrate via a comprehensive evaluation on COVID and cancer benchmarks that ScaffoldGPT outperforms the competing baselines in drug optimization benchmarks, while excelling in preserving the original functional scaffold and enhancing desired properties.
Related papers
- Protein Inverse Folding From Structure Feedback [78.27854221882572]
We introduce a novel approach to fine-tune an inverse folding model using feedback from a protein folding model. Our results on the CATH 4.2 test set demonstrate that DPO fine-tuning leads to a significant improvement in average TM-Score.
arXiv Detail & Related papers (2025-06-03T16:02:12Z)
- ADGSyn: Dual-Stream Learning for Efficient Anticancer Drug Synergy Prediction [0.9064217048217067]
We propose ADGSyn, an innovative method for predicting drug synergy. Evaluated on the O'Neil dataset containing 13,243 drug-cell line combinations, ADGSyn demonstrates superior performance over eight baseline methods.
arXiv Detail & Related papers (2025-05-25T13:40:13Z)
- Regulatory DNA sequence Design with Reinforcement Learning [56.20290878358356]
We propose a generative approach that leverages reinforcement learning to fine-tune a pre-trained autoregressive model.
We evaluate our method on promoter design tasks in two yeast media conditions and enhancer design tasks for three human cell types.
arXiv Detail & Related papers (2025-03-11T02:33:33Z)
- ControllableGPT: A Ground-Up Designed Controllable GPT for Molecule Optimization [6.900025190052277]
We introduce ControllableGPT, a controllable training framework for large language models.
It is inspired by the biological processes of growth and evolution, which involve the expansion, shrinking, and mutation of sequences.
It enables the precise management of specific locations and ranges within a sequence, while maintaining the integrity of any specified positions or subsequences.
arXiv Detail & Related papers (2025-02-15T01:49:35Z)
- DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization [53.27954325490941]
Finetuning a Large Language Model (LLM) is crucial for generating results towards specific objectives. This research introduces a novel reinforcement learning algorithm to finetune a drug optimization LLM-based generative model.
arXiv Detail & Related papers (2025-02-11T04:00:21Z)
- Structure-Based Molecule Optimization via Gradient-Guided Bayesian Update [16.743187639189976]
Structure-based molecule optimization (SBMO) aims to optimize molecules with both continuous coordinates and discrete types against protein targets.
MolJO is the first gradient-based SBMO framework that facilitates joint guidance signals across different modalities.
MolJO achieves state-of-the-art performance on CrossDocked 2020 benchmark.
arXiv Detail & Related papers (2024-11-20T12:48:29Z)
- Accelerated Preference Optimization for Large Language Model Alignment [60.22606527763201]
Reinforcement Learning from Human Feedback (RLHF) has emerged as a pivotal tool for aligning large language models (LLMs) with human preferences.
Direct Preference Optimization (DPO) formulates RLHF as a policy optimization problem without explicitly estimating the reward function.
We propose a general Accelerated Preference Optimization (APO) framework, which unifies many existing preference optimization algorithms.
arXiv Detail & Related papers (2024-10-08T18:51:01Z)
- Decomposed Direct Preference Optimization for Structure-Based Drug Design [47.561983733291804]
We propose DecompDPO, a structure-based optimization method to align diffusion models with pharmaceutical needs.
DecompDPO can be effectively used for two main purposes: fine-tuning pretrained diffusion models for molecule generation across various protein families, and molecular optimization given a specific protein subpocket after generation.
arXiv Detail & Related papers (2024-07-19T02:12:25Z)
- LightCPPgen: An Explainable Machine Learning Pipeline for Rational Design of Cell Penetrating Peptides [0.32985979395737786]
We introduce an innovative approach for the de novo design of CPPs, leveraging the strengths of machine learning (ML) and optimization algorithms.
Our strategy, named LightCPPgen, integrates a LightGBM-based predictive model with a genetic algorithm (GA).
The GA solutions specifically target the candidate sequences' penetrability score, while trying to maximize similarity with the original non-penetrating peptide.
arXiv Detail & Related papers (2024-05-31T10:57:25Z)
- Deep Lead Optimization: Leveraging Generative AI for Structural Modification [12.167178956742113]
This review delves into the basic concepts, goals, conventional CADD techniques, and recent advancements in AIDD.
We introduce a unified perspective based on constrained subgraph generation to harmonize the methodologies of de novo design and lead optimization.
arXiv Detail & Related papers (2024-04-30T03:17:42Z)
- DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization [49.85944390503957]
DecompOpt is a structure-based molecular optimization method based on a controllable and decomposed diffusion model.
We show that DecompOpt can efficiently generate molecules with better properties than strong de novo baselines.
arXiv Detail & Related papers (2024-03-07T02:53:40Z)
- Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of large language models (LLMs)-based prompts. We identify two pivotal factors in model parameter learning: update direction and update method. We develop a capable Gradient-inspired Prompt-based GPO.
arXiv Detail & Related papers (2024-02-27T15:05:32Z)
- Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark [166.40879020706151]
This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during fine-tuning.
Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques.
Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
arXiv Detail & Related papers (2024-02-18T14:08:48Z)
- Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers [109.52244418498974]
We propose a novel Admeta (A Double exponential Moving averagE to Adaptive and non-adaptive momentum) framework.
We provide two implementations, AdmetaR and AdmetaS, the former based on RAdam and the latter based on SGDM.
arXiv Detail & Related papers (2023-07-02T18:16:06Z)
- Reinforced Genetic Algorithm for Structure-based Drug Design [38.134929249388406]
Structure-based drug design (SBDD) aims to discover drug candidates by finding molecules that bind to a disease-related protein (targets).
We propose Reinforced Genetic Algorithm (RGA) that uses neural models to prioritize the profitable design steps and suppress random-walk behavior.
arXiv Detail & Related papers (2022-11-28T22:59:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.