EPO: Diverse and Realistic Protein Ensemble Generation via Energy Preference Optimization
- URL: http://arxiv.org/abs/2511.10165v1
- Date: Fri, 14 Nov 2025 01:36:31 GMT
- Title: EPO: Diverse and Realistic Protein Ensemble Generation via Energy Preference Optimization
- Authors: Yuancheng Sun, Yuxuan Ren, Zhaoming Chen, Xu Han, Kang Liu, Qiwei Ye,
- Abstract summary: This paper presents Energy Preference Optimization (EPO), an online refinement that turns a pretrained protein ensemble generator into an energy-aware sampler. On Tetrapeptides, ATLAS, and Fast-Folding benchmarks, EPO successfully generates diverse and physically realistic ensembles.
- Score: 14.859985641146672
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate exploration of protein conformational ensembles is essential for uncovering function but remains hard because molecular-dynamics (MD) simulations suffer from high computational costs and energy-barrier trapping. This paper presents Energy Preference Optimization (EPO), an online refinement algorithm that turns a pretrained protein ensemble generator into an energy-aware sampler without extra MD trajectories. Specifically, EPO leverages stochastic differential equation sampling to explore the conformational landscape and incorporates a novel energy-ranking mechanism based on list-wise preference optimization. Crucially, EPO introduces a practical upper bound to efficiently approximate the intractable probability of long sampling trajectories in continuous-time generative models, making it easily adaptable to existing pretrained generators. On Tetrapeptides, ATLAS, and Fast-Folding benchmarks, EPO successfully generates diverse and physically realistic ensembles, establishing a new state-of-the-art in nine evaluation metrics. These results demonstrate that energy-only preference signals can efficiently steer generative models toward thermodynamically consistent conformational ensembles, providing an alternative to long MD simulations and widening the applicability of learned potentials in structural biology and drug discovery.
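The abstract describes an energy-ranking mechanism based on list-wise preference optimization but does not give the loss itself. As a rough illustration only, the sketch below uses a standard Plackett-Luce list-wise objective: samples are ranked by ascending energy, and the loss is the negative log-likelihood of that ranking under per-sample model scores. The function names, the choice of Plackett-Luce, and the meaning of `scores` (for example, a scaled log-probability ratio between the fine-tuned and pretrained generator) are assumptions for this example, not details taken from the paper.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def listwise_preference_loss(energies, scores):
    """Plackett-Luce negative log-likelihood of the energy-sorted ranking.

    energies: one energy per sampled conformation (lower is preferred).
    scores:   one model score per conformation (hypothetical; higher means
              the model favors that sample more strongly).
    """
    if len(energies) != len(scores):
        raise ValueError("energies and scores must have the same length")
    # Rank samples from lowest to highest energy.
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    ranked = [scores[i] for i in order]
    loss = 0.0
    # At each position, the preferred sample competes against all samples
    # ranked after it; the final position contributes zero and is skipped.
    for k in range(len(ranked) - 1):
        loss -= ranked[k] - logsumexp(ranked[k:])
    return loss


energies = [-5.0, -1.0, 3.0]
aligned = listwise_preference_loss(energies, [2.0, 0.0, -2.0])
misaligned = listwise_preference_loss(energies, [-2.0, 0.0, 2.0])
print(aligned < misaligned)
```

Under this objective the loss is small when the model assigns its highest scores to the lowest-energy samples and grows when the ordering is reversed, which is the behavior an energy-only preference signal is meant to encourage.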
Related papers
- Energy-Guided Flow Matching Enables Few-Step Conformer Generation and Ground-State Identification [45.52894539097255]
We present EnFlow, a unified framework that couples flow matching with an explicitly learned energy model. By incorporating energy-gradient guidance during sampling, our method steers trajectories toward lower-energy regions. The learned energy function further enables efficient energy-based ranking of generated ensembles for accurate ground-state identification.
arXiv Detail & Related papers (2025-12-27T14:00:22Z)
- Divergence Minimization Preference Optimization for Diffusion Model Alignment [66.31417479052774]
Divergence Minimization Preference Optimization (DMPO) is a principled method for aligning diffusion models by minimizing reverse KL divergence. DMPO can consistently outperform or match existing techniques across different base models and test sets.
arXiv Detail & Related papers (2025-07-10T07:57:30Z)
- Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design [58.8094854658848]
We address the problem of fine-tuning diffusion models for reward-guided generation in biomolecular design. We propose an iterative distillation-based fine-tuning framework that enables diffusion models to optimize for arbitrary reward functions. Our off-policy formulation, combined with KL divergence minimization, enhances training stability and sample efficiency compared to existing RL-based methods.
arXiv Detail & Related papers (2025-07-01T05:55:28Z)
- EnerBridge-DPO: Energy-Guided Protein Inverse Folding with Markov Bridges and Direct Preference Optimization [8.642286608437344]
This work aims to overcome the limitation by developing a model that directly generates low-energy, stable protein sequences. We propose EnerBridge-DPO, a novel inverse folding framework focused on generating low-energy, high-stability protein sequences. Our evaluations demonstrate that EnerBridge-DPO can design protein complex sequences with lower energy while maintaining sequence recovery rates comparable to state-of-the-art models.
arXiv Detail & Related papers (2025-06-11T08:12:26Z)
- Aligning Protein Conformation Ensemble Generation with Physical Feedback [29.730515284798397]
Energy-based Alignment (EBA) is a method that aligns generative models with feedback from physical models. EBA achieves state-of-the-art performance in generating high-quality protein ensembles.
arXiv Detail & Related papers (2025-05-30T04:33:39Z)
- From expNN to sinNN: automatic generation of sum-of-products models for potential energy surfaces in internal coordinates using neural networks and sparse grid sampling [0.0]
This work aims to evaluate the practicality of a single-layer artificial neural network with sinusoidal activation functions for representing potential energy surfaces in sum-of-products form. The fitting approach, named sinNN, is applied to modeling the PES of HONO, covering both the trans and cis isomers. The sinNN PES model was able to reproduce available experimental fundamental vibrational transition energies with a root mean square error of about 17 cm⁻¹.
arXiv Detail & Related papers (2025-04-30T07:31:32Z)
- LightCPPgen: An Explainable Machine Learning Pipeline for Rational Design of Cell Penetrating Peptides [0.32985979395737786]
We introduce an innovative approach for the de novo design of CPPs, leveraging the strengths of machine learning (ML) and optimization algorithms.
Our strategy, named LightCPPgen, integrates a LightGBM-based predictive model with a genetic algorithm (GA).
The GA solutions specifically target the candidate sequences' penetrability score, while trying to maximize similarity with the original non-penetrating peptide.
arXiv Detail & Related papers (2024-05-31T10:57:25Z)
- DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization [49.85944390503957]
DecompOpt is a structure-based molecular optimization method based on a controllable and decomposed diffusion model.
We show that DecompOpt can efficiently generate molecules with better properties than strong de novo baselines.
arXiv Detail & Related papers (2024-03-07T02:53:40Z)
- Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
- Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling [23.74897713386661]
The dynamic nature of proteins is crucial for determining their biological functions and properties.
Existing learning-based approaches perform direct sampling yet heavily rely on target-specific simulation data for training.
We propose Str2Str, a novel structure-to-structure translation framework capable of zero-shot conformation sampling.
arXiv Detail & Related papers (2023-06-05T15:19:06Z)
- Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models.
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.