Monte Carlo Tree Diffusion with Multiple Experts for Protein Design
- URL: http://arxiv.org/abs/2509.15796v1
- Date: Fri, 19 Sep 2025 09:24:42 GMT
- Title: Monte Carlo Tree Diffusion with Multiple Experts for Protein Design
- Authors: Xuefeng Liu, Mingxuan Cao, Songhao Jiang, Xiao Luo, Xiaotian Duan, Mengdi Wang, Tobin R. Sosnick, Jinbo Xu, Rick Stevens
- Abstract summary: We propose MCTD-ME, which integrates masked diffusion models with tree search to enable multi-token planning and efficient exploration. Unlike autoregressive planners, MCTD-ME uses biophysical-fidelity-enhanced diffusion denoising as the rollout engine. The framework is model-agnostic and applicable beyond inverse folding, including de novo protein engineering and multi-objective molecular generation.
- Score: 50.056670856059014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of protein design is to generate amino acid sequences that fold into functional structures with desired properties. Prior methods combining autoregressive language models with Monte Carlo Tree Search (MCTS) struggle with long-range dependencies and suffer from an impractically large search space. We propose MCTD-ME, Monte Carlo Tree Diffusion with Multiple Experts, which integrates masked diffusion models with tree search to enable multi-token planning and efficient exploration. Unlike autoregressive planners, MCTD-ME uses biophysical-fidelity-enhanced diffusion denoising as the rollout engine, jointly revising multiple positions and scaling to large sequence spaces. It further leverages experts of varying capacities to enrich exploration, guided by a pLDDT-based masking schedule that targets low-confidence regions while preserving reliable residues. We propose a novel multi-expert selection rule (PH-UCT-ME) that extends predictive-entropy UCT to expert ensembles. On the inverse folding task (CAMEO and PDB benchmarks), MCTD-ME outperforms single-expert and unguided baselines in both sequence recovery (AAR) and structural similarity (scTM), with gains increasing for longer proteins and benefiting from multi-expert guidance. More generally, the framework is model-agnostic and applicable beyond inverse folding, including de novo protein engineering and multi-objective molecular generation.
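The pLDDT-based masking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `plddt_mask`, the mask symbol, the `mask_fraction` budget, and the `confident_cutoff` threshold are all hypothetical choices made for illustration, and the abstract does not say how the actual schedule sets them. The sketch masks the lowest-confidence positions up to a budget and never touches residues at or above the cutoff, so a denoiser would only be asked to revise the uncertain regions.

```python
import numpy as np

MASK = "#"  # hypothetical mask symbol, not taken from the paper


def plddt_mask(sequence, plddt, mask_fraction=0.2, confident_cutoff=90.0):
    """Mask the lowest-pLDDT positions while preserving confident residues.

    sequence: amino acid string.
    plddt: per-residue confidence scores (0 to 100), one per residue.
    mask_fraction: assumed budget, as a fraction of the sequence length.
    confident_cutoff: assumed threshold; residues at or above it stay fixed.
    Returns the masked sequence and the sorted list of masked positions.
    """
    scores = np.asarray(plddt, dtype=float)
    if scores.shape[0] != len(sequence):
        raise ValueError("need exactly one pLDDT score per residue")

    budget = int(round(mask_fraction * len(sequence)))
    # Stable sort so ties are broken by position, lowest confidence first.
    order = np.argsort(scores, kind="stable")
    chosen = sorted(int(i) for i in order[:budget] if scores[i] < confident_cutoff)

    masked = list(sequence)
    for i in chosen:
        masked[i] = MASK
    return "".join(masked), chosen


if __name__ == "__main__":
    seq = "ACDEFGHIKL"
    conf = [95, 40, 88, 92, 35, 97, 60, 91, 85, 99]
    print(plddt_mask(seq, conf, mask_fraction=0.3))
```

In a tree-search setting, each masked sequence would be handed to a masked diffusion expert to fill in the masked positions jointly; that step is omitted here because it depends on the specific models used.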
Related papers
- MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts [0.8292000624465587]
Real-world time series can exhibit intricate multi-scale structures, including global trends, local periodicities, and non-stationary regimes. MoHETS integrates sparse Mixture-of-Heterogeneous-Experts layers. We replace parameter-heavy linear projection heads with a lightweight convolutional patch decoder.
arXiv Detail & Related papers (2026-01-29T15:35:26Z)
- Swarms of Large Language Model Agents for Protein Sequence Design with Experimental Validation [0.9332987715848714]
Large language model (LLM) agents operate in parallel, each assigned to a specific residue position. This position-wise, decentralized coordination enables emergent design of diverse, well-defined sequences. Our method achieves efficient, objective-directed designs within a few GPU-hours and operates entirely without fine-tuning or specialized training.
arXiv Detail & Related papers (2025-11-27T10:42:52Z)
- HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation [72.69742127579508]
Recent unified models integrate understanding experts (e.g., LLMs) with generative experts (e.g., diffusion models). In this work, we propose HBridge, an asymmetric H-shaped architecture that enables heterogeneous experts to optimally leverage pretrained priors. Extensive experiments across multiple benchmarks demonstrate the effectiveness and superior performance of HBridge.
arXiv Detail & Related papers (2025-11-25T17:23:38Z)
- ProtInvTree: Deliberate Protein Inverse Folding with Reward-guided Tree Search [77.55575655986252]
ProtInvTree is a reward-guided tree-search framework for protein inverse folding. It reformulates sequence generation as a deliberate, step-wise decision-making process. It supports flexible test-time scaling by expanding the search depth and breadth without retraining.
arXiv Detail & Related papers (2025-06-01T09:34:20Z)
- DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
We present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task. To develop a robust decoder that bridges latent embeddings and molecular structures, we pretrain the diffusion decoder with fingerprint-structure pairs. Experiments on established benchmarks show that DiffMS outperforms existing models on de novo molecule generation.
arXiv Detail & Related papers (2025-02-13T18:29:48Z)
- Monte Carlo Tree Diffusion for System 2 Planning [57.50512800900167]
We introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of Monte Carlo Tree Search (MCTS). Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined.
arXiv Detail & Related papers (2025-02-11T02:51:42Z)
- PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion [2.6668932659159905]
We present PepTune, a multi-objective discrete diffusion model for simultaneous generation and optimization of therapeutic peptide SMILES. To guide the diffusion process, we introduce Monte Carlo Tree Guidance (MCTG), an inference-time multi-objective guidance algorithm. Using PepTune, we generate diverse, chemically-modified peptides simultaneously optimized for multiple therapeutic properties.
arXiv Detail & Related papers (2024-12-23T18:38:49Z)
- Multi-Head Mixture-of-Experts [100.60556163597946]
We propose Multi-Head Mixture-of-Experts (MH-MoE), which employs a multi-head mechanism to split each token into multiple sub-tokens. MH-MoE is straightforward to implement and decouples from other SMoE optimization methods, making it easy to integrate with other SMoE models for enhanced performance.
arXiv Detail & Related papers (2024-04-23T13:47:09Z)
- Diffusion on language model encodings for protein sequence generation [0.5088559194265662]
DiMA is a latent diffusion framework that operates on protein language model representations. It consistently produces novel, high-quality and diverse protein sequences. It supports conditional generation tasks including protein family-generation, motif scaffolding and infilling, and fold-specific sequence design.
arXiv Detail & Related papers (2024-03-06T14:15:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.