Protein Language Model Embeddings Improve Generalization of Implicit Transfer Operators
- URL: http://arxiv.org/abs/2602.11216v1
- Date: Wed, 11 Feb 2026 09:26:12 GMT
- Title: Protein Language Model Embeddings Improve Generalization of Implicit Transfer Operators
- Authors: Panagiotis Antoniadis, Beatrice Pavesi, Simon Olsson, Ole Winther
- Abstract summary: We show that incorporating auxiliary sources of information can improve the data efficiency and generalization of implicit transfer operators for molecular dynamics. Our approach, PLaTITO, achieves state-of-the-art performance on equilibrium sampling benchmarks for out-of-distribution protein systems.
- Score: 10.025462072265706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Molecular dynamics (MD) is a central computational tool in physics, chemistry, and biology, enabling quantitative prediction of experimental observables as expectations over high-dimensional molecular distributions such as Boltzmann distributions and transition densities. However, conventional MD is fundamentally limited by the high computational cost required to generate independent samples. Generative molecular dynamics (GenMD) has recently emerged as an alternative, learning surrogates of molecular distributions either from data or through interaction with energy models. While these methods enable efficient sampling, their transferability across molecular systems is often limited. In this work, we show that incorporating auxiliary sources of information can improve the data efficiency and generalization of transferable implicit transfer operators (TITO) for molecular dynamics. We find that coarse-grained TITO models are substantially more data-efficient than Boltzmann Emulators, and that incorporating protein language model (pLM) embeddings further improves out-of-distribution generalization. Our approach, PLaTITO, achieves state-of-the-art performance on equilibrium sampling benchmarks for out-of-distribution protein systems, including fast-folding proteins. We further study the impact of additional conditioning signals -- such as structural embeddings, temperature, and large-language-model-derived embeddings -- on model performance.
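To make the transfer-operator idea from the abstract concrete (a toy numerical sketch of the underlying mathematics, not PLaTITO's learned model), the snippet below builds a discrete Metropolis transition matrix for a hypothetical 1D double-well energy and shows that repeatedly applying the operator drives an arbitrary starting distribution toward the Boltzmann distribution. All names, the grid, and the energy function are illustrative choices, not taken from the paper.

```python
import math

# A transfer operator is the conditional density p(x_{t+tau} | x_t);
# applying it repeatedly relaxes any start distribution toward the
# stationary (Boltzmann) distribution. Implicit-transfer-operator models
# learn a surrogate of this operator directly from data.

# Hypothetical 1D double-well energy E(x) = (x^2 - 1)^2 on a small grid.
xs = [-2 + 4 * i / 10 for i in range(11)]
energy = [(x * x - 1) ** 2 for x in xs]

def boltzmann(energies, kT=1.0):
    """Normalized Boltzmann weights exp(-E/kT) / Z."""
    w = [math.exp(-e / kT) for e in energies]
    z = sum(w)
    return [wi / z for wi in w]

# Nearest-neighbour Metropolis transition matrix: a crude discrete
# transfer operator whose stationary law is exactly the Boltzmann
# distribution (it satisfies detailed balance by construction).
n = len(xs)
T = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            T[i][j] = 0.5 * min(1.0, math.exp(energy[i] - energy[j]))
    T[i][i] = 1.0 - sum(T[i])  # remaining probability: stay put

def propagate(p, T):
    """One application of the transfer operator: p_{t+1} = p_t T."""
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]

# Start far from equilibrium (all mass in one bin) and iterate.
p = [0.0] * n
p[0] = 1.0
for _ in range(5000):
    p = propagate(p, T)

target = boltzmann(energy)
err = max(abs(a - b) for a, b in zip(p, target))
print(f"max deviation from Boltzmann: {err:.2e}")
```

The point of the sketch is the contrast drawn in the abstract: conventional MD must take many small steps of this kind to decorrelate samples, whereas a learned surrogate of the long-lag operator can jump directly toward equilibrium.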
Related papers
- Learning Inter-Atomic Potentials without Explicit Equivariance [24.438029202222555]
We introduce TransIP: Transformer-based Inter-Atomic Potentials, a novel training paradigm for interatomic potentials. Our approach guides a generic non-equivariant Transformer-based model to learn SO(3)-equivariance by optimizing its representations in the embedding space. Compared to a data augmentation baseline, TransIP achieves 40% to 60% improvement in performance across varying OMol25 dataset sizes.
arXiv Detail & Related papers (2025-09-25T22:15:10Z) - Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models [50.77646970127369]
We propose an energy-based diffusion model with a Fokker--Planck-derived regularization term to enforce consistency. We demonstrate our approach by sampling and simulating multiple biomolecular systems, including fast-folding proteins.
arXiv Detail & Related papers (2025-06-20T16:38:29Z) - MDDM: A Molecular Dynamics Diffusion Model to Predict Particle Self-Assembly [0.7078672343881657]
The Molecular Dynamics Diffusion Model is capable of predicting a valid output for a given input pair-potential function. The model significantly outperforms the baseline point-cloud diffusion model for both unconditional and conditional generation tasks.
arXiv Detail & Related papers (2025-01-28T22:21:45Z) - MING: A Functional Approach to Learning Molecular Generative Models [46.189683355768736]
This paper introduces a novel paradigm for learning molecule generative models based on functional representations. We propose Molecular Implicit Neural Generation (MING), a diffusion-based model that learns molecular distributions in function space.
arXiv Detail & Related papers (2024-10-16T13:02:02Z) - Analysis of Atom-level pretraining with Quantum Mechanics (QM) data for Graph Neural Networks Molecular property models [0.0]
We show how atom-level pretraining with quantum mechanics (QM) data can mitigate violations of assumptions regarding the distributional similarity between training and test data.
This is the first time that hidden state molecular representations are analyzed to compare the effects of molecule-level and atom-level pretraining on QM data.
arXiv Detail & Related papers (2024-05-23T17:51:05Z) - Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved into one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z) - Navigating protein landscapes with a machine-learned transferable coarse-grained model [29.252004942896875]
Developing a coarse-grained (CG) model with prediction performance similar to all-atom simulations has been a long-standing challenge.
We develop a bottom-up CG force field with chemical transferability, which can be used for extrapolative molecular dynamics on new sequences.
We demonstrate that the model successfully predicts folded structures, intermediates, metastable folded and unfolded basins, and the fluctuations of intrinsically disordered proteins.
arXiv Detail & Related papers (2023-10-27T17:10:23Z) - Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves strong performance on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z) - Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z) - Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z) - Learning Neural Generative Dynamics for Molecular Conformation Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.