UBio-MolFM: A Universal Molecular Foundation Model for Bio-Systems
- URL: http://arxiv.org/abs/2602.17709v1
- Date: Fri, 13 Feb 2026 04:38:28 GMT
- Title: UBio-MolFM: A Universal Molecular Foundation Model for Bio-Systems
- Authors: Lin Huang, Arthur Jiang, XiaoLi Liu, Zion Wang, Jason Zhao, Chu Wang, HaoCheng Lu, ChengXiang Huang, JiaJun Cheng, YiYue Du, Jia Zhang,
- Abstract summary: We present UBio-MolFM, a universal foundation model framework designed to bridge the gap between quantum-mechanical (QM) accuracy and biological scale. UBio-MolFM achieves ab initio-level fidelity on large, out-of-distribution biomolecular systems (up to 1,500 atoms) and realistic observables.
- Score: 12.633470669776317
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: All-atom molecular simulation serves as a quintessential "computational microscope" for understanding the machinery of life, yet it remains fundamentally limited by the trade-off between quantum-mechanical (QM) accuracy and biological scale. We present UBio-MolFM, a universal foundation model framework specifically engineered to bridge this gap. UBio-MolFM introduces three synergistic innovations: (1) UBio-Mol26, a large bio-specific dataset constructed via a multi-fidelity "Two-Pronged Strategy" that combines systematic bottom-up enumeration with top-down sampling of native protein environments (up to 1,200 atoms); (2) E2Former-V2, a linear-scaling equivariant transformer that integrates Equivariant Axis-Aligned Sparsification (EAAS) and Long-Short Range (LSR) modeling to capture non-local physics with up to ~4x higher inference throughput in our large-system benchmarks; and (3) a Three-Stage Curriculum Learning protocol that transitions from energy initialization to energy-force consistency, with force-focused supervision to mitigate energy offsets. Rigorous benchmarking across microscopic forces and macroscopic observables -- including liquid water structure, ionic solvation, and peptide folding -- demonstrates that UBio-MolFM achieves ab initio-level fidelity on large, out-of-distribution biomolecular systems (up to ~1,500 atoms) and realistic MD observables. By reconciling scalability with quantum precision, UBio-MolFM provides a robust, ready-to-use tool for the next generation of computational biology.
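The abstract's notion of energy-force consistency, and the reason force-focused supervision is insensitive to energy offsets, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy harmonic potential, the function names, and the stage weights in `STAGE_WEIGHTS` are all assumptions chosen for illustration. The abstract names only the stages, not their loss weights.

```python
import numpy as np

def pair_energy(pos, stiffness=1.0, r0=1.0):
    """Toy harmonic pair potential summed over all atom pairs (illustrative only)."""
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    upper = np.triu_indices(len(pos), k=1)  # count each pair once
    return 0.5 * stiffness * np.sum((dist[upper] - r0) ** 2)

def forces_from_energy(energy_fn, pos, eps=1e-5):
    """Forces as the negative gradient of the energy, F = -dE/dx,
    estimated here with central finite differences."""
    forces = np.zeros_like(pos)
    for i in range(pos.shape[0]):
        for d in range(pos.shape[1]):
            plus, minus = pos.copy(), pos.copy()
            plus[i, d] += eps
            minus[i, d] -= eps
            forces[i, d] = -(energy_fn(plus) - energy_fn(minus)) / (2 * eps)
    return forces

# Hypothetical (energy weight, force weight) per stage; the values are assumptions.
STAGE_WEIGHTS = {
    "energy_init": (1.0, 0.0),
    "energy_force": (1.0, 1.0),
    "force_focused": (0.1, 10.0),
}

def curriculum_loss(e_pred, e_ref, f_pred, f_ref, stage):
    """Weighted sum of a per-atom energy error and a mean squared force error."""
    w_e, w_f = STAGE_WEIGHTS[stage]
    n_atoms = f_ref.shape[0]
    e_term = ((e_pred - e_ref) / n_atoms) ** 2
    f_term = np.mean((f_pred - f_ref) ** 2)
    return w_e * e_term + w_f * f_term

pos = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.2, 0.0]])
f_ref = forces_from_energy(pair_energy, pos)

# A constant energy offset leaves the forces unchanged.
f_shifted = forces_from_energy(lambda p: pair_energy(p) + 5.0, pos)
print(np.allclose(f_ref, f_shifted, atol=1e-6))
```

Because a constant shift in the energy has zero gradient, a loss dominated by the force term penalizes such offsets only weakly, which is one plausible reading of "force-focused supervision to mitigate energy offsets".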
Related papers
- Zatom-1: A Multimodal Flow Foundation Model for 3D Molecules and Materials [51.342983349686556]
General-purpose 3D chemical modeling encompasses molecules and materials, requiring both generative and predictive capabilities. We introduce Zatom-1, the first end-to-end, fully open-source foundation model that unifies generative and predictive learning of 3D molecules and materials.
arXiv Detail & Related papers (2026-02-24T20:52:39Z)
- Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling [74.25438319700929]
We propose CHMR (Cell-aware Hierarchical Multi-modal Representations), a robust framework that models local-global dependencies between molecules and cellular responses. Evaluated on nine public benchmarks spanning 728 tasks, CHMR outperforms state-of-the-art baselines. Results demonstrate the advantage of hierarchy-aware, multimodal learning for reliable and biologically grounded molecular representations.
arXiv Detail & Related papers (2025-11-26T07:15:00Z)
- HemePLM-Diffuse: A Scalable Generative Framework for Protein-Ligand Dynamics in Large Biomolecular System [0.0]
We introduce HemePLM-Diffuse, an innovative generative transformer model designed for accurate simulation of protein-ligand trajectories. We demonstrate its capabilities on the 3CQV HEME system, showing enhanced accuracy and scalability compared to leading models such as TorchMD-Net, MDGEN, and Uni-Mol.
arXiv Detail & Related papers (2025-08-07T17:29:52Z)
- A Scalable and Quantum-Accurate Foundation Model for Biomolecular Force Field via Linearly Tensorized Quadrangle Attention [6.749581549330875]
We present LiTEN, a novel AI-based force field framework for atomistic biomolecular simulations. Building on LiTEN, LiTEN-FF is a robust AIFF foundation model, pre-trained on the nablaDFT dataset for broad chemical generalization. LiTEN achieves state-of-the-art (SOTA) performance across most evaluation subsets of rMD17, MD22, and Chignolin, outperforming leading models such as MACE, NequIP, and EquiFormer.
arXiv Detail & Related papers (2025-07-01T15:52:39Z)
- Bio2Token: All-atom tokenization of any biomolecular structure with Mamba [3.039173168183899]
We develop quantized auto-encoders that learn atom-level tokenizations of complete proteins, RNA and small molecule structures. We demonstrate that a simple Mamba state space model architecture is efficient compared to an SE(3)-invariant IPA architecture. The learned structure tokens of bio2token may serve as the input for all-atom generative models in the future.
arXiv Detail & Related papers (2024-10-24T19:23:09Z)
- CryoFM: A Flow-based Foundation Model for Cryo-EM Densities [50.291974465864364]
We present CryoFM, a foundation model designed as a generative model, learning the distribution of high-quality density maps. Built on flow matching, CryoFM is trained to accurately capture the prior distribution of biomolecular density maps.
arXiv Detail & Related papers (2024-10-11T08:53:58Z)
- Machine-learned molecular mechanics force field for the simulation of protein-ligand systems and beyond [33.54862439531144]
Development of reliable molecular mechanics (MM) force fields is indispensable for biomolecular simulation and computer-aided drug design.
We introduce a generalized and machine-learned MM force field, espaloma-0.3, and an end-to-end differentiable framework using graph neural networks.
The force field reproduces quantum chemical energetic properties of chemical domains highly relevant to drug discovery, including small molecules, peptides, and nucleic acids.
arXiv Detail & Related papers (2023-07-13T23:00:22Z)
- Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z)
- BIGDML: Towards Exact Machine Learning Force Fields for Materials [55.944221055171276]
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof.
Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning (BIGDML) approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 geometries.
arXiv Detail & Related papers (2021-06-08T10:14:57Z)
- Basis-independent system-environment coherence is necessary to detect magnetic field direction in an avian-inspired quantum magnetic sensor [77.34726150561087]
We consider an avian-inspired quantum magnetic sensor composed of two radicals with a third "scavenger" radical under the influence of a collisional environment.
We show that basis-independent coherence, in which the initial system-environment state is non-maximally mixed, is necessary for optimal performance.
arXiv Detail & Related papers (2020-11-30T17:19:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.