Bio2Token: All-atom tokenization of any biomolecular structure with Mamba
- URL: http://arxiv.org/abs/2410.19110v1
- Date: Thu, 24 Oct 2024 19:23:09 GMT
- Title: Bio2Token: All-atom tokenization of any biomolecular structure with Mamba
- Authors: Andrew Liu, Axel Elaldi, Nathan Russell, Olivia Viessmann,
- Abstract summary: We develop quantized auto-encoders that learn atom-level tokenizations of complete proteins, RNA and small molecule structures with reconstruction accuracies below and around 1 Angstrom.
We demonstrate that the Mamba state space model architecture employed is comparatively efficient, requiring a fraction of the training data, parameters and compute needed to reach competitive accuracies and can scale to systems with almost 100,000 atoms.
- Score: 3.039173168183899
- License:
- Abstract: Efficient encoding and representation of large 3D molecular structures with high fidelity is critical for biomolecular design applications. Despite this, many representation learning approaches restrict themselves to modeling smaller systems or use coarse-grained approximations of the systems, for example modeling proteins at the resolution of amino acid residues rather than at the level of individual atoms. To address this, we develop quantized auto-encoders that learn atom-level tokenizations of complete proteins, RNA and small molecule structures with reconstruction accuracies below and around 1 Angstrom. We demonstrate that the Mamba state space model architecture employed is comparatively efficient, requiring a fraction of the training data, parameters and compute needed to reach competitive accuracies and can scale to systems with almost 100,000 atoms. The learned structure tokens of bio2token may serve as the input for all-atom language models in the future.
Related papers
- GraphXForm: Graph transformer for computer-aided molecular design with application to extraction [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds and then fine-tuned.
We evaluate it on two solvent design tasks for liquid-liquid extraction, showing that it outperforms four state-of-the-art molecular design techniques.
arXiv Detail & Related papers (2024-11-03T19:45:15Z) - CryoChains: Heterogeneous Reconstruction of Molecular Assembly of
Semi-flexible Chains from Cryo-EM Images [3.0828074702828623]
We propose CryoChains that encodes large deformations of biomolecules via rigid body transformation of their chains.
Our data experiments on the human GABAtextsubscriptB and heat shock protein show that CryoChains gives a biophysically-grounded quantification of the heterogeneous conformations of biomolecules.
arXiv Detail & Related papers (2023-06-12T17:57:12Z) - Towards Predicting Equilibrium Distributions for Molecular Systems with
Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z) - Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules.
Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph.
By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z) - MUDiff: Unified Diffusion for Complete Molecule Generation [104.7021929437504]
We present a new model for generating a comprehensive representation of molecules, including atom features, 2D discrete molecule structures, and 3D continuous molecule coordinates.
We propose a novel graph transformer architecture to denoise the diffusion process.
Our model is a promising approach for designing stable and diverse molecules and can be applied to a wide range of tasks in molecular modeling.
arXiv Detail & Related papers (2023-04-28T04:25:57Z) - Heterogeneous reconstruction of deformable atomic models in Cryo-EM [30.864688165021054]
We describe a heterogeneous reconstruction method based on an atomistic representation whose deformation is reduced to a handful of collective motions.
We show for each distribution that our approach is able to recapitulate the intermediate atomic models with atomic-level accuracy.
arXiv Detail & Related papers (2022-09-29T22:35:35Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - Transferring Chemical and Energetic Knowledge Between Molecular Systems
with Machine Learning [5.27145343046974]
We propose a novel methodology for transferring knowledge obtained from simple molecular systems to a more complex one.
We focus on the classification of high and low free-energy states.
Our results show a remarkable AUC of 0.92 for transfer learning from tri-alanine to the deca-alanine system.
arXiv Detail & Related papers (2022-05-06T16:21:00Z) - Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning [68.8204255655161]
We introduce a novel framework for scalable 3D design that uses a hierarchical agent to build molecules.
In a variety of experiments, we show that our agent, guided only by energy considerations, can efficiently learn to produce molecules with over 100 atoms.
arXiv Detail & Related papers (2022-02-01T18:54:24Z) - A silicon qubit platform for in situ single molecule structure
determination [0.7187911114620571]
Imaging individual conformational instances of generic, inhomogeneous, transient or intrinsically disordered protein systems at the single molecule level in situ is one of the notable challenges in structural biology.
Here we tackle the problem by designing a single molecule imaging platform technology embracing the advantages silicon-based spin qubits.
We demonstrate through detailed simulation, that this platform enables scalable atomic-level structure-determination of individual molecular systems in native environments.
arXiv Detail & Related papers (2021-12-07T10:42:09Z) - Hierarchical, rotation-equivariant neural networks to select structural
models of protein complexes [6.092214762701847]
We introduce a machine learning method that learns directly from the 3D positions of all atoms to identify accurate models of protein complexes.
Our network substantially improves the identification of accurate structural models among a large set of possible models.
arXiv Detail & Related papers (2020-06-05T20:17:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.