Related papers: Technical Report of HelixFold3 for Biomolecular Structure Prediction

URL: http://arxiv.org/abs/2408.16975v3
Date: Mon, 23 Dec 2024 04:57:47 GMT
Title: Technical Report of HelixFold3 for Biomolecular Structure Prediction
Authors: Lihang Liu, Shanzhuo Zhang, Yang Xue, Xianbin Ye, Kunrui Zhu, Yuxin Li, Yang Liu, Jie Gao, Wenlai Zhao, Hongkun Yu, Zhihua Wu, Xiaonan Zhang, Xiaomin Fang,
Abstract summary: The PaddleHelix team is developing HelixFold3, aiming to replicate AlphaFold3's capabilities. The initial release of HelixFold3 is available as open source on GitHub for academic research. The latest version will be continuously updated on the HelixFold3 web server, providing both interactive visualization and API access.
Score: 14.111702731349256
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The AlphaFold series has transformed protein structure prediction with remarkable accuracy, often matching experimental methods. AlphaFold2, AlphaFold-Multimer, and the latest AlphaFold3 represent significant strides in predicting single protein chains, protein complexes, and biomolecular structures. While AlphaFold2 and AlphaFold-Multimer are open-sourced, facilitating rapid and reliable predictions, AlphaFold3 remains partially accessible through a limited online server and has not been open-sourced, restricting further development. To address these challenges, the PaddleHelix team is developing HelixFold3, aiming to replicate AlphaFold3's capabilities. Leveraging insights from previous models and extensive datasets, HelixFold3 achieves accuracy comparable to AlphaFold3 in predicting the structures of the conventional ligands, nucleic acids, and proteins. The initial release of HelixFold3 is available as open source on GitHub for academic research, promising to advance biomolecular research and accelerate discoveries. The latest version will be continuously updated on the HelixFold3 web server, providing both interactive visualization and API access.

Related papers

DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models [66.41802970528133]
Molecular structure elucidation from spectra is a foundational problem in chemistry. Traditional methods rely heavily on expert interpretation and lack scalability. We present DiffSpectra, a generative framework that directly infers both 2D and 3D molecular structures from multi-modal spectral data.
arXiv Detail & Related papers (2025-07-09T13:57:20Z)
A Model-Centric Review of Deep Learning for Protein Design [0.0]
Deep learning has transformed protein design, enabling accurate structure prediction, sequence optimization, and de novo protein generation. Generative models such as ProtGPT2, ProteinMPNN, and RFdiffusion have enabled sequence and backbone design beyond natural evolution-based limitations. More recently, joint sequence-structure co-design models, including ESM3, have integrated both modalities into a unified framework, resulting in improved designability.
arXiv Detail & Related papers (2025-02-26T14:31:21Z)
Fast and Accurate Blind Flexible Docking [79.88520988144442]
Molecular docking that predicts the bound structures of small molecules (ligands) to their protein targets plays a vital role in drug discovery. We propose FABFlex, a fast and accurate regression-based multi-task learning model designed for realistic blind flexible docking scenarios.
arXiv Detail & Related papers (2025-02-20T07:31:13Z)
DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
DiffMS is a formula-restricted encoder-decoder generative network. We develop a robust decoder that bridges latent embeddings and molecular structures. Experiments show DiffMS outperforms existing models on de novo molecule generation.
arXiv Detail & Related papers (2025-02-13T18:29:48Z)
Improving AlphaFlow for Efficient Protein Ensembles Generation [64.10918970280603]
We propose a feature-conditioned generative model called AlphaFlow-Lit to realize efficient protein ensembles generation. AlphaFlow-Lit performs on par with AlphaFlow and surpasses its distilled version without pretraining, all while achieving a significant sampling acceleration of around 47 times.
arXiv Detail & Related papers (2024-07-08T13:36:43Z)
Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation [55.93511121486321]
We introduce FoldFlow-2, a novel sequence-conditioned flow matching model for protein structure generation. We train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than PDB datasets of prior works. We empirically observe that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models.
arXiv Detail & Related papers (2024-05-30T17:53:50Z)
HelixFold-Multimer: Elevating Protein Complex Structure Prediction to New Heights [7.702856943171886]
We highlight the ongoing advancements of our protein complex structure prediction model, HelixFold-Multimer. HelixFold-Multimer provides precise predictions for diverse protein complex structures, especially in therapeutic protein interactions. HelixFold-Multimer is now available for public use on the PaddleHelix platform, offering both a general version and an antigen-antibody version.
arXiv Detail & Related papers (2024-04-16T03:29:37Z)
DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design [62.68420322996345]
Existing structure-based drug design methods treat all ligand atoms equally. We propose a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold. Our approach achieves state-of-the-art performance in generating high-affinity molecules.
arXiv Detail & Related papers (2024-02-26T05:21:21Z)
xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein [76.18058946124111]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously. xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories. It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z)
An Equivariant Generative Framework for Molecular Graph-Structure Co-Design [54.92529253182004]
We present MolCode, a machine learning-based generative framework for Molecular graph-structure Co-design. In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure. Our investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design.
arXiv Detail & Related papers (2023-04-12T13:34:22Z)
Beating the Best: Improving on AlphaFold2 at Protein Structure Prediction [1.3124513975412255]
ARStack significantly outperforms AlphaFold2 and RoseTTAFold. We rigorously demonstrate this using two sets of non-homologous proteins, and a test set of protein structures published after that of AlphaFold2 and RoseTTAFold.
arXiv Detail & Related papers (2023-01-18T14:39:34Z)
Unsupervisedly Prompting AlphaFold2 for Few-Shot Learning of Accurate Folding Landscape and Protein Structure Prediction [28.630603355510324]
We present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 for poor MSA targets. By prompting the model with calibrated or virtually generated homologue sequences, EvoGen helps AlphaFold2 fold accurately in low-data regime.
arXiv Detail & Related papers (2022-08-20T10:23:17Z)
HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative [61.984700682903096]
HelixFold-Single is proposed to combine a large-scale protein language model with the superior geometric learning capability of AlphaFold2. Our proposed method pre-trains a large-scale protein language model with thousands of millions of primary sequences. We obtain an end-to-end differentiable model to predict the 3D coordinates of atoms from only the primary sequence.
arXiv Detail & Related papers (2022-07-28T07:30:33Z)
HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle [19.331098164638544]
We implement AlphaFold2 using PaddlePaddle, namely HelixFold, to improve training and inference speed and reduce memory consumption. Compared with the original AlphaFold2 and OpenFold, HelixFold needs only 7.5 days to complete the full end-to-end training. HelixFold's accuracy could be on par with AlphaFold2 on the CASP14 and CAMEO datasets.
arXiv Detail & Related papers (2022-07-12T11:43:50Z)
Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design [70.27706384570723]
We propose Fold2Seq, a novel framework for designing protein sequences conditioned on a specific target fold. We show improved or comparable performance of Fold2Seq in terms of speed, coverage, and reliability for sequence design. The unique advantages of fold-based Fold2Seq, in comparison to a structure-based deep model and RosettaDesign, become more evident on three additional real-world challenges.
arXiv Detail & Related papers (2021-06-24T14:34:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.