Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligence
- URL: http://arxiv.org/abs/2502.15867v1
- Date: Fri, 21 Feb 2025 13:20:33 GMT
- Title: Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligence
- Authors: Yingying Sun, Jun A, Zhiwei Liu, Rui Sun, Liujia Qian, Samuel H. Payne, Wout Bittremieux, Markus Ralser, Chen Li, Yi Chen, Zhen Dong, Yasset Perez-Riverol, Asif Khan, Chris Sander, Ruedi Aebersold, Juan Antonio Vizcaíno, Jonathan R Krieger, Jianhua Yao, Han Wen, Linfeng Zhang, Yunping Zhu, Yue Xuan, Benjamin Boyang Sun, Liang Qiao, Henning Hermjakob, Haixu Tang, Huanhuan Gao, Yamin Deng, Qing Zhong, Cheng Chang, Nuno Bandeira, Ming Li, Weinan E, Siqi Sun, Yuedong Yang, Gilbert S. Omenn, Yue Zhang, Ping Xu, Yan Fu, Xiaowen Liu, Christopher M. Overall, Yu Wang, Eric W. Deutsch, Luonan Chen, Jürgen Cox, Vadim Demichev, Fuchu He, Jiaxing Huang, Huilin Jin, Chao Liu, Nan Li, Zhongzhi Luan, Jiangning Song, Kaicheng Yu, Wanggen Wan, Tai Wang, Kang Zhang, Le Zhang, Peter A. Bell, Matthias Mann, Bing Zhang, Tiannan Guo,
- Abstract summary: We highlight key areas where AI is driving innovation, from data analysis to new biological insights.<n>These include developing an AI-friendly ecosystem for data generation, sharing, and analysis.
- Score: 54.14779179869007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial intelligence (AI) is transforming scientific research, including proteomics. Advances in mass spectrometry (MS)-based proteomics data quality, diversity, and scale, combined with groundbreaking AI techniques, are unlocking new challenges and opportunities in biological discovery. Here, we highlight key areas where AI is driving innovation, from data analysis to new biological insights. These include developing an AI-friendly ecosystem for proteomics data generation, sharing, and analysis; improving peptide and protein identification and quantification; characterizing protein-protein interactions and protein complexes; advancing spatial and perturbation proteomics; integrating multi-omics data; and ultimately enabling AI-empowered virtual cells.
Related papers
- Protein Large Language Models: A Comprehensive Survey [71.65899614084853]
Protein-specific large language models (Protein LLMs) are revolutionizing protein science by enabling more efficient protein structure prediction, function annotation, and design.
This work provides the first comprehensive overview of Protein LLMs, covering their architectures, training datasets, evaluation metrics, and diverse applications.
arXiv Detail & Related papers (2025-02-21T19:22:10Z) - Computational Protein Science in the Era of Large Language Models (LLMs) [54.35488233989787]
Computational protein science is dedicated to revealing knowledge and developing applications within the protein sequence-structure-function paradigm.<n>Recently, Language Models (pLMs) have emerged as a milestone in AI due to their unprecedented language processing & generalization capability.
arXiv Detail & Related papers (2025-01-17T16:21:18Z) - SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z) - Beyond ESM2: Graph-Enhanced Protein Sequence Modeling with Efficient Clustering [24.415612744612773]
Proteins are essential to life's processes, underpinning evolution and diversity.
Advances in sequencing technology have revealed millions of proteins, underscoring the need for sophisticated pre-trained protein models for biological analysis and AI development.
Facebook's ESM2, the most advanced protein language model to date, leverages a masked prediction task for unsupervised learning, crafting amino acid representations with notable biochemical accuracy.
Yet, it lacks in delivering functional protein insights, signaling an opportunity for enhancing representation quality.
This study addresses this gap by incorporating protein family classification into ESM2's training, while a contextual prediction task fine-tunes local
arXiv Detail & Related papers (2024-04-24T11:09:43Z) - Efficiently Predicting Protein Stability Changes Upon Single-point
Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z) - Growing ecosystem of deep learning methods for modeling
protein$\unicode{x2013}$protein interactions [0.0]
We discuss the growing ecosystem of deep learning methods for modeling protein interactions.
Opportunities abound to discover novel interactions, modulate their physical mechanisms, and engineer binders to unravel their functions.
arXiv Detail & Related papers (2023-10-10T15:53:27Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - Subcellular Protein Localisation in the Human Protein Atlas using
Ensembles of Diverse Deep Architectures [11.41081495236219]
Automated visual localisation of subcellular proteins can accelerate our understanding of cell function in health and disease.
We show how this gap can be narrowed by addressing three key aspects: (i) automated improvement of cell annotation quality, (ii) new Convolutional Neural Network (CNN) architectures supporting unbalanced and noisy data, and (iii) informed selection and fusion of multiple & diverse machine learning models.
arXiv Detail & Related papers (2022-05-19T20:28:56Z) - Artificial intelligence techniques for integrative structural biology of
intrinsically disordered proteins [0.3735965959270874]
We outline recent developments in artificial intelligence (AI) and machine learning (ML) techniques for integrative structural biology of intrinsically disordered proteins (IDP)
IDPs challenge the traditional protein structure-function paradigm by adapting their conformations in response to specific binding partners leading them to mediate diverse, and often complex cellular functions such as biological signaling, self organization and compartmentalization.
We posit that scalable statistical inference techniques can effectively integrate information gleaned from multiple experimental techniques as well as from simulations, thus providing access to atomistic details of these emergent phenomena.
arXiv Detail & Related papers (2020-12-01T23:10:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.