Related papers: Generative AI for Controllable Protein Sequence Design: A Survey

Generative AI for Controllable Protein Sequence Design: A Survey

URL: http://arxiv.org/abs/2402.10516v1
Date: Fri, 16 Feb 2024 09:05:02 GMT
Title: Generative AI for Controllable Protein Sequence Design: A Survey
Authors: Yiheng Zhu, Zitai Kong, Jialu Wu, Weize Liu, Yuqiang Han, Mingze Yin, Hongxia Xu, Chang-Yu Hsieh and Tingjun Hou
Abstract summary: We systematically review recent advances in generative AI for controllable protein sequence design. To set the stage, we first outline the foundational tasks in protein sequence design in terms of the constraints involved. We then offer in-depth reviews of each design task and discuss the pertinent applications.
Score: 2.3502958706414905
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The design of novel protein sequences with targeted functionalities underpins a central theme in protein engineering, impacting diverse fields such as drug discovery and enzymatic engineering. However, navigating this vast combinatorial search space remains a severe challenge due to time and financial constraints. This scenario is rapidly evolving as the transformative advancements in AI, particularly in the realm of generative models and optimization algorithms, have been propelling the protein design field towards an unprecedented revolution. In this survey, we systematically review recent advances in generative AI for controllable protein sequence design. To set the stage, we first outline the foundational tasks in protein sequence design in terms of the constraints involved and present key generative models and optimization algorithms. We then offer in-depth reviews of each design task and discuss the pertinent applications. Finally, we identify the unresolved challenges and highlight research opportunities that merit deeper exploration.

Related papers

AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model [92.51919604882984]
We introduce AMix-1, a powerful protein foundation model built on Flow Bayesian Networks.<n>AMix-1 is empowered by a systematic training methodology, encompassing pretraining scaling laws, emergent capability analysis, in-context learning mechanism, and test-time scaling algorithm.<n>Building on this foundation, we devise a multiple sequence alignment (MSA)-based in-context learning strategy to unify protein design into a general framework.
arXiv Detail & Related papers (2025-07-11T17:02:25Z)
A Model-Centric Review of Deep Learning for Protein Design [0.0]
Deep learning has transformed protein design, enabling accurate structure prediction, sequence optimization, and de novo protein generation. Generative models such as ProtGPT2, ProteinMPNN, and RFdiffusion have enabled sequence and backbone design beyond natural evolution-based limitations. More recently, joint sequence-structure co-design models, including ESM3, have integrated both modalities into a unified framework, resulting in improved designability.
arXiv Detail & Related papers (2025-02-26T14:31:21Z)
Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligence [54.14779179869007]
We highlight key areas where AI is driving innovation, from data analysis to new biological insights. These include developing an AI-friendly ecosystem for data generation, sharing, and analysis.
arXiv Detail & Related papers (2025-02-21T13:20:33Z)
Computational Protein Science in the Era of Large Language Models (LLMs) [54.35488233989787]
Computational protein science is dedicated to revealing knowledge and developing applications within the protein sequence-structure-function paradigm. Recently, Language Models (pLMs) have emerged as a milestone in AI due to their unprecedented language processing & generalization capability.
arXiv Detail & Related papers (2025-01-17T16:21:18Z)
Large Language Model is Secretly a Protein Sequence Optimizer [24.55348363931866]
We consider the protein sequence engineering problem, which aims to find protein sequences with high fitness levels, starting from a given wild-type sequence. We demonstrate large language models (LLMs), despite being trained on massive texts, are secretly protein sequences.
arXiv Detail & Related papers (2025-01-16T03:44:16Z)
ProteinWeaver: A Divide-and-Assembly Approach for Protein Backbone Design [61.19456204667385]
We introduce ProteinWeaver, a two-stage framework for protein backbone design. ProteinWeaver generates high-quality, novel protein backbones through versatile domain assembly. By introducing a divide-and-assembly' paradigm, ProteinWeaver advances protein engineering and opens new avenues for functional protein design.
arXiv Detail & Related papers (2024-11-08T08:10:49Z)
SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models. It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features. Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
Protein Design by Integrating Machine Learning with Quantum Annealing and Quantum-inspired Optimization [0.0]
The protein design problem involves finding polypeptide sequences folding into a given threedimensional structure. Recent machine learning breakthroughs have enabled accurate and rapid structure predictions. We introduce a general protein design scheme where algorithmic and technological advancements in machine learning and quantum-inspired algorithms can be integrated.
arXiv Detail & Related papers (2024-07-09T18:42:45Z)
Generative AI Agent for Next-Generation MIMO Design: Fundamentals, Challenges, and Vision [76.4345564864002]
Next-generation multiple input multiple output (MIMO) is expected to be intelligent and scalable. We propose the concept of the generative AI agent, which is capable of generating tailored and specialized contents. We present two compelling case studies that demonstrate the effectiveness of leveraging the generative AI agent for performance analysis.
arXiv Detail & Related papers (2024-04-13T02:39:36Z)
xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein [76.18058946124111]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously. xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories. It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z)
A Hierarchical Training Paradigm for Antibody Structure-sequence Co-design [54.30457372514873]
We propose a hierarchical training paradigm (HTP) for the antibody sequence-structure co-design. HTP consists of four levels of training stages, each corresponding to a specific protein modality. Empirical experiments show that HTP sets the new state-of-the-art performance in the co-design problem.
arXiv Detail & Related papers (2023-10-30T02:39:15Z)
Generative artificial intelligence for de novo protein design [1.2021565114959365]
Generative architectures seem adept at generating novel, yet realistic proteins. Design protocols now achieve experimental success rates nearing 20%. Despite extensive progress, there are clear field-wide challenges.
arXiv Detail & Related papers (2023-10-15T00:02:22Z)
Protein Sequence Design with Batch Bayesian Optimisation [0.0]
Protein sequence design is a challenging problem in protein engineering, which aims to discover novel proteins with useful biological functions. directed evolution is a widely-used approach for protein sequence design, which mimics the evolution cycle in a laboratory environment and conducts an iterative protocol. We propose a new method based on Batch Bayesian Optimization (Batch BO), a well-established optimization method, for protein sequence design.
arXiv Detail & Related papers (2023-03-18T14:53:20Z)
Structured Q-learning For Antibody Design [82.78798397798533]
One of the crucial steps involved in antibody design is to find an arrangement of amino acids in a protein sequence that improves its binding with a pathogen. Combinatorial optimization of antibodies is difficult due to extremely large search spaces and non-linear objectives. Applying traditional Reinforcement Learning to antibody design optimization results in poor performance. We propose Q-learning, an extension of Q-learning that incorporates structural priors for optimization.
arXiv Detail & Related papers (2022-09-10T15:36:55Z)
ODBO: Bayesian Optimization with Search Space Prescreening for Directed Protein Evolution [18.726398852721204]
We propose an efficient, experimental design-oriented closed-loop optimization framework for protein directed evolution. ODBO employs a combination of novel low-dimensional protein encoding strategy and Bayesian optimization enhanced with search space prescreening via outlier detection. We conduct and report four protein directed evolution experiments that substantiate the capability of the proposed framework for finding variants with properties of interest.
arXiv Detail & Related papers (2022-05-19T13:21:31Z)
Protein sequence design with deep generative models [0.34410212782758054]
We highlight recent applications of machine learning to generate protein sequences, focusing on the emerging field of deep generative methods. Protein engineering seeks to identify protein sequences with optimized properties. When guided by machine learning, protein sequence generation methods can draw on prior knowledge and experimental efforts to improve this process.
arXiv Detail & Related papers (2021-04-09T16:08:15Z)
AdaLead: A simple and robust adaptive greedy search algorithm for sequence design [55.41644538483948]
We develop an easy-to-directed, scalable, and robust evolutionary greedy algorithm (AdaLead) AdaLead is a remarkably strong benchmark that out-competes more complex state of the art approaches in a variety of biologically motivated sequence design challenges.
arXiv Detail & Related papers (2020-10-05T16:40:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.