Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins
- URL: http://arxiv.org/abs/2305.04934v2
- Date: Tue, 11 Jul 2023 12:41:39 GMT
- Title: Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins
- Authors: Markus J. Buehler
- Abstract summary: We report a flexible language-model-based deep learning strategy, applied here to solve complex forward and inverse problems in protein modeling.
The model is applied to predict secondary structure content (per-residue level and overall content), protein solubility, and sequencing tasks.
We find that adding additional tasks yields emergent synergies that the model exploits in improving overall performance.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We report a flexible language-model-based deep learning strategy, applied
here to solve complex forward and inverse problems in protein modeling, based
on an attention neural network that integrates transformer and graph
convolutional architectures in a causal multi-headed graph mechanism, to
realize a generative pretrained model. The model is applied to predict
secondary structure content (per-residue level and overall content), protein
solubility, and sequencing tasks. Further trained on inverse tasks, the model
is rendered capable of designing proteins with these properties as target
features. The model is formulated as a general framework, completely
prompt-based, and can be adapted for a variety of downstream tasks. We find
that adding additional tasks yields emergent synergies that the model exploits
in improving overall performance, beyond what would be possible by training a
model on each dataset alone. Case studies are presented to validate the method,
yielding protein designs specifically focused on structural proteins, but also
exploring the applicability in the design of soluble, antimicrobial
biomaterials. While our model is trained to ultimately perform 8 distinct
tasks, with available datasets it can be extended to solve additional problems.
In a broader sense, this work illustrates a form of multiscale modeling that
relates a set of ultimate building blocks (here, byte-level UTF-8 characters
that define the nature of the physical system at hand) to complex output. This
materiomic scheme captures complex emergent relationships between universal
building blocks and resulting properties via a synergizing learning capacity to
express a set of potentialities embedded in the knowledge used in training, via
the interplay of universality and diversity.
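The central architectural claim, that causal multi-head attention can be read as a learned graph adjacency over which values are aggregated as in a graph convolution, can be illustrated with a minimal PyTorch sketch. Dimensions, the single-layer scope, and all names below are illustrative assumptions, not the paper's exact implementation:
```python
import torch
import torch.nn as nn

class CausalGraphAttention(nn.Module):
    """Multi-head self-attention whose causal attention matrix is read as a
    learned graph adjacency; aggregating values over it plays the role of a
    graph convolution. Hyperparameters are illustrative only."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.dk = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (B, heads, T, dk)
        q, k, v = (t.view(B, T, self.heads, self.dk).transpose(1, 2)
                   for t in (q, k, v))
        adj = (q @ k.transpose(-2, -1)) / self.dk ** 0.5
        # Causal mask: token i may only attend to (be adjacent to) j <= i.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        adj = adj.masked_fill(mask, float("-inf")).softmax(dim=-1)
        # Aggregation over the learned adjacency is the graph-convolution step.
        y = (adj @ v).transpose(1, 2).reshape(B, T, D)
        return self.proj(y)

x = torch.randn(2, 16, 64)              # a batch of byte-level token embeddings
print(CausalGraphAttention()(x).shape)  # torch.Size([2, 16, 64])
```
In the full model, layers of this kind would be stacked and trained autoregressively on byte-level UTF-8 prompts that encode the forward and inverse tasks.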
Related papers
- CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph [66.11279161533619]
CBGBench is a benchmark for structure-based drug design (SBDD).
By categorizing existing methods according to their attributes, CBGBench implements a range of cutting-edge methods.
We have adapted these models to a range of tasks essential in drug design, each treated as a sub-task of the graph fill-in-the-blank problem.
arXiv Detail & Related papers (2024-06-16T08:20:24Z)
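CBGBench's "graph fill-in-the-blank" framing can be pictured as masking part of a protein-molecule complex graph and asking a generative model to reconstruct the hidden portion. A toy sketch with generic tensors (not CBGBench's actual API):
```python
import torch

# Toy protein-molecule complex graph: node features plus an edge index that
# a graph network would consume alongside them.
nodes = torch.randn(10, 8)             # 10 atoms, 8 features each
edges = torch.randint(0, 10, (2, 24))  # 24 directed edges

# "Fill in the blank": hide the ligand portion (here, nodes 6-9) and ask a
# generative model to predict the masked features back.
mask = torch.zeros(10, dtype=torch.bool)
mask[6:] = True
blanked = nodes.clone()
blanked[mask] = 0.0                    # blank out the ligand nodes

# A model would take (blanked, edges, mask) and return predictions for
# nodes[mask]; sub-tasks such as fragment or linker design differ only in
# which part of the complex graph is masked.
```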
- X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Molecular Design [0.0]
We report a mixture-of-experts strategy to create fine-tuned large language models, using a deep layer-wise, token-level approach based on low-rank adaptation (LoRA).
The design is inspired by the biological principles of universality and diversity, where neural network building blocks are reused in different hierarchical manifestations.
We develop a tailored X-LoRA model that offers scientific capabilities including forward/inverse analysis tasks and enhanced reasoning capability, focused on biomaterial analysis, protein mechanics and design.
arXiv Detail & Related papers (2024-02-11T10:23:34Z)
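The X-LoRA idea, a frozen base layer whose output is corrected by several low-rank adapters mixed per token by a learned gate, can be sketched as follows; the shapes, expert count, softmax gate, and single-layer scope are illustrative assumptions:
```python
import torch
import torch.nn as nn

class XLoRALinear(nn.Module):
    """Frozen base linear layer plus K LoRA adapters, mixed per token by a
    learned gate (a simplified sketch of the X-LoRA mixing idea)."""

    def __init__(self, dim: int, k_experts: int = 3, rank: int = 4):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.requires_grad_(False)        # base weights stay frozen
        self.A = nn.Parameter(torch.randn(k_experts, dim, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(k_experts, rank, dim))
        self.gate = nn.Linear(dim, k_experts)  # token-level scaling head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gate(x).softmax(dim=-1)       # (B, T, K) mixture weights
        # Per-expert low-rank updates: x @ A_k @ B_k for each adapter k.
        delta = torch.einsum("btd,kdr,kre->btke", x, self.A, self.B)
        return self.base(x) + torch.einsum("btk,btke->bte", w, delta)

x = torch.randn(2, 5, 32)
print(XLoRALinear(32)(x).shape)  # torch.Size([2, 5, 32])
```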
- ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning [0.0]
ProtAgents is a platform for de novo protein design based on Large Language Models (LLMs).
Multiple AI agents with distinct capabilities collaboratively address complex tasks within a dynamic environment.
The flexibility in designing the agents, together with their capacity for autonomous collaboration in the dynamic LLM-based multi-agent environment, opens up great potential.
arXiv Detail & Related papers (2024-01-27T20:19:49Z)
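A generic sketch of the multi-agent pattern (not ProtAgents' actual interface): role-specialized agents pass a shared task state around, with the LLM calls and physics simulations stubbed out as placeholders:
```python
# Generic multi-agent collaboration sketch: each agent reads and updates a
# shared task state; in a real system the bodies would call LLMs/simulators.

def planner(state):   # decomposes the design goal into steps
    state["plan"] = ["generate sequence", "fold", "score stability"]
    return state

def designer(state):  # would call an LLM / generative model in practice
    state["sequence"] = "GEIAQLKEKLQALK"  # placeholder output
    return state

def critic(state):    # would call a physics simulation in practice
    state["accepted"] = len(state["sequence"]) > 10
    return state

state = {"goal": "design a stable alpha-helical protein"}
for agent in (planner, designer, critic):
    state = agent(state)
print(state["accepted"])
```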
- Endowing Protein Language Models with Structural Knowledge [5.587293092389789]
We introduce a novel framework that enhances protein language models by integrating protein structural data.
The refined model, termed Protein Structure Transformer (PST), is further pretrained on a small protein structure database.
PST consistently outperforms the state-of-the-art foundation model for protein sequences, ESM-2, setting a new benchmark in protein function prediction.
arXiv Detail & Related papers (2024-01-26T12:47:54Z)
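The general recipe, refining per-residue language-model embeddings with message passing over a structure graph, can be sketched minimally; the contact-graph averaging below is a crude stand-in for PST's actual structure extractor:
```python
import torch
import torch.nn as nn

class StructureRefiner(nn.Module):
    """Refine per-residue embeddings from a sequence model by mixing in the
    mean embedding of each residue's structural neighbors (generic sketch)."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, dim) residue embeddings; adj: (N, N) contact-map adjacency
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        neighbors = adj @ h / deg  # mean over 3D contacts
        return self.update(torch.cat([h, neighbors], dim=-1))

h = torch.randn(50, 32)                   # e.g. embeddings from a protein LM
adj = (torch.rand(50, 50) < 0.1).float()  # contact graph from the structure
print(StructureRefiner()(h, adj).shape)   # torch.Size([50, 32])
```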
- Target-aware Variational Auto-encoders for Ligand Generation with Multimodal Protein Representation Learning [2.01243755755303]
We introduce TargetVAE, a target-aware variational auto-encoder that generates ligands with high binding affinities to arbitrary protein targets.
This is the first effort to unify different representations of proteins into a single model, which we name the Protein Multimodal Network (PMN).
arXiv Detail & Related papers (2023-08-02T12:08:17Z)
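The core mechanism, a variational auto-encoder whose decoder is conditioned on a protein-target embedding, can be sketched with generic tensors; the PMN multimodal encoder is abstracted away here as a fixed feature vector:
```python
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    """Toy target-conditioned VAE: the decoder sees the latent code z
    concatenated with a protein-target embedding (generic sketch)."""

    def __init__(self, lig_dim: int = 16, prot_dim: int = 32, z_dim: int = 8):
        super().__init__()
        self.enc = nn.Linear(lig_dim + prot_dim, 2 * z_dim)
        self.dec = nn.Linear(z_dim + prot_dim, lig_dim)

    def forward(self, lig, prot):
        mu, logvar = self.enc(torch.cat([lig, prot], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(torch.cat([z, prot], -1)), mu, logvar

lig, prot = torch.randn(4, 16), torch.randn(4, 32)  # ligand / target features
recon, mu, logvar = ConditionalVAE()(lig, prot)
print(recon.shape)  # torch.Size([4, 16])
```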
- Integration of Pre-trained Protein Language Models into Geometric Deep Learning Networks [68.90692290665648]
We integrate knowledge learned by protein language models into several state-of-the-art geometric networks.
Our findings show an overall improvement of 20% over baselines.
Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin.
arXiv Detail & Related papers (2022-12-07T04:04:04Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Learning multi-scale functional representations of proteins from single-cell microscopy data [77.34726150561087]
We show that simple convolutional networks trained on localization classification can learn protein representations that encapsulate diverse functional information.
We also propose a robust evaluation strategy to assess the quality of protein representations across different scales of biological function.
arXiv Detail & Related papers (2022-05-24T00:00:07Z)
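The recipe is transfer learning in miniature: train a CNN to classify subcellular localization, then reuse its penultimate activations as protein representations. A generic sketch (the architecture and class count are assumptions):
```python
import torch
import torch.nn as nn

# Generic sketch: a small CNN trained to classify subcellular localization;
# its penultimate activations then serve as learned protein representations.
net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),                      # -> (B, 32) representation
)
head = nn.Linear(32, 10)               # e.g. 10 localization classes

imgs = torch.randn(8, 1, 64, 64)       # single-channel microscopy crops
feats = net(imgs)                      # representation reused downstream
print(feats.shape, head(feats).shape)  # (8, 32) (8, 10)
```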
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- Multi-Scale Representation Learning on Proteins [78.31410227443102]
This paper introduces HoloProt, a multi-scale graph construction of a protein.
The surface captures coarser details of the protein, while the sequence (as primary component) and structure capture finer details.
Our graph encoder then learns a multi-scale representation by allowing each level to integrate the encoding from level(s) below with the graph at that level.
arXiv Detail & Related papers (2022-04-04T08:29:17Z)
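The bottom-up integration, with each level fusing its own features with the encoding passed up from the level below, can be sketched as follows; sharing one encoder across levels and mean-pooling are simplifications of HoloProt's per-level graph encoders:
```python
import torch
import torch.nn as nn

class LevelEncoder(nn.Module):
    """One HoloProt-style level: fuse this level's node features with the
    pooled encoding passed up from the level below (generic sketch)."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, nodes: torch.Tensor, below: torch.Tensor) -> torch.Tensor:
        ctx = below.expand(nodes.size(0), -1)  # broadcast lower-level code
        return self.fuse(torch.cat([nodes, ctx], -1)).relu().mean(0, keepdim=True)

dim = 16
seq, struct, surf = torch.randn(80, dim), torch.randn(40, dim), torch.randn(20, dim)
enc = LevelEncoder(dim)
code = torch.zeros(1, dim)         # nothing below the sequence level
for level in (seq, struct, surf):  # sequence -> structure -> surface
    code = enc(level, code)        # each level integrates the one below
print(code.shape)                  # torch.Size([1, 16])
```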
- Incorporating network based protein complex discovery into automated model construction [6.587739898387445]
We propose a method for gene-expression-based analysis of cancer phenotypes that incorporates network knowledge through the unsupervised construction of computational graphs.
The structural construction of the computational graphs is driven by the use of topological clustering algorithms on protein-protein networks.
arXiv Detail & Related papers (2020-09-29T18:46:33Z)
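The clustering-driven construction can be sketched with networkx: communities found in a protein-protein interaction network define the groups of genes feeding each module of the downstream computational graph. The edges are toy examples, and the modularity-based clustering is a stand-in for whichever topological algorithm the paper uses:
```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy protein-protein interaction network.
ppi = nx.Graph([("TP53", "MDM2"), ("MDM2", "MDM4"),
                ("BRCA1", "BARD1"), ("BARD1", "BRCA2")])

# Topological clustering: each community becomes one module (node group)
# in the automatically constructed computational graph.
clusters = greedy_modularity_communities(ppi)
for i, c in enumerate(clusters):
    print(f"complex {i}: {sorted(c)}")  # each cluster -> one graph module
```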