Binary classification of proteins by a Machine Learning approach
- URL: http://arxiv.org/abs/2111.01975v1
- Date: Wed, 3 Nov 2021 01:58:16 GMT
- Title: Binary classification of proteins by a Machine Learning approach
- Authors: Damiano Perri, Marco Simonetti, Andrea Lombardi, Noelia Faginas-Lago,
Osvaldo Gervasi
- Abstract summary: We present a system capable of classifying protein chains of amino acids based on the protein description contained in the Protein Data Bank.
Each protein is fully described in its chemical-physical-geometric properties in a file in XML format.
The aim of the work is to design a Deep Learning machinery for the collection and management of a huge amount of data and to validate it through its application to the classification of a sequences of amino acids.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we present a system based on a Deep Learning approach, by using
a Convolutional Neural Network, capable of classifying protein chains of amino
acids based on the protein description contained in the Protein Data Bank. Each
protein is fully described in its chemical-physical-geometric properties in a
file in XML format. The aim of the work is to design a prototypical Deep
Learning machinery for the collection and management of a huge amount of data
and to validate it through its application to the classification of a sequences
of amino acids. We envisage applying the described approach to more general
classification problems in biomolecules, related to structural properties and
similarities.
Related papers
- Protein Representation Learning with Sequence Information Embedding: Does it Always Lead to a Better Performance? [4.7077642423577775]
We propose ProtLOCA, a local geometry alignment method based solely on amino acid structure representation.
Our method outperforms existing sequence- and structure-based representation learning methods by more quickly and accurately matching structurally consistent protein domains.
arXiv Detail & Related papers (2024-06-28T08:54:37Z) - Clustering for Protein Representation Learning [72.72957540484664]
We propose a neural clustering framework that can automatically discover the critical components of a protein.
Our framework treats a protein as a graph, where each node represents an amino acid and each edge represents a spatial or sequential connection between amino acids.
We evaluate on four protein-related tasks: protein fold classification, enzyme reaction classification, gene term prediction, and enzyme commission number prediction.
arXiv Detail & Related papers (2024-03-30T05:51:09Z) - NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks [60.48306899271866]
We propose novel semantic data augmentation methods to incorporate backbone chemical and side-chain biophysical information into protein classification tasks.
Specifically, we leverage molecular biophysical, secondary structure, chemical bonds, andionic features of proteins to facilitate classification tasks.
arXiv Detail & Related papers (2024-03-21T13:27:57Z) - Deep Learning Methods for Protein Family Classification on PDB
Sequencing Data [0.0]
We demonstrate and compare the performance of several deep learning frameworks, including novel bi-directional LSTM and convolutional models, on widely available sequencing data.
Our results show that our deep learning models deliver superior performance to classical machine learning methods, with the convolutional architecture providing the most impressive inference performance.
arXiv Detail & Related papers (2022-07-14T06:11:32Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - Structure-aware Protein Self-supervised Learning [50.04673179816619]
We propose a novel structure-aware protein self-supervised learning method to capture structural information of proteins.
In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information.
We identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme.
arXiv Detail & Related papers (2022-04-06T02:18:41Z) - A new method for binary classification of proteins with Machine Learning [0.0]
In this work we set out to find a method to classify protein structures using a Deep Learning methodology.
Our Artificial Intelligence has been trained to recognize complex biomolecule structures extrapolated from the Protein Data Bank (PDB) database and reprocessed as images.
For this purpose various tests have been conducted with pre-trained Convolutional Neural Networks, such as InceptionResNetV2 or InceptionV3, in order to extract significant features from these images and correctly classify the molecule.
arXiv Detail & Related papers (2021-11-03T01:58:34Z) - PersGNN: Applying Topological Data Analysis and Geometric Deep Learning
to Structure-Based Protein Function Prediction [0.07340017786387766]
In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank.
We present PersGNN - an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis.
arXiv Detail & Related papers (2020-10-30T02:24:35Z) - Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate ($geq$80%) predictions of protein class and architecture from structures determined at low ($leq$3A) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z) - BERTology Meets Biology: Interpreting Attention in Protein Language
Models [124.8966298974842]
We demonstrate methods for analyzing protein Transformer models through the lens of attention.
We show that attention captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure.
We also present a three-dimensional visualization of the interaction between attention and protein structure.
arXiv Detail & Related papers (2020-06-26T21:50:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.