Unravelling the Architecture of Membrane Proteins with Conditional
Random Fields
- URL: http://arxiv.org/abs/2008.02467v1
- Date: Thu, 6 Aug 2020 05:57:20 GMT
- Title: Unravelling the Architecture of Membrane Proteins with Conditional
Random Fields
- Authors: Lior Lukov, Sanjay Chawla, Wei Liu, Brett Church, and Gaurav Pandey
- Abstract summary: We show that the Conditional Random Fields (CRF) model provides a template for integrating micro-level information about biological entities into a mathematical model of their macro-level behavior.
A comparison on benchmark data sets against twenty-eight other methods shows that the CRF model leads to extremely accurate predictions.
- Score: 11.321552104966326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we show that a recently introduced graphical model,
Conditional Random Fields (CRF), provides a template for integrating micro-level
information about biological entities into a mathematical model of
their macro-level behavior. More specifically, we apply the CRF model to
an important classification problem in protein science, namely the prediction of
protein secondary structure from the observed primary structure. A
comparison on benchmark data sets against twenty-eight other methods shows that
the CRF model not only leads to extremely accurate predictions, but that its
modular nature and the freedom to integrate disparate, overlapping
and non-independent sources of information also make it an extremely
versatile tool that could potentially solve many other problems in bioinformatics.
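As a rough illustration of the idea in the abstract, the sketch below decodes a per-residue label sequence with a linear-chain CRF using Viterbi search. It is not the authors' model: the label alphabet (`H`, `E`, `C`), the feature templates, the hydrophobic residue set, and the hand-set weights in `DEMO_WEIGHTS` and `DEMO_TRANS` are all assumptions chosen for illustration, and a real model would learn its weights from training data. The point it shows is that several overlapping, non-independent features of the same residue can contribute to one score.

```python
from typing import Dict, List, Tuple

# Hypothetical label alphabet: helix, strand, coil.
LABELS = ("H", "E", "C")
# Hypothetical residue grouping, used only to build an overlapping feature.
HYDROPHOBIC = set("AILMFVW")


def node_score(seq: str, i: int, label: str, weights: Dict[str, float]) -> float:
    """Sum the weights of all features active at position i for this label.

    The features overlap on purpose (residue identity and its hydrophobic
    class describe the same residue); a CRF does not need them independent.
    """
    feats = [
        f"aa={seq[i]}|{label}",
        f"hydrophobic={seq[i] in HYDROPHOBIC}|{label}",
    ]
    if i > 0:
        feats.append(f"prev_aa={seq[i - 1]}|{label}")
    return sum(weights.get(f, 0.0) for f in feats)


def viterbi(
    seq: str,
    weights: Dict[str, float],
    trans: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring label sequence for seq."""
    if not seq:
        return []
    best = [{y: node_score(seq, 0, y, weights) for y in LABELS}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(seq)):
        scores: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in LABELS:
            # Best previous label for arriving at label y.
            prev = max(LABELS, key=lambda p: best[-1][p] + trans.get((p, y), 0.0))
            scores[y] = (
                best[-1][prev]
                + trans.get((prev, y), 0.0)
                + node_score(seq, i, y, weights)
            )
            ptr[y] = prev
        best.append(scores)
        back.append(ptr)
    # Trace back from the best final label.
    y = max(LABELS, key=lambda lab: best[-1][lab])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]


# Hand-set illustrative weights (assumptions, not learned values).
DEMO_WEIGHTS = {"hydrophobic=True|H": 2.0, "hydrophobic=False|C": 2.0}
DEMO_TRANS = {("H", "H"): 1.0, ("C", "C"): 1.0, ("H", "C"): -1.0, ("C", "H"): -1.0}

if __name__ == "__main__":
    print("".join(viterbi("AILKDG", DEMO_WEIGHTS, DEMO_TRANS)))
```

Adding a further information source in this sketch only requires appending another feature string in `node_score` and giving it a weight, which is the kind of modularity the abstract refers to.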
Related papers
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
- A Pipeline for Data-Driven Learning of Topological Features with Applications to Protein Stability Prediction [0.0]
We propose a data-driven method to learn interpretable topological features of biomolecular data.
We compare models that leverage automatically-learned structural features against models trained on a large set of biophysical features determined by subject-matter experts (SME).
Our models, based only on topological features of the protein structures, achieved 92%-99% of the performance of SME-based models in terms of the average precision score.
arXiv Detail & Related papers (2024-08-09T03:52:27Z)
- Endowing Protein Language Models with Structural Knowledge [5.587293092389789]
We introduce a novel framework that enhances protein language models by integrating protein structural data.
The refined model, termed Protein Structure Transformer (PST), is further pretrained on a small protein structure database.
PST consistently outperforms the state-of-the-art foundation model for protein sequences, ESM-2, setting a new benchmark in protein function prediction.
arXiv Detail & Related papers (2024-01-26T12:47:54Z)
- Navigating protein landscapes with a machine-learned transferable coarse-grained model [29.252004942896875]
A coarse-grained (CG) model with similar prediction performance has been a long-standing challenge.
We develop a bottom-up CG force field with chemical transferability, which can be used for extrapolative molecular dynamics on new sequences.
We demonstrate that the model successfully predicts folded structures, intermediates, metastable folded and unfolded basins, and the fluctuations of intrinsically disordered proteins.
arXiv Detail & Related papers (2023-10-27T17:10:23Z)
- EigenFold: Generative Protein Structure Prediction with Diffusion Models [10.24107243529341]
EigenFold is a diffusion generative modeling framework for sampling a distribution of structures from a given protein sequence.
On recent CAMEO targets, EigenFold achieves a median TMScore of 0.84, while providing a more comprehensive picture of model uncertainty.
arXiv Detail & Related papers (2023-04-05T02:46:13Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Work on differentiable causal discovery has proposed factorizing the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Supervised Learning and Model Analysis with Compositional Data [4.082799056366927]
KernelBiome is a kernel-based non-parametric regression and classification framework for compositional data.
We demonstrate performance on par with or better than state-of-the-art machine learning methods.
arXiv Detail & Related papers (2022-05-15T12:33:43Z)
- tFold-TR: Combining Deep Learning Enhanced Hybrid Potential Energy for Template-Based Modelling Structure Refinement [53.98034511648985]
The current template-based modeling approach suffers from two important problems.
The accuracy of the distance pairs from different regions of the template varies, and this information is not well introduced into the modeling.
Two neural network models predict the distance information of the missing regions and the accuracy of the distance pairs of different regions in the template modeling structure.
arXiv Detail & Related papers (2021-05-10T13:32:12Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.