Combination of digital signal processing and assembled predictive models
facilitates the rational design of proteins
- URL: http://arxiv.org/abs/2010.03516v1
- Date: Wed, 7 Oct 2020 16:35:02 GMT
- Title: Combination of digital signal processing and assembled predictive models
facilitates the rational design of proteins
- Authors: David Medina-Ortiz and Sebastian Contreras and Juan Amado-Hinojosa and
Jorge Torres-Almonacid and Juan A. Asenjo and Marcelo Navarrete and Álvaro
Olivera-Nappa
- Abstract summary: Predicting the effect of mutations in proteins is one of the most critical challenges in protein engineering.
We use clustering, embedding, and dimensionality reduction techniques to select combinations of physicochemical properties for the encoding stage.
We then select the best performing predictive models in each set of properties and create an assembled model.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting the effect of mutations in proteins is one of the most critical
challenges in protein engineering; by knowing the effect a substitution of one
(or several) residues in the protein's sequence has on its overall properties,
one could design a variant with a desirable function. New strategies and
methodologies to create predictive models are continually being developed.
However, those that claim to be general often do not reach adequate
performance, and those that target a particular task improve their predictive
performance at the cost of the method's generality. Moreover, these approaches
typically require a particular decision on how to encode the amino acid sequence,
without an explicit methodological consensus on that choice. To address these
issues, in this work, we applied clustering, embedding, and dimensionality
reduction techniques to the AAIndex database to select meaningful combinations
of physicochemical properties for the encoding stage. We then used the chosen
set of properties to obtain several encodings of the same sequence, to
subsequently apply the Fast Fourier Transform (FFT) on them. We performed an
exploratory stage of Machine-Learning models in the frequency space, using
different algorithms and hyperparameters. Finally, we selected the best
performing predictive models in each set of properties and created an assembled
model. We extensively tested the proposed methodology on different datasets and
demonstrated that the generated assembled model achieved notably better
performance metrics than those models based on a single encoding and, in most
cases, better than those previously reported. The proposed method is available
as a Python library for non-commercial use under the GNU General Public License
(GPLv3).
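The encoding and FFT steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' library: the function names `encode`, `fft_features`, and `assemble` are hypothetical, the zero-padding length is an assumption, the charge scale is a toy example rather than an AAIndex entry, and simple averaging is only one assumed way to combine per-encoding predictions into an assembled model.

```python
import numpy as np

# Kyte-Doolittle hydropathy values, used here only as an example
# physicochemical property; the paper selects properties from AAIndex.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Toy charge scale at neutral pH (illustrative, NOT an AAIndex entry).
CHARGE = {aa: 0.0 for aa in HYDROPATHY}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})


def encode(sequence, scale, length):
    """Map each residue to its property value and zero-pad to `length`."""
    if len(sequence) > length:
        raise ValueError("sequence is longer than the target length")
    signal = np.zeros(length)
    signal[: len(sequence)] = [scale[aa] for aa in sequence]
    return signal


def fft_features(signal):
    """Return the magnitude spectrum of a real-valued encoded sequence."""
    return np.abs(np.fft.rfft(signal))


def assemble(predictions):
    """Average per-encoding predictions (an assumed combination rule)."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)


# One frequency-space representation per property for the same sequence.
sequence = "MKTAYIAKQR"
spectra = {
    name: fft_features(encode(sequence, scale, length=16))
    for name, scale in {"hydropathy": HYDROPATHY, "charge": CHARGE}.items()
}
```

Each entry of `spectra` would serve as the feature vector for models trained on that property; `assemble` then combines the predictions that those per-property models produce for the same variants.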
Related papers
- Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization.
We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z) - Best-Subset Selection in Generalized Linear Models: A Fast and
Consistent Algorithm via Splicing Technique [0.6338047104436422]
Best subset selection has been widely regarded as the Holy Grail of problems of this type.
We proposed and illustrated an algorithm for best subset recovery in mild conditions.
Our implementation achieves approximately a fourfold speedup compared to popular variable selection toolkits.
arXiv Detail & Related papers (2023-08-01T03:11:31Z) - Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Fourier Representations for Black-Box Optimization over Categorical
Variables [34.0277529502051]
We propose to use existing methods in conjunction with a surrogate model for the black-box evaluations over purely categorical variables.
To learn such representations, we consider two different settings to update our surrogate model.
Numerical experiments over synthetic benchmarks as well as real-world RNA sequence optimization and design problems demonstrate the representational power of the proposed methods.
arXiv Detail & Related papers (2022-02-08T08:14:58Z) - Conservative Objective Models for Effective Offline Model-Based
Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z) - Adaptive machine learning for protein engineering [0.4568777157687961]
We discuss how to use a sequence-to-function machine-learning surrogate model to select sequences for experimental measurement.
First, we discuss how to select sequences through a single round of machine-learning optimization.
Then, we discuss sequential optimization, where the goal is to discover optimized sequences and improve the model across multiple rounds of training, optimization, and experimental measurement.
arXiv Detail & Related papers (2021-06-10T02:56:35Z) - Evolutionary Variational Optimization of Generative Models [0.0]
We combine two popular optimization approaches to derive learning algorithms for generative models: variational optimization and evolutionary algorithms.
We show that evolutionary algorithms can effectively and efficiently optimize the variational bound.
In the category of "zero-shot" learning, we observed the evolutionary variational algorithm to significantly improve the state-of-the-art in many benchmark settings.
arXiv Detail & Related papers (2020-12-22T19:06:33Z) - AdaLead: A simple and robust adaptive greedy search algorithm for
sequence design [55.41644538483948]
We develop an easy-to-implement, scalable, and robust evolutionary greedy algorithm (AdaLead).
AdaLead is a remarkably strong benchmark that out-competes more complex state of the art approaches in a variety of biologically motivated sequence design challenges.
arXiv Detail & Related papers (2020-10-05T16:40:38Z) - Fast differentiable DNA and protein sequence optimization for molecular
design [0.0]
Machine learning models that accurately predict biological fitness from sequence are becoming a powerful tool for molecular design.
Here, we build on a previously proposed straight-through approximation method to optimize through discrete sequence samples.
The resulting algorithm, which we call Fast SeqProp, achieves up to 100-fold faster convergence compared to previous versions.
arXiv Detail & Related papers (2020-05-22T17:03:55Z) - Stepwise Model Selection for Sequence Prediction via Deep Kernel
Learning [100.83444258562263]
We propose a novel Bayesian optimization (BO) algorithm to tackle the challenge of model selection in this setting.
In order to solve the resulting multiple black-box function optimization problem jointly and efficiently, we exploit potential correlations among black-box functions.
We are the first to formulate the problem of stepwise model selection (SMS) for sequence prediction, and to design and demonstrate an efficient joint-learning algorithm for this purpose.
arXiv Detail & Related papers (2020-01-12T09:42:19Z)