KinForm: Kinetics Informed Feature Optimised Representation Models for Enzyme $k_{cat}$ and $K_{M}$ Prediction
        - URL: http://arxiv.org/abs/2507.14639v1
 - Date: Sat, 19 Jul 2025 14:34:57 GMT
 - Title: KinForm: Kinetics Informed Feature Optimised Representation Models for Enzyme $k_{cat}$ and $K_{M}$ Prediction
 - Authors: Saleh Alwer, Ronan Fleming
 - Abstract summary: KinForm is a machine learning framework designed to improve predictive accuracy and generalisation for kinetic parameters. We observe improvements from binding-site probability pooling, intermediate-layer selection, PCA, and oversampling of low-identity proteins.
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   Kinetic parameters such as the turnover number ($k_{cat}$) and Michaelis constant ($K_{\mathrm{M}}$) are essential for modelling enzymatic activity, but experimental data remain limited in scale and diversity. Previous methods for predicting enzyme kinetics typically use mean-pooled residue embeddings from a single protein language model to represent the protein. We present KinForm, a machine learning framework designed to improve predictive accuracy and generalisation for kinetic parameters by optimising protein feature representations. KinForm combines several residue-level embeddings (Evolutionary Scale Modeling Cambrian, Evolutionary Scale Modeling 2, and ProtT5-XL-UniRef50) taken from empirically selected intermediate transformer layers, and applies weighted pooling based on per-residue binding-site probability. To counter the resulting high dimensionality, we apply dimensionality reduction using principal component analysis (PCA) on the concatenated protein features, and rebalance the training data via a similarity-based oversampling strategy. KinForm outperforms baseline methods on two benchmark datasets, with improvements most pronounced in low-sequence-similarity bins. We observe improvements from binding-site probability pooling, intermediate-layer selection, PCA, and oversampling of low-identity proteins. We also find that removing sequence overlap between folds provides a more realistic evaluation of generalisation, and should be the standard over random splitting when benchmarking kinetic prediction models.
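To make the representation pipeline concrete, the following is a minimal sketch of the two feature-construction steps the abstract describes: binding-site-probability weighted pooling of residue embeddings, followed by PCA on the concatenated per-protein features. The helper names, array shapes, and the choice of 256 components are illustrative assumptions, not the authors' implementation.

```python
# Sketch of (1) binding-site-probability weighted pooling and
# (2) PCA on concatenated protein features. Assumes per-residue
# embeddings and binding-site probabilities are already computed.
import numpy as np
from sklearn.decomposition import PCA

def weighted_pool(residue_embeddings: np.ndarray,
                  binding_probs: np.ndarray) -> np.ndarray:
    """Pool (L, D) residue embeddings into a (D,) protein vector,
    weighting each residue by its predicted binding-site probability."""
    weights = binding_probs / (binding_probs.sum() + 1e-12)
    return weights @ residue_embeddings  # (L,) @ (L, D) -> (D,)

def protein_feature(embeddings_per_model: list[np.ndarray],
                    binding_probs: np.ndarray) -> np.ndarray:
    """Concatenate pooled vectors from several language models
    (e.g. ESM Cambrian, ESM-2, ProtT5), each taken from an
    empirically selected intermediate layer."""
    return np.concatenate([weighted_pool(e, binding_probs)
                           for e in embeddings_per_model])

def reduce_features(train: np.ndarray, test: np.ndarray,
                    n_components: int = 256):
    """Fit PCA on training proteins only, then project both splits."""
    pca = PCA(n_components=n_components).fit(train)
    return pca.transform(train), pca.transform(test)
```

The similarity-based oversampling step would then duplicate low-identity training proteins before fitting the downstream regressor; the abstract does not specify how sequence similarity is computed, so that step is omitted here.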
 
       
      
        Related papers
- Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function for combining the current model's predictions with the given labels.
arXiv  Detail & Related papers  (2025-05-21T07:16:44Z)
- BI-EqNO: Generalized Approximate Bayesian Inference with an Equivariant Neural Operator Framework [9.408644291433752]
We introduce BI-EqNO, an equivariant neural operator framework for generalized approximate Bayesian inference.
BI-EqNO transforms priors into posteriors, conditioned on observation data, through data-driven training.
We demonstrate BI-EqNO's utility through two examples: (1) as a generalized Gaussian process (gGP) for regression, and (2) as an ensemble neural filter (EnNF) for sequential data assimilation.
arXiv  Detail & Related papers  (2024-10-21T18:39:16Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
We present a unifying perspective on recent results on ridge regression. We use the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning. Our results extend earlier models of scaling laws and place them in a unified framework.
arXiv  Detail & Related papers  (2024-05-01T15:59:00Z)
- Perturbative partial moment matching and gradient-flow adaptive importance sampling transformations for Bayesian leave one out cross-validation [0.9895793818721335]
We motivate the use of perturbative transformations of the form $T(\boldsymbol{\theta}) = \boldsymbol{\theta} + h\,Q(\boldsymbol{\theta})$, for $0 < h \ll 1$. We derive closed-form expressions in the case of logistic regression and shallow ReLU-activated neural networks. (A numerical sketch of such a transformation appears after this list.)
arXiv  Detail & Related papers  (2024-02-13T01:03:39Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or processes governed by a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the underlying tessellation and approximate the multiple-hypotheses target distribution.
arXiv  Detail & Related papers  (2023-09-02T01:27:53Z)
- Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost [53.746169882193456]
Recent works have proposed various sparse attention modules to overcome the quadratic cost of self-attention.
We propose a model that resolves both problems by endowing each attention head with a mixed-membership stochastic block model.
Our model outperforms previous efficient variants as well as the original Transformer with full attention.
arXiv  Detail & Related papers  (2022-10-27T15:30:52Z)
- NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer [45.47667026025716]
We propose a novel, robust and accelerated iteration that relies on two key elements.
The convergence and stability of the obtained method, referred to as NAG-GS, are first studied extensively.
We show that NAG-GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models.
arXiv  Detail & Related papers  (2022-09-29T16:54:53Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are imposed through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv  Detail & Related papers  (2022-09-16T19:15:50Z)
- Toward Development of Machine Learned Techniques for Production of Compact Kinetic Models [0.0]
Chemical kinetic models are an essential component in the development and optimisation of combustion devices.
We present a novel automated compute intensification methodology to produce overly reduced and optimised chemical kinetic models.
arXiv  Detail & Related papers  (2022-02-16T12:31:24Z)
- Information Theoretic Structured Generative Modeling [13.117829542251188]
A novel generative model framework called the structured generative model (SGM) is proposed that makes straightforward optimization possible.
The implementation employs a single neural network driven by an orthonormal input to a single white noise source adapted to learn an infinite Gaussian mixture model.
Preliminary results show that SGM significantly improves MINE estimation in terms of data efficiency and variance, outperforms conventional and variational Gaussian mixture models, and is also useful for training adversarial networks.
arXiv  Detail & Related papers  (2021-10-12T07:44:18Z)
- Gaussian Function On Response Surface Estimation [12.35564140065216]
We propose a new framework for interpreting black-box machine learning models, both their features and samples, via a metamodeling technique.
The metamodel can be estimated from data generated via the trained complex model by running a computer experiment on samples of data in the region of interest (a minimal sketch appears after this list).
arXiv  Detail & Related papers  (2021-01-04T04:47:00Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv  Detail & Related papers  (2020-10-12T03:27:07Z)
- Additive interaction modelling using I-priors [0.571097144710995]
We introduce a parsimonious specification of models with interactions, which has two benefits.
It reduces the number of scale parameters and thus facilitates the estimation of models with interactions.
arXiv  Detail & Related papers  (2020-07-30T22:52:22Z) 
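As referenced in the perturbative partial moment matching entry above, a transformation of the form $T(\boldsymbol{\theta}) = \boldsymbol{\theta} + h\,Q(\boldsymbol{\theta})$ can be illustrated numerically. This is a minimal sketch with an arbitrary illustrative choice of $Q$; the paper derives specific closed forms (e.g. for logistic regression) that are not reproduced in its summary.

```python
# Sketch of a perturbative transformation T(theta) = theta + h*Q(theta),
# 0 < h << 1, applied to posterior draws before importance weighting.
# Q below is an arbitrary illustration, not the paper's construction.
import numpy as np

def T(theta: np.ndarray, h: float = 0.05) -> np.ndarray:
    """Perturb each sample by a small, smooth vector field Q."""
    Q = -theta  # illustrative choice: mild contraction toward the origin
    return theta + h * Q

# For importance sampling, the density change under T involves the
# Jacobian det(I + h * dQ/dtheta); for Q(theta) = -theta on d dimensions
# this is simply (1 - h)**d.
rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 3))  # stand-in posterior draws
transformed = T(samples)
print(samples.mean(axis=0), transformed.mean(axis=0))
```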
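Similarly, the metamodeling idea from the response surface estimation entry above can be sketched: query a trained black-box model on samples drawn in a region of interest, then fit a Gaussian surrogate to those predictions. The model choices here are illustrative assumptions, not the paper's method.

```python
# Sketch of estimating a metamodel from data generated by a trained
# complex model via a computer experiment in a region of interest.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

# A stand-in "complex" black-box model trained on synthetic data.
X = rng.uniform(-2, 2, size=(500, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)
black_box = RandomForestRegressor(n_estimators=100).fit(X, y)

# Computer experiment: sample the region of interest and record the
# black-box predictions there.
X_roi = rng.uniform(-1, 1, size=(200, 2))
y_roi = black_box.predict(X_roi)

# Fit a smooth Gaussian metamodel (GP with an RBF kernel) to the
# black-box responses; its surface can then be inspected directly.
metamodel = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X_roi, y_roi)
print(metamodel.predict(X_roi[:5]))
```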
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     