Related papers: Bayes-PD: Exploring a Sequence to Binding Bayesian Neural Network model trained on Phage Display data

URL: http://arxiv.org/abs/2601.03930v1
Date: Wed, 07 Jan 2026 13:49:57 GMT
Title: Bayes-PD: Exploring a Sequence to Binding Bayesian Neural Network model trained on Phage Display data
Authors: Ilann Amiaud-Plachy, Michael Blank, Oliver Bent, Sebastien Boyer,
Abstract summary: Under-utilisation of data in conjunction with deep learning models for protein design may be attributed to high experimental noise levels and the complex nature of data pre-processing. We propose a novel approach utilising a Bayesian Neural Network within a training loop, in order to simulate the phage display experiment and its associated noise. Our goal is to investigate how understanding the experimental noise and model uncertainty can enable such models to be applied reliably to the interpretation of phage display experiments.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Phage display is a powerful laboratory technique used to study the interactions between proteins and other molecules, whether other proteins, peptides, DNA or RNA. The under-utilisation of this data in conjunction with deep learning models for protein design may be attributed to high experimental noise levels, the complex nature of data pre-processing, and the difficulty of interpreting these experimental results. In this work, we propose a novel approach utilising a Bayesian Neural Network within a training loop, in order to simulate the phage display experiment and its associated noise. Our goal is to investigate how understanding the experimental noise and model uncertainty can enable such models to be applied reliably to the interpretation of phage display experiments. We validate our approach using actual binding affinity measurements instead of relying solely on proxy values derived from 'held-out' phage display rounds.
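To make the source of "experimental noise" concrete, here is a toy simulation of phage display panning rounds. It is a minimal sketch under loudly stated assumptions and is NOT the authors' simulator (it contains no Bayesian Neural Network): each phage copy is assumed to be retained with probability equal to a hypothetical per-sequence binding probability plus a non-specific background rate, and amplification is modelled as multinomial resampling back to a fixed pool size. All function names, parameters (`background`, `pool_size`, `rounds`) and the log2 enrichment score are illustrative choices.

```python
import math
import random
from collections import Counter


def selection_round(pool, p_bind, background, rng):
    """Toy panning step (assumption): each copy of a sequence is retained with
    probability p_bind[seq] + background, capped at 1."""
    retained = Counter()
    for seq, count in pool.items():
        p = min(1.0, p_bind[seq] + background)
        kept = sum(1 for _ in range(count) if rng.random() < p)
        if kept:
            retained[seq] = kept
    return retained


def amplify(retained, pool_size, rng):
    """Toy amplification (assumption): resample retained phages with
    replacement, proportional to their counts, back to pool_size copies."""
    if not retained:
        return Counter()
    seqs = list(retained)
    weights = [retained[s] for s in seqs]
    return Counter(rng.choices(seqs, weights=weights, k=pool_size))


def simulate(p_bind, rounds=3, pool_size=10_000, background=0.01, seed=0):
    """Return the pool composition before selection and after each round."""
    rng = random.Random(seed)
    per_seq = pool_size // len(p_bind)
    pool = Counter({s: per_seq for s in p_bind})
    history = [pool]
    for _ in range(rounds):
        pool = amplify(selection_round(pool, p_bind, background, rng), pool_size, rng)
        history.append(pool)
    return history


def enrichment(history, seq, pseudocount=1.0):
    """Log2 ratio of a sequence's frequency in the last vs. the first pool."""
    first, last = history[0], history[-1]
    f0 = (first[seq] + pseudocount) / (sum(first.values()) + pseudocount)
    f1 = (last[seq] + pseudocount) / (sum(last.values()) + pseudocount)
    return math.log2(f1 / f0)
```

For example, `simulate({"strong": 0.5, "weak": 0.05, "none": 0.0})` enriches the strong binder while the others are depleted. Re-running with different `seed` values gives different enrichment values for the same underlying binding probabilities, which is the kind of sampling noise a probabilistic model of the experiment would have to account for.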

Related papers

On the Interpolation Effect of Score Smoothing in Diffusion Models
We study the hypothesis that the creativity of diffusion models arises from an effect caused by a smoothing of the empirical score function. We show theoretically that regularized two-layer ReLU neural networks tend to learn an approximately smoothed version of the empirical score function. We present experimental evidence that learning score functions with neural networks indeed induces a score smoothing effect.
arXiv (2025-02-26T19:04:01Z)
Towards Theoretical Understandings of Self-Consuming Generative Models
This paper tackles the emerging challenge of training generative models within a self-consuming loop. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models. We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv (2024-02-19T02:08:09Z)
From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
We propose a simple, easy-to-implement, two-step training pipeline that we call From Fake to Real (FFR). By training on real and synthetic data separately, FFR does not expose the model to the statistical differences between real and synthetic data. Our experiments show that FFR improves worst group accuracy over the state-of-the-art by up to 20% over three datasets.
arXiv (2023-08-08T19:52:28Z)
Moving beyond simulation: data-driven quantitative photoacoustic imaging using tissue-mimicking phantoms
We introduce a collection of experimentally well-characterised imaging phantoms and their digital twins. This first-of-a-kind phantom data set enables supervised training of a U-Net on experimental data for pixel-wise estimation of absorption coefficients. We show that training on simulated data results in artefacts and biases in the estimates, reinforcing the existence of a domain gap between simulation and experiment.
arXiv (2023-06-11T19:12:30Z)
Machine learning enabled experimental design and parameter estimation for ultrafast spin dynamics
We introduce a methodology that combines machine learning with Bayesian optimal experimental design (BOED). Our method employs a neural network model for large-scale spin dynamics simulations for precise distribution and utility calculations in BOED. Our numerical benchmarks demonstrate the superior performance of our method in guiding XPFS experiments, predicting model parameters, and yielding more informative measurements within limited experimental time.
arXiv (2023-06-03T06:19:20Z)
Online simulator-based experimental design for cognitive model selection
We propose BOSMOS: an approach to experimental design that can select between computational models without tractable likelihoods. In simulated experiments, we demonstrate that the proposed BOSMOS technique can accurately select models in up to 2 orders of magnitude less time than existing likelihood-free inference (LFI) alternatives.
arXiv (2023-03-03T21:41:01Z)
Linking data separation, visual separation, and classifier performance using pseudo-labeling by contrastive learning
We argue that the performance of the final classifier depends on the data separation present in the latent space and visual separation present in the projection. We demonstrate our results by the classification of five real-world challenging image datasets of human intestinal parasites with only 1% supervised samples.
arXiv (2023-02-06T10:01:38Z)
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction. Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations. On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv (2023-02-04T01:32:40Z)
A Robust Backpropagation-Free Framework for Images
We present an error kernel driven activation alignment (EKDAA) algorithm for image data. EKDAA trains networks without backpropagation by introducing locally derived error transmission kernels and error maps. Results are presented for an EKDAA-trained CNN that employs a non-differentiable activation function.
arXiv (2022-06-03T21:14:10Z)
Deep Bayesian Active Learning for Accelerating Stochastic Simulation
Interactive Neural Process (INP) is a deep Bayesian active learning framework for accelerating stochastic simulations. For active learning, we propose a novel acquisition function, Latent Information Gain (LIG), calculated in the latent space of Neural Process (NP) based models. The results demonstrate that the spatiotemporal NP surrogate (STNP) outperforms the baselines in the learning setting and that LIG achieves the state-of-the-art for active learning.
arXiv (2021-06-05T01:31:51Z)
Inference of cell dynamics on perturbation data using adjoint sensitivity
Data-driven dynamic models of cell biology can be used to predict cell response to unseen perturbations. Recent work has demonstrated the derivation of interpretable models with explicit interaction terms. This work aims to extend the range of applicability of this model inference approach to a diversity of biological systems.
arXiv (2021-04-13T19:15:56Z)
Leveraging Global Parameters for Flow-based Neural Posterior Estimation
Inferring the parameters of a model based on experimental observations is central to the scientific method. A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations. We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv (2021-02-12T12:23:13Z)
Bayesian neural network with pretrained protein embedding enhances prediction accuracy of drug-protein interaction
Deep learning approaches can predict drug-protein interactions without trial-and-error by humans. We propose two methods to construct a deep learning framework that exhibits superior performance with a small labeled dataset.
arXiv (2020-12-15T10:24:34Z)
Evaluation of synthetic and experimental training data in supervised machine learning applied to charge state detection of quantum dots
We evaluate the prediction accuracy of a range of machine learning models trained on simulated and experimental data. We find that classifiers perform best on either purely experimental or a combination of synthetic and experimental training data.
arXiv (2020-05-16T23:41:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.