Related papers: On Recovering Higher-order Interactions from Protein Language Models

On Recovering Higher-order Interactions from Protein Language Models

URL: http://arxiv.org/abs/2405.06645v1
Date: Fri, 15 Mar 2024 16:35:47 GMT
Title: On Recovering Higher-order Interactions from Protein Language Models
Authors: Darin Tsui, Amirali Aghazadeh,
Abstract summary: We develop a framework to do a systematic Fourier analysis of the protein language model ESM2 applied on three proteins-green fluorescent protein (GFP), tumor protein P53 (TP53), and G domain B1 (GB1) We demonstrate that ESM2 is dominated by three regions in the sparsity-ruggedness plane, two of which are better suited for sparse Fourier transforms.
Score: 0.3376269351435395
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Protein language models leverage evolutionary information to perform state-of-the-art 3D structure and zero-shot variant prediction. Yet, extracting and explaining all the mutational interactions that govern model predictions remains difficult as it requires querying the entire amino acid space for $n$ sites using $20^n$ sequences, which is computationally expensive even for moderate values of $n$ (e.g., $n\sim10$). Although approaches to lower the sample complexity exist, they often limit the interpretability of the model to just single and pairwise interactions. Recently, computationally scalable algorithms relying on the assumption of sparsity in the Fourier domain have emerged to learn interactions from experimental data. However, extracting interactions from language models poses unique challenges: it's unclear if sparsity is always present or if it is the only metric needed to assess the utility of Fourier algorithms. Herein, we develop a framework to do a systematic Fourier analysis of the protein language model ESM2 applied on three proteins-green fluorescent protein (GFP), tumor protein P53 (TP53), and G domain B1 (GB1)-across various sites for 228 experiments. We demonstrate that ESM2 is dominated by three regions in the sparsity-ruggedness plane, two of which are better suited for sparse Fourier transforms. Validations on two sample proteins demonstrate recovery of all interactions with $R^2=0.72$ in the more sparse region and $R^2=0.66$ in the more dense region, using only 7 million out of $20^{10}\sim10^{13}$ ESM2 samples, reducing the computational time by a staggering factor of 15,000. All codes and data are available on our GitHub repository https://github.com/amirgroup-codes/InteractionRecovery.

Related papers

FFT-Accelerated Auxiliary Variable MCMC for Fermionic Lattice Models: A Determinant-Free Approach with $O(N\log N)$ Complexity [52.3171766248012]
We introduce a Markov Chain Monte Carlo (MCMC) algorithm that dramatically accelerates the simulation of quantum many-body systems.<n>We validate our algorithm on benchmark quantum physics problems, accurately reproducing known theoretical results.<n>Our work provides a powerful tool for large-scale probabilistic inference and opens avenues for physics-inspired generative models.
arXiv Detail & Related papers (2025-10-13T07:57:21Z)
Real-time nonlinear inversion of magnetic resonance elastography with operator learning [0.06797079068199119]
The oNLI framework enables real-time MRE inversion (30,000x speedup) of elastograms with comparable spatial accuracy to NLI.<n>A structural prior mechanism, analogous to Soft Prior Regularization in the MRE literature, was incorporated to improve spatial accuracy.
arXiv Detail & Related papers (2025-10-03T08:55:40Z)
Constructive Universal Approximation and Sure Convergence for Multi-Layer Neural Networks [0.0]
o1Neuro is a new neural network model built on sparse indicator activation neurons.<n>At the population level, a deep o1Neuro can approximate any measurable function of $boldsymbolX$.<n>At the sample level, the optimization of o1Neuro reaches an optimal model with probability approaching one after sufficiently many update rounds.
arXiv Detail & Related papers (2025-07-07T08:55:28Z)
Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures [53.51230405648361]
We study the dynamics of gradient EM and employ tensor decomposition to characterize the geometric landscape of the likelihood loss.<n>This is the first global convergence and recovery result for EM or gradient EM beyond the special case of $m=2$.
arXiv Detail & Related papers (2025-06-06T23:32:38Z)
Efficient Sample-optimal Learning of Gaussian Tree Models via Sample-optimal Testing of Gaussian Mutual Information [1.7419682548187605]
We develop a conditional mutual information tester for Gaussian random variables. We show that the chain rule of conditional mutual information continues to hold for the estimated (conditional) mutual information. We also show that when the underlying Gaussian model is not known to be tree-structured, $widetildeTheta(n2varepsilon-2)$ samples are necessary.
arXiv Detail & Related papers (2024-11-18T12:25:34Z)
Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
conditional distributions is a central problem in machine learning. We propose a new learning paradigm that integrates both paired and unpaired data. Our approach also connects intriguingly with inverse entropic optimal transport (OT)
arXiv Detail & Related papers (2024-10-03T16:12:59Z)
Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Key to our solution is a novel projection technique based on ideas from harmonic analysis. Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z)
Model-adapted Fourier sampling for generative compressed sensing [7.130302992490975]
We study generative compressed sensing when the measurement matrix is randomly subsampled from a unitary matrix. We construct a model-adapted sampling strategy with an improved sample complexity of $textitO(kd| boldsymbolalpha|_22)$ measurements.
arXiv Detail & Related papers (2023-10-08T03:13:16Z)
Higher Order Gauge Equivariant CNNs on Riemannian Manifolds and Applications [7.322121417864824]
We introduce a higher order generalization of the gauge equivariant convolution, dubbed a gauge equivariant Volterra network (GEVNet) This allows us to model spatially extended nonlinear interactions within a given field while still maintaining equivariance to global isometries. In the neuroimaging data experiments, the resulting two-part architecture is used to automatically discriminate between patients with Lewy Body Disease (DLB), Alzheimer's Disease (AD) and Parkinson's Disease (PD) from diffusion magnetic resonance images (dMRI)
arXiv Detail & Related papers (2023-05-26T06:02:31Z)
Detection-Recovery Gap for Planted Dense Cycles [72.4451045270967]
We consider a model where a dense cycle with expected bandwidth $n tau$ and edge density $p$ is planted in an ErdHos-R'enyi graph $G(n,q)$. We characterize the computational thresholds for the associated detection and recovery problems for the class of low-degree algorithms.
arXiv Detail & Related papers (2023-02-13T22:51:07Z)
FeDXL: Provable Federated Learning for Deep X-Risk Optimization [105.17383135458897]
We tackle a novel federated learning (FL) problem for optimizing a family of X-risks, to which no existing algorithms are applicable. The challenges for designing an FL algorithm for X-risks lie in the non-decomability of the objective over multiple machines and the interdependency between different machines.
arXiv Detail & Related papers (2022-10-26T00:23:36Z)
A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $Omega(sqrtn/p)$ of robustness of interpolating neural network parameters on arbitrary distributions. We then show the potential benefit of overparametrization for smooth data when $n=mathrmpoly(d)$. We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=exp(omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z)
Supervised deep learning prediction of the formation enthalpy of the full set of configurations in complex phases: the $\sigma-$phase as an example [1.8369974607582582]
We show how machine learning can be used to predict several properties in solid-state chemistry. In particular, it can be used to predict the heat of formation of a given complex crystallographic phase.
arXiv Detail & Related papers (2020-11-21T22:07:15Z)
Machine learning for complete intersection Calabi-Yau manifolds: a methodological study [0.0]
We revisit the question of predicting Hodge numbers $h1,1$ and $h2,1$ of complete Calabi-Yau intersections using machine learning (ML) We obtain 97% (resp. 99%) accuracy for $h1,1$ using a neural network inspired by the Inception model for the old dataset, using only 30% (resp. 70%) of the data for training. For the new one, a simple linear regression leads to almost 100% accuracy with 30% of the data for training.
arXiv Detail & Related papers (2020-07-30T19:43:49Z)
Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $tilde O(|S||B|(gamma)-3epsilon-2)$ for finding the Nash equilibrium (NE) value up to some $epsilon$ error. We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.