BioBlobs: Differentiable Graph Partitioning for Protein Representation Learning
- URL: http://arxiv.org/abs/2510.01632v1
- Date: Thu, 02 Oct 2025 03:25:02 GMT
- Title: BioBlobs: Differentiable Graph Partitioning for Protein Representation Learning
- Authors: Xin Wang, Carlos Oliver,
- Abstract summary: We introduce BioBlobs, a plug-and-play module that represents proteins by dynamically partitioning structures into flexibly-sized substructures ("blobs")<n>The resulting blobs are quantized into a shared and interpretable codebook, yielding a discrete vocabulary of function-relevant protein substructures used to compute protein embeddings.<n>We show that BioBlobs representations improve the performance of widely used protein encoders such as GVP-GNN across various PRL tasks.
- Score: 3.6641231031729173
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Protein function is driven by coherent substructures which vary in size and topology, yet current protein representation learning models (PRL) distort these signals by relying on rigid substructures such as k-hop and fixed radius neighbourhoods. We introduce BioBlobs, a plug-and-play, fully differentiable module that represents proteins by dynamically partitioning structures into flexibly-sized, non-overlapping substructures ("blobs"). The resulting blobs are quantized into a shared and interpretable codebook, yielding a discrete vocabulary of function-relevant protein substructures used to compute protein embeddings. We show that BioBlobs representations improve the performance of widely used protein encoders such as GVP-GNN across various PRL tasks. Our approach highlights the value of architectures that directly capture function-relevant protein substructures, enabling both improved predictive performance and mechanistic insight into protein function.
Related papers
- SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers [50.18388227899971]
We present SaDiT, a novel framework that accelerates protein backbone generation by integrating SaProt Tokenization with a Diffusion Transformer (DiT) architecture.<n>Experiments demonstrate that SaDiT outperforms state-of-the-art models, including RFDiffusion and Proteina, in both computational speed and structural viability.
arXiv Detail & Related papers (2026-02-06T13:50:13Z) - ProteinAE: Protein Diffusion Autoencoders for Structure Encoding [64.77182442408254]
We introduce ProteinAE, a novel and streamlined protein diffusion autoencoder.<n>ProteinAE directly maps protein backbone coordinates from E(3) into a continuous, compact latent space.<n>We demonstrate that ProteinAE achieves state-of-the-art reconstruction quality, outperforming existing autoencoders.
arXiv Detail & Related papers (2025-10-12T14:30:32Z) - Fast and Interpretable Protein Substructure Alignment via Optimal Transport [24.342562385561134]
This study presents PLASMA, the first deep learning framework for efficient and interpretable residue-level protein substructure alignment.<n>We demonstrate that PLASMA achieves accurate, lightweight, and interpretable residue-level alignment.<n>Our method addresses a critical gap in protein structure analysis tools and offers new opportunities for functional annotation, evolutionary studies, and structure-based drug design.
arXiv Detail & Related papers (2025-10-12T10:47:29Z) - Aligning Proteins and Language: A Foundation Model for Protein Retrieval [30.32156711268032]
This paper aims to retrieve proteins with similar structures and semantics from large-scale protein dataset.<n>Motivated by the recent progress of vision-caption models (VLMs), we propose a CLIP-style framework for aligning 3D protein structures with functional annotations.
arXiv Detail & Related papers (2025-05-27T08:13:08Z) - Bidirectional Hierarchical Protein Multi-Modal Representation Learning [4.682021474006426]
Protein language models (pLMs) pretrained on large scale protein sequences have demonstrated significant success in sequence-based tasks.<n> graph neural networks (GNNs) designed to leverage 3D structural information have shown promising generalization in protein-related prediction tasks.<n>We propose a bidirectional and hierarchical (Bi-Hierarchical) fusion approach to capture richer and more comprehensive protein representations.
arXiv Detail & Related papers (2025-04-07T06:47:49Z) - ProteinRPN: Towards Accurate Protein Function Prediction with Graph-Based Region Proposals [4.525216077859531]
We introduce the Protein Region Proposal Network (ProteinRPN) for accurate protein function prediction.
ProteinRPN identifies potential functional regions (anchors) which are refined through the hierarchy-aware node drop pooling layer.
The representations of the predicted functional nodes are enriched using attention mechanisms and fed into a Graph Multiset Transformer.
arXiv Detail & Related papers (2024-09-01T04:40:04Z) - ProtFAD: Introducing function-aware domains as implicit modality towards protein function prediction [4.299777426056576]
We propose a function-aware domain representation and a domain-joint contrastive learning strategy to distinguish different protein functions.<n>Our approach significantly and comprehensively outperforms the state-of-the-art methods on various benchmarks.
arXiv Detail & Related papers (2024-05-24T02:26:45Z) - CCPL: Cross-modal Contrastive Protein Learning [47.095862120116976]
We introduce a novel unsupervised protein structure representation pretraining method, cross-modal contrastive protein learning (CCPL)
CCPL leverages a robust protein language model and uses unsupervised contrastive alignment to enhance structure learning.
We evaluated our model across various benchmarks, demonstrating the framework's superiority.
arXiv Detail & Related papers (2023-03-19T08:19:10Z) - State-specific protein-ligand complex structure prediction with a
multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - PersGNN: Applying Topological Data Analysis and Geometric Deep Learning
to Structure-Based Protein Function Prediction [0.07340017786387766]
In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank.
We present PersGNN - an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis.
arXiv Detail & Related papers (2020-10-30T02:24:35Z) - Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate ($geq$80%) predictions of protein class and architecture from structures determined at low ($leq$3A) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z) - BERTology Meets Biology: Interpreting Attention in Protein Language
Models [124.8966298974842]
We demonstrate methods for analyzing protein Transformer models through the lens of attention.
We show that attention captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure.
We also present a three-dimensional visualization of the interaction between attention and protein structure.
arXiv Detail & Related papers (2020-06-26T21:50:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.