Boltz is a Strong Baseline for Atom-level Representation Learning
- URL: http://arxiv.org/abs/2602.13249v1
- Date: Mon, 02 Feb 2026 08:11:53 GMT
- Title: Boltz is a Strong Baseline for Atom-level Representation Learning
- Authors: Hyosoon Jang, Hyunjin Seo, Yunhui Jang, Seonghyun Park, Sungsoo Ahn
- Abstract summary: We investigate the quality of Boltz atom-level representations across diverse small-molecule benchmarks. Our results show that Boltz is competitive with specialized baselines on ADMET property prediction tasks. These findings suggest that the representational capacity of cutting-edge protein-centric models has been underexplored.
- Score: 33.4526823362265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models in molecular learning have advanced along two parallel tracks: protein models, which typically utilize evolutionary information to learn amino acid-level representations for folding, and small-molecule models, which focus on learning atom-level representations for property prediction tasks such as ADMET. Notably, cutting-edge protein-centric models such as Boltz now operate at atom-level granularity for protein-ligand co-folding, yet their atom-level expressiveness for small-molecule tasks remains unexplored. A key open question is whether these protein co-folding models capture transferable chemical physics or rely on protein evolutionary signals, which would limit their utility for small-molecule tasks. In this work, we investigate the quality of Boltz atom-level representations across diverse small-molecule benchmarks. Our results show that Boltz is competitive with specialized baselines on ADMET property prediction tasks and effective for molecular generation and optimization. These findings suggest that the representational capacity of cutting-edge protein-centric models has been underexplored and position Boltz as a strong baseline for atom-level representation learning for small molecules.
Related papers
- Representing local protein environments with atomistic foundation models [6.120694232253299]
We propose a novel representation for a local protein environment derived from the intermediate features of atomistic foundation models (AFMs). We show that the AFM-derived representation space exhibits meaningful structure, enabling the construction of data-driven priors. In the context of biomolecular NMR spectroscopy, we demonstrate that the proposed representations enable a first-of-its-kind physics-informed chemical shift predictor.
arXiv Detail & Related papers (2025-05-29T11:25:47Z)
- An All-Atom Generative Model for Designing Protein Complexes [65.06317264153175]
APM (All-Atom Protein Generative Model) is a model specifically designed for modeling multi-chain proteins. It is capable of precisely modeling inter-chain interactions and designing protein complexes with binding capabilities from scratch. It also performs folding and inverse-folding tasks for multi-chain proteins.
arXiv Detail & Related papers (2025-04-17T16:37:41Z)
- Computational Protein Science in the Era of Large Language Models (LLMs) [54.35488233989787]
Computational protein science is dedicated to revealing knowledge and developing applications within the protein sequence-structure-function paradigm. Recently, Language Models (pLMs) have emerged as a milestone in AI due to their unprecedented language processing & generalization capability.
arXiv Detail & Related papers (2025-01-17T16:21:18Z)
- ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling [32.656601823957345]
ESM-AA (ESM All-Atom) is a novel approach that enables atom-scale and residue-scale unified molecular modeling.
Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks.
arXiv Detail & Related papers (2024-03-05T13:35:41Z)
- Pre-Training Protein Bi-level Representation Through Span Mask Strategy On 3D Protein Chains [1.2893576217358405]
We introduce a span mask pre-training strategy on 3D protein chains to learn meaningful representations of both residues and atoms.
This leads to a simple yet effective approach to learning protein representation suitable for diverse downstream tasks.
arXiv Detail & Related papers (2024-02-02T15:07:09Z)
- Unveiling Molecular Moieties through Hierarchical Grad-CAM Graph Explainability [0.0]
The integration of explainable methods to elucidate the specific contributions of molecular substructures to biological activity remains a significant challenge. We trained 20 GNN models on a dataset of small molecules with the goal of predicting their activity on 20 distinct protein targets from the Kinase family. We implemented the Hierarchical Grad-CAM graph Explainer framework, enabling an in-depth analysis of the molecular moieties driving protein-ligand binding stabilization.
arXiv Detail & Related papers (2024-01-29T17:23:25Z)
- Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z)
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate ($\geq$80%) predictions of protein class and architecture from structures determined at low ($\leq$3 Å) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)
- Energy-based models for atomic-resolution protein conformations [88.68597850243138]
We propose an energy-based model (EBM) of protein conformations that operates at atomic scale.
The model is trained solely on crystallized protein data.
An investigation of the model's outputs and hidden representations finds that it captures physicochemical properties relevant to protein energy.
arXiv Detail & Related papers (2020-04-27T20:45:12Z)