Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design
- URL: http://arxiv.org/abs/2512.07064v1
- Date: Mon, 08 Dec 2025 00:52:46 GMT
- Title: Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design
- Authors: Jiannan Yang, Veronika Thost, Tengfei Ma,
- Abstract summary: Self-supervised learning plays a central role in molecular representation learning.<n>Recent innovations in masking-based pretraining are introduced as obscurings and lack principled evaluation.<n>This work cast the entire pretrain-finetune workflow into a unified probabilistic framework.
- Score: 11.43518417965958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) plays a central role in molecular representation learning. Yet, many recent innovations in masking-based pretraining are introduced as heuristics and lack principled evaluation, obscuring which design choices are genuinely effective. This work cast the entire pretrain-finetune workflow into a unified probabilistic framework, enabling a transparent comparison and deeper understanding of masking strategies. Building on this formalism, we conduct a controlled study of three core design dimensions: masking distribution, prediction target, and encoder architecture, under rigorously controlled settings. We further employ information-theoretic measures to assess the informativeness of pretraining signals and connect them to empirically benchmarked downstream performance. Our findings reveal a surprising insight: sophisticated masking distributions offer no consistent benefit over uniform sampling for common node-level prediction tasks. Instead, the choice of prediction target and its synergy with the encoder architecture are far more critical. Specifically, shifting to semantically richer targets yields substantial downstream improvements, particularly when paired with expressive Graph Transformer encoders. These insights offer practical guidance for developing more effective SSL methods for molecular graphs.
Related papers
- From Attribution to Action: Jointly ALIGNing Predictions and Explanations [7.1383591932321115]
We propose ALIGN, a novel framework that jointly trains a classifier and a masker in an iterative manner.<n>By leveraging high-quality masks as guidance, ALIGN improves both interpretability and generalizability, showing its superiority across various settings.
arXiv Detail & Related papers (2025-11-10T10:52:17Z) - Learning an Ensemble Token from Task-driven Priors in Facial Analysis [6.1218317445177135]
We introduce ET-Fuser, a novel methodology for learning ensemble token.<n>We propose a robust prior unification learning method that generates a ensemble token within a self-attention mechanism.<n>Our results show improvements across a variety of facial analysis, with statistically significant enhancements observed in the feature representations.
arXiv Detail & Related papers (2025-07-02T02:07:31Z) - Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding [56.565200973244146]
Agentic Predictor is a lightweight predictor for efficient agentic workflow evaluation.<n>By learning to approximate task success rates, Agentic Predictor enables fast and accurate selection of optimal agentic workflow configurations.
arXiv Detail & Related papers (2025-05-26T09:46:50Z) - Semi-supervised Semantic Segmentation with Multi-Constraint Consistency Learning [81.02648336552421]
We propose a Multi-Constraint Consistency Learning approach to facilitate the staged enhancement of the encoder and decoder.<n>Self-adaptive feature masking and noise injection are designed in an instance-specific manner to perturb the features for robust learning of the decoder.<n> Experimental results on Pascal VOC2012 and Cityscapes datasets demonstrate that our proposed MCCL achieves new state-of-the-art performance.
arXiv Detail & Related papers (2025-03-23T03:21:33Z) - How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks [14.338754598043968]
Two competing paradigms exist for self-supervised learning of data representations.
Joint Embedding Predictive Architecture (JEPA) is a class of architectures in which semantically similar inputs are encoded into representations that are predictive of each other.
arXiv Detail & Related papers (2024-07-03T19:43:12Z) - Skeleton2vec: A Self-supervised Learning Framework with Contextualized
Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z) - Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with
Masked Autoencoders [7.133110402648305]
This study explores the application of self-supervised learning to the task of motion forecasting.
Forecast-MAE is an extension of the mask autoencoders framework that is specifically designed for self-supervised learning of the motion forecasting task.
arXiv Detail & Related papers (2023-08-19T02:27:51Z) - MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z) - RARE: Robust Masked Graph Autoencoder [45.485891794905946]
Masked graph autoencoder (MGAE) has emerged as a promising self-supervised graph pre-training (SGP) paradigm.
We propose a novel SGP method termed Robust mAsked gRaph autoEncoder (RARE) to improve the certainty in inferring masked data.
arXiv Detail & Related papers (2023-04-04T03:35:29Z) - Self-Supervised Learning via Maximum Entropy Coding [57.56570417545023]
We propose Maximum Entropy Coding (MEC) as a principled objective that explicitly optimize on the structure of the representation.
MEC learns a more generalizable representation than previous methods based on specific pretext tasks.
It achieves state-of-the-art performance consistently on various downstream tasks, including not only ImageNet linear probe, but also semi-supervised classification, object detection, instance segmentation, and object tracking.
arXiv Detail & Related papers (2022-10-20T17:58:30Z) - Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective of graph contrastive learning methods showing random augmentations leads to encoders.
Our proposed method represents each node by a distribution in the latent space in contrast to existing techniques which embed each node to a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-12-15T01:45:32Z) - Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.