Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules
- URL: http://arxiv.org/abs/2310.14753v2
- Date: Mon, 15 Jan 2024 02:55:06 GMT
- Title: Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules
- Authors: Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang
Wang, Tat-Seng Chua
- Abstract summary: Masked graph modeling excels in the self-supervised representation learning of molecular graphs.
We show that a subgraph-level tokenizer and a sufficiently expressive decoder with remask decoding have a large impact on the encoder's representation learning.
We propose a novel MGM method SimSGT, featuring a Simple GNN-based Tokenizer (SGT) and an effective decoding strategy.
- Score: 81.05116895430375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked graph modeling excels in the self-supervised representation learning
of molecular graphs. Scrutinizing previous studies, we reveal a common
scheme consisting of three key components: (1) graph tokenizer, which breaks a
molecular graph into smaller fragments (i.e., subgraphs) and converts them into
tokens; (2) graph masking, which corrupts the graph with masks; (3) graph
autoencoder, which first applies an encoder on the masked graph to generate the
representations, and then employs a decoder on the representations to recover
the tokens of the original graph. However, previous MGM studies focus
extensively on graph masking and the encoder, with limited understanding of
the tokenizer and decoder. To bridge the gap, we first summarize popular
molecule tokenizers at the granularity of node, edge, motif, and Graph Neural
Networks (GNNs), and then examine their roles as the MGM's reconstruction
targets. Further, we explore the potential of adopting an expressive decoder in
MGM. Our results show that a subgraph-level tokenizer and a sufficiently
expressive decoder with remask decoding have a large impact on the encoder's
representation learning. Finally, we propose a novel MGM method SimSGT,
featuring a Simple GNN-based Tokenizer (SGT) and an effective decoding
strategy. We empirically validate that our method outperforms the existing
molecule self-supervised learning methods. Our codes and checkpoints are
available at https://github.com/syr-cn/SimSGT.
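Read together, the three components suggest a compact training loop. Below is a minimal sketch of that scheme, assuming toy mean-aggregation GNNs for the tokenizer, encoder, and decoder; all module names, dimensions, and the mask ratio are illustrative, not the SimSGT implementation (see the repository above for the real one).
```python
# Minimal sketch of the three-component MGM scheme: (1) a GNN tokenizer
# produces reconstruction targets, (2) node features are masked, (3) an
# encoder-decoder recovers the targets, with encoder outputs at masked
# positions replaced by a [REMASK] embedding before decoding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanGNN(nn.Module):
    """One mean-aggregation message-passing layer over a dense adjacency."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return F.relu(self.lin(adj @ x / deg))

class MGM(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.tokenizer = MeanGNN(dim)   # subgraph-level tokens as targets
        self.encoder = MeanGNN(dim)
        self.decoder = MeanGNN(dim)     # a sufficiently expressive decoder
        self.mask_emb = nn.Parameter(torch.zeros(dim))    # [MASK]
        self.remask_emb = nn.Parameter(torch.zeros(dim))  # [REMASK]

    def forward(self, x, adj, mask_ratio=0.35):
        with torch.no_grad():                 # (1) tokenizer gives fixed targets
            targets = self.tokenizer(x, adj)
        n = x.size(0)
        mask = torch.zeros(n, dtype=torch.bool)
        mask[torch.randperm(n)[: max(1, int(mask_ratio * n))]] = True
        x_corrupt = x.clone()
        x_corrupt[mask] = self.mask_emb       # (2) mask node features
        h = self.encoder(x_corrupt, adj)      # (3a) encode the masked graph
        h = h.clone()
        h[mask] = self.remask_emb             # remask decoding
        recon = self.decoder(h, adj)          # (3b) decode
        return F.mse_loss(recon[mask], targets[mask])  # loss on masked nodes

# Toy usage: a 4-node path graph with random node features.
adj = torch.tensor([[0., 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
loss = MGM()(torch.randn(4, 16), adj)
loss.backward()
```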
Related papers
- GiGaMAE: Generalizable Graph Masked Autoencoder via Collaborative Latent
Space Reconstruction [76.35904458027694]
Masked autoencoder models lack good generalization ability on graph data.
We propose a novel graph masked autoencoder framework called GiGaMAE.
Our results will shed light on the design of foundation models on graph-structured data.
arXiv Detail & Related papers (2023-08-18T16:30:51Z)
- Measuring the Privacy Leakage via Graph Reconstruction Attacks on Simplicial Neural Networks (Student Abstract) [25.053461964775778]
We study whether graph representations can be inverted to recover the graph used to generate them via a graph reconstruction attack (GRA).
We propose a GRA that recovers a graph's adjacency matrix from the representations via a graph decoder.
We find that SNN outputs exhibit the lowest privacy-preserving ability against the GRA.
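As a concrete illustration of the attack surface, here is a minimal sketch of such a graph decoder, assuming a bilinear inner-product form sigmoid(Z W Z^T) trained with binary cross-entropy on an auxiliary graph the attacker knows; the architecture and training details are illustrative assumptions, not the paper's exact attack.
```python
# Hypothetical sketch of the attack's decoder: map leaked node
# representations Z to an adjacency matrix via sigmoid(Z W Z^T), trained
# on an auxiliary graph whose structure the attacker already knows.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacencyDecoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.eye(dim))     # learnable bilinear form

    def forward(self, z):
        return torch.sigmoid(z @ self.W @ z.t())  # (n, n) edge probabilities

z_aux = torch.randn(5, 8)                         # leaked representations
adj_aux = (torch.rand(5, 5) < 0.3).float().triu(1)
adj_aux = adj_aux + adj_aux.t()                   # known symmetric structure

decoder = AdjacencyDecoder(8)
opt = torch.optim.Adam(decoder.parameters(), lr=1e-2)
for _ in range(200):                              # fit decoder on the aux graph
    opt.zero_grad()
    F.binary_cross_entropy(decoder(z_aux), adj_aux).backward()
    opt.step()

z_private = torch.randn(7, 8)                     # victim's leaked outputs
recovered = decoder(z_private) > 0.5              # predicted private edges
```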
arXiv Detail & Related papers (2023-02-08T23:40:24Z)
- What's Behind the Mask: Understanding Masked Graph Modeling for Graph Autoencoders [32.42097625708298]
MaskGAE is a self-supervised learning framework for graph-structured data.
MGM is a principled pretext task: mask a portion of edges and attempt to reconstruct the missing part from the partially visible, unmasked graph structure.
We establish close connections between GAEs and contrastive learning, showing that MGM significantly improves the self-supervised learning scheme of GAEs.
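A minimal sketch of this edge-masking pretext task, assuming a toy one-layer mean-aggregation encoder and inner-product edge scoring; the specific modules are illustrative, not the MaskGAE architecture.
```python
# Illustrative sketch of the edge-masking pretext task: hide a fraction of
# edges, encode the remaining visible graph, and score the held-out edges
# with node-embedding inner products.
import torch
import torch.nn.functional as F

def mask_edges(edge_index, mask_ratio=0.7):
    """Split an edge list of shape (2, E) into visible and masked edges."""
    perm = torch.randperm(edge_index.size(1))
    cut = int(edge_index.size(1) * mask_ratio)
    return edge_index[:, perm[cut:]], edge_index[:, perm[:cut]]

def encode(x, edge_index, weight):
    """One mean-aggregation layer over the visible edges only."""
    n = x.size(0)
    adj = torch.zeros(n, n)
    adj[edge_index[0], edge_index[1]] = 1.0
    deg = adj.sum(-1, keepdim=True).clamp(min=1)
    return F.relu((adj @ x / deg) @ weight)

edge_index = torch.tensor([[0, 1, 2, 3, 4, 0], [1, 2, 3, 4, 5, 5]])
x = torch.randn(6, 8)
weight = torch.randn(8, 8, requires_grad=True)

visible, masked = mask_edges(edge_index)
z = encode(x, visible, weight)
scores = (z[masked[0]] * z[masked[1]]).sum(-1)    # inner-product edge scores
loss = F.binary_cross_entropy_with_logits(scores, torch.ones_like(scores))
loss.backward()  # real training also samples non-edges as negatives
```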
arXiv Detail & Related papers (2022-05-20T09:45:57Z)
- MGAE: Masked Autoencoders for Self-Supervised Learning on Graphs [55.66953093401889]
We propose a masked graph autoencoder (MGAE) framework to perform effective learning on graph-structured data.
Taking insights from self-supervised learning, we randomly mask a large proportion of edges and try to reconstruct these missing edges during training.
arXiv Detail & Related papers (2022-01-07T16:48:07Z)
- Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling these two designs, an asymmetric encoder-decoder and a high masking ratio, enables us to train large models efficiently and effectively.
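A minimal sketch of the masking step, assuming tiny linear layers in place of the paper's ViT encoder and decoder; patch size and dimensions are illustrative, while the 75% mask ratio follows the paper.
```python
# Illustrative sketch of MAE-style masking: patchify an image, encode only
# the visible patches, insert a mask token for hidden ones, and regress the
# missing pixels. Tiny linear layers stand in for the paper's ViT.
import torch
import torch.nn as nn
import torch.nn.functional as F

patch = 4
img = torch.randn(1, 3, 32, 32)                   # one 32x32 RGB image
# Patchify: (1, 3, 32, 32) -> (64 patches, 48 pixels per patch)
patches = img.unfold(2, patch, patch).unfold(3, patch, patch)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, 3 * patch * patch)

mask_ratio = 0.75                                 # the ratio used in the paper
num_keep = int(patches.size(0) * (1 - mask_ratio))
perm = torch.randperm(patches.size(0))
keep, hide = perm[:num_keep], perm[num_keep:]

encoder = nn.Linear(48, 32)                       # stand-in for the ViT encoder
decoder = nn.Linear(32, 48)                       # lightweight pixel decoder
mask_token = nn.Parameter(torch.zeros(32))

latent = encoder(patches[keep])                   # encode visible patches only
full = mask_token.expand(patches.size(0), -1).clone()
full[keep] = latent                               # reassemble the sequence
# A full MAE also adds positional embeddings before encoding and decoding.
recon = decoder(full)                             # predict pixels per patch
loss = F.mse_loss(recon[hide], patches[hide])     # loss on masked patches only
loss.backward()
```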
arXiv Detail & Related papers (2021-11-11T18:46:40Z)
- Learning Graphon Autoencoders for Generative Graph Modeling [91.32624399902755]
A graphon is a nonparametric model that generates graphs of arbitrary size and can be induced from graphs easily.
We propose a novel framework called graphon autoencoder to build an interpretable and scalable graph generative model.
A linear graphon factorization model works as the decoder, leveraging the latent representations to reconstruct the induced graphons.
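A minimal sketch of the two properties the summary mentions, generating graphs of arbitrary size from a graphon W: [0,1]^2 -> [0,1] and inducing a step-function graphon back from a graph; the particular W below is an arbitrary example, not the paper's model.
```python
# Illustrative sketch: sample graphs of any size from a graphon
# W: [0,1]^2 -> [0,1], and induce a step-function graphon back from
# an adjacency matrix.
import torch

def sample_graph(W, n):
    """Draw latent u_i ~ Uniform(0,1); connect i and j with prob W(u_i, u_j)."""
    u = torch.rand(n)
    probs = W(u.unsqueeze(1), u.unsqueeze(0))     # (n, n) edge probabilities
    adj = (torch.rand(n, n) < probs).float().triu(1)
    return adj + adj.t()                          # undirected, no self-loops

def induce_graphon(adj):
    """Step-function graphon: piecewise constant on an n x n grid of cells."""
    n = adj.size(0)
    def W(x, y):
        i = (x * n).long().clamp(max=n - 1)
        j = (y * n).long().clamp(max=n - 1)
        return adj[i, j]
    return W

W = lambda x, y: 0.25 * (x + y)                   # an arbitrary example graphon
g_small = sample_graph(W, 10)                     # graphs of arbitrary sizes...
g_large = sample_graph(W, 1000)
W_hat = induce_graphon(g_small)                   # ...and a graphon induced back
```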
arXiv Detail & Related papers (2021-05-29T08:11:40Z)
- AEGCN: An Autoencoder-Constrained Graph Convolutional Network [5.023274927781062]
We propose a novel neural network architecture, called autoencoder-constrained graph convolutional network.
The core of this model is a convolutional network operating directly on graphs, whose hidden layers are constrained by an autoencoder.
We show that adding autoencoder constraints significantly improves the performance of graph convolutional networks.
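A minimal sketch of the stated architecture, a graph convolution whose hidden layer is additionally reconstructed by an autoencoder, with the reconstruction error added to the task loss; layer sizes and the loss weight are illustrative assumptions, not the paper's configuration.
```python
# Illustrative sketch of an autoencoder-constrained GCN: the hidden layer
# feeds both the task head and an autoencoder whose reconstruction error is
# added to the loss. Sizes and the 0.1 weight are arbitrary choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AEGCN(nn.Module):
    def __init__(self, dim_in=8, dim_h=16, num_classes=3):
        super().__init__()
        self.gcn = nn.Linear(dim_in, dim_h)        # graph convolution weights
        self.head = nn.Linear(dim_h, num_classes)  # task head
        self.ae = nn.Sequential(                   # autoencoder on hidden layer
            nn.Linear(dim_h, 4), nn.ReLU(), nn.Linear(4, dim_h))

    def forward(self, x, adj):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        h = F.relu(self.gcn(adj @ x / deg))        # GCN hidden representation
        return self.head(h), F.mse_loss(self.ae(h), h)

# Toy usage: a 5-node path graph with self-loops.
adj = torch.eye(5) + torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
x, y = torch.randn(5, 8), torch.randint(0, 3, (5,))
logits, recon_loss = AEGCN()(x, adj)
loss = F.cross_entropy(logits, y) + 0.1 * recon_loss  # task + AE constraint
loss.backward()
```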
arXiv Detail & Related papers (2020-07-03T16:42:55Z)
- Learning to map source code to software vulnerability using code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective.
We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z)