Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules
- URL: http://arxiv.org/abs/2310.14753v2
- Date: Mon, 15 Jan 2024 02:55:06 GMT
- Title: Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules
- Authors: Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang
Wang, Tat-Seng Chua
- Abstract summary: Masked graph modeling excels in the self-supervised representation learning of molecular graphs.
We show that a subgraph-level tokenizer and a sufficiently expressive decoder with remask decoding have a large impact on the encoder's representation learning.
We propose a novel MGM method SimSGT, featuring a Simple GNN-based Tokenizer (SGT) and an effective decoding strategy.
- Score: 81.05116895430375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked graph modeling excels in the self-supervised representation learning
of molecular graphs. Scrutinizing previous studies, we reveal a common
scheme consisting of three key components: (1) graph tokenizer, which breaks a
molecular graph into smaller fragments (i.e., subgraphs) and converts them into
tokens; (2) graph masking, which corrupts the graph with masks; (3) graph
autoencoder, which first applies an encoder on the masked graph to generate the
representations, and then employs a decoder on the representations to recover
the tokens of the original graph. However, previous MGM studies focus
extensively on graph masking and the encoder, with limited understanding of
the tokenizer and decoder. To bridge the gap, we first summarize popular
molecule tokenizers at the granularity of node, edge, motif, and Graph Neural
Networks (GNNs), and then examine their roles as the MGM's reconstruction
targets. Further, we explore the potential of adopting an expressive decoder in
MGM. Our results show that a subgraph-level tokenizer and a sufficiently
expressive decoder with remask decoding have a large impact on the encoder's
representation learning. Finally, we propose a novel MGM method SimSGT,
featuring a Simple GNN-based Tokenizer (SGT) and an effective decoding
strategy. We empirically validate that our method outperforms the existing
molecule self-supervised learning methods. Our codes and checkpoints are
available at https://github.com/syr-cn/SimSGT.
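Read together, the three components suggest a compact training loop. Below is a minimal sketch of that scheme, assuming toy mean-aggregation GNNs for the tokenizer, encoder, and decoder; all module names, dimensions, and the mask ratio are illustrative, not the SimSGT implementation (see the repository above for the real one).
```python
# Minimal sketch of the three-component MGM scheme: (1) a GNN tokenizer
# produces reconstruction targets, (2) node features are masked, (3) an
# encoder-decoder recovers the targets, with encoder outputs at masked
# positions replaced by a [REMASK] embedding before decoding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanGNN(nn.Module):
    """One mean-aggregation message-passing layer over a dense adjacency."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return F.relu(self.lin(adj @ x / deg))

class MGM(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.tokenizer = MeanGNN(dim)   # subgraph-level tokens as targets
        self.encoder = MeanGNN(dim)
        self.decoder = MeanGNN(dim)     # a sufficiently expressive decoder
        self.mask_emb = nn.Parameter(torch.zeros(dim))    # [MASK]
        self.remask_emb = nn.Parameter(torch.zeros(dim))  # [REMASK]

    def forward(self, x, adj, mask_ratio=0.35):
        with torch.no_grad():                 # (1) tokenizer gives fixed targets
            targets = self.tokenizer(x, adj)
        n = x.size(0)
        mask = torch.zeros(n, dtype=torch.bool)
        mask[torch.randperm(n)[: max(1, int(mask_ratio * n))]] = True
        x_corrupt = x.clone()
        x_corrupt[mask] = self.mask_emb       # (2) mask node features
        h = self.encoder(x_corrupt, adj)      # (3a) encode the masked graph
        h = h.clone()
        h[mask] = self.remask_emb             # remask decoding
        recon = self.decoder(h, adj)          # (3b) decode
        return F.mse_loss(recon[mask], targets[mask])  # loss on masked nodes

# Toy usage: a 4-node path graph with random node features.
adj = torch.tensor([[0., 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
loss = MGM()(torch.randn(4, 16), adj)
loss.backward()
```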
Related papers
- GiGaMAE: Generalizable Graph Masked Autoencoder via Collaborative Latent
Space Reconstruction [76.35904458027694]
Masked autoencoder models lack good generalization ability on graph data.
We propose a novel graph masked autoencoder framework called GiGaMAE.
Our results will shed light on the design of foundation models on graph-structured data.
arXiv Detail & Related papers (2023-08-18T16:30:51Z)
- Measuring the Privacy Leakage via Graph Reconstruction Attacks on Simplicial Neural Networks (Student Abstract) [25.053461964775778]
We study whether graph representations can be inverted to recover the graph used to generate them via a graph reconstruction attack (GRA).
We propose a GRA that recovers a graph's adjacency matrix from the representations via a graph decoder.
We find that SNN outputs exhibit the lowest privacy-preserving ability against the GRA.
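As a concrete illustration of the attack surface, here is a minimal sketch of such a graph decoder, assuming a bilinear inner-product form sigmoid(Z W Z^T) trained with binary cross-entropy on an auxiliary graph the attacker knows; the architecture and training details are illustrative assumptions, not the paper's exact attack.
```python
# Hypothetical sketch of the attack's decoder: map leaked node
# representations Z to an adjacency matrix via sigmoid(Z W Z^T), trained
# on an auxiliary graph whose structure the attacker already knows.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacencyDecoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.eye(dim))     # learnable bilinear form

    def forward(self, z):
        return torch.sigmoid(z @ self.W @ z.t())  # (n, n) edge probabilities

z_aux = torch.randn(5, 8)                         # leaked representations
adj_aux = (torch.rand(5, 5) < 0.3).float().triu(1)
adj_aux = adj_aux + adj_aux.t()                   # known symmetric structure

decoder = AdjacencyDecoder(8)
opt = torch.optim.Adam(decoder.parameters(), lr=1e-2)
for _ in range(200):                              # fit decoder on the aux graph
    opt.zero_grad()
    F.binary_cross_entropy(decoder(z_aux), adj_aux).backward()
    opt.step()

z_private = torch.randn(7, 8)                     # victim's leaked outputs
recovered = decoder(z_private) > 0.5              # predicted private edges
```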
arXiv Detail & Related papers (2023-02-08T23:40:24Z)
- What's Behind the Mask: Understanding Masked Graph Modeling for Graph Autoencoders [32.42097625708298]
MaskGAE is a self-supervised learning framework for graph-structured data.
MGM is a principled pretext task: mask a portion of edges and attempt to reconstruct the missing part from the partially visible, unmasked graph structure.
We establish close connections between GAEs and contrastive learning, showing that MGM significantly improves the self-supervised learning scheme of GAEs.
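A minimal sketch of this edge-masking pretext task, assuming a toy one-layer mean-aggregation encoder and inner-product edge scoring; the specific modules are illustrative, not the MaskGAE architecture.
```python
# Illustrative sketch of the edge-masking pretext task: hide a fraction of
# edges, encode the remaining visible graph, and score the held-out edges
# with node-embedding inner products.
import torch
import torch.nn.functional as F

def mask_edges(edge_index, mask_ratio=0.7):
    """Split an edge list of shape (2, E) into visible and masked edges."""
    perm = torch.randperm(edge_index.size(1))
    cut = int(edge_index.size(1) * mask_ratio)
    return edge_index[:, perm[cut:]], edge_index[:, perm[:cut]]

def encode(x, edge_index, weight):
    """One mean-aggregation layer over the visible edges only."""
    n = x.size(0)
    adj = torch.zeros(n, n)
    adj[edge_index[0], edge_index[1]] = 1.0
    deg = adj.sum(-1, keepdim=True).clamp(min=1)
    return F.relu((adj @ x / deg) @ weight)

edge_index = torch.tensor([[0, 1, 2, 3, 4, 0], [1, 2, 3, 4, 5, 5]])
x = torch.randn(6, 8)
weight = torch.randn(8, 8, requires_grad=True)

visible, masked = mask_edges(edge_index)
z = encode(x, visible, weight)
scores = (z[masked[0]] * z[masked[1]]).sum(-1)    # inner-product edge scores
loss = F.binary_cross_entropy_with_logits(scores, torch.ones_like(scores))
loss.backward()  # real training also samples non-edges as negatives
```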
arXiv Detail & Related papers (2022-05-20T09:45:57Z)
- MGAE: Masked Autoencoders for Self-Supervised Learning on Graphs [55.66953093401889]
We propose a masked graph autoencoder (MGAE) framework to perform effective learning on graph-structured data.
Taking insights from self-supervised learning, we randomly mask a large proportion of edges and try to reconstruct these missing edges during training.
arXiv Detail & Related papers (2022-01-07T16:48:07Z)
- Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling these two designs, an asymmetric encoder-decoder and a high masking ratio, enables us to train large models efficiently and effectively.
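A minimal sketch of the masking step, assuming tiny linear layers in place of the paper's ViT encoder and decoder; patch size and dimensions are illustrative, while the 75% mask ratio follows the paper.
```python
# Illustrative sketch of MAE-style masking: patchify an image, encode only
# the visible patches, insert a mask token for hidden ones, and regress the
# missing pixels. Tiny linear layers stand in for the paper's ViT.
import torch
import torch.nn as nn
import torch.nn.functional as F

patch = 4
img = torch.randn(1, 3, 32, 32)                   # one 32x32 RGB image
# Patchify: (1, 3, 32, 32) -> (64 patches, 48 pixels per patch)
patches = img.unfold(2, patch, patch).unfold(3, patch, patch)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, 3 * patch * patch)

mask_ratio = 0.75                                 # the ratio used in the paper
num_keep = int(patches.size(0) * (1 - mask_ratio))
perm = torch.randperm(patches.size(0))
keep, hide = perm[:num_keep], perm[num_keep:]

encoder = nn.Linear(48, 32)                       # stand-in for the ViT encoder
decoder = nn.Linear(32, 48)                       # lightweight pixel decoder
mask_token = nn.Parameter(torch.zeros(32))

latent = encoder(patches[keep])                   # encode visible patches only
full = mask_token.expand(patches.size(0), -1).clone()
full[keep] = latent                               # reassemble the sequence
# A full MAE also adds positional embeddings before encoding and decoding.
recon = decoder(full)                             # predict pixels per patch
loss = F.mse_loss(recon[hide], patches[hide])     # loss on masked patches only
loss.backward()
```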
arXiv Detail & Related papers (2021-11-11T18:46:40Z)
- Learning Graphon Autoencoders for Generative Graph Modeling [91.32624399902755]
A graphon is a nonparametric model that generates graphs of arbitrary size and can be induced from graphs easily.
We propose a novel framework called graphon autoencoder to build an interpretable and scalable graph generative model.
A linear graphon factorization model works as the decoder, leveraging the latent representations to reconstruct the induced graphons.
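A minimal sketch of the two properties the summary mentions, generating graphs of arbitrary size from a graphon W: [0,1]^2 -> [0,1] and inducing a step-function graphon back from a graph; the particular W below is an arbitrary example, not the paper's model.
```python
# Illustrative sketch: sample graphs of any size from a graphon
# W: [0,1]^2 -> [0,1], and induce a step-function graphon back from
# an adjacency matrix.
import torch

def sample_graph(W, n):
    """Draw latent u_i ~ Uniform(0,1); connect i and j with prob W(u_i, u_j)."""
    u = torch.rand(n)
    probs = W(u.unsqueeze(1), u.unsqueeze(0))     # (n, n) edge probabilities
    adj = (torch.rand(n, n) < probs).float().triu(1)
    return adj + adj.t()                          # undirected, no self-loops

def induce_graphon(adj):
    """Step-function graphon: piecewise constant on an n x n grid of cells."""
    n = adj.size(0)
    def W(x, y):
        i = (x * n).long().clamp(max=n - 1)
        j = (y * n).long().clamp(max=n - 1)
        return adj[i, j]
    return W

W = lambda x, y: 0.25 * (x + y)                   # an arbitrary example graphon
g_small = sample_graph(W, 10)                     # graphs of arbitrary sizes...
g_large = sample_graph(W, 1000)
W_hat = induce_graphon(g_small)                   # ...and a graphon induced back
```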
arXiv Detail & Related papers (2021-05-29T08:11:40Z)
- AEGCN: An Autoencoder-Constrained Graph Convolutional Network [5.023274927781062]
We propose a novel neural network architecture, called autoencoder-constrained graph convolutional network.
The core of this model is a convolutional network operating directly on graphs, whose hidden layers are constrained by an autoencoder.
We show that adding autoencoder constraints significantly improves the performance of graph convolutional networks.
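A minimal sketch of the stated architecture, a graph convolution whose hidden layer is additionally reconstructed by an autoencoder, with the reconstruction error added to the task loss; layer sizes and the loss weight are illustrative assumptions, not the paper's configuration.
```python
# Illustrative sketch of an autoencoder-constrained GCN: the hidden layer
# feeds both the task head and an autoencoder whose reconstruction error is
# added to the loss. Sizes and the 0.1 weight are arbitrary choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AEGCN(nn.Module):
    def __init__(self, dim_in=8, dim_h=16, num_classes=3):
        super().__init__()
        self.gcn = nn.Linear(dim_in, dim_h)        # graph convolution weights
        self.head = nn.Linear(dim_h, num_classes)  # task head
        self.ae = nn.Sequential(                   # autoencoder on hidden layer
            nn.Linear(dim_h, 4), nn.ReLU(), nn.Linear(4, dim_h))

    def forward(self, x, adj):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        h = F.relu(self.gcn(adj @ x / deg))        # GCN hidden representation
        return self.head(h), F.mse_loss(self.ae(h), h)

# Toy usage: a 5-node path graph with self-loops.
adj = torch.eye(5) + torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
x, y = torch.randn(5, 8), torch.randint(0, 3, (5,))
logits, recon_loss = AEGCN()(x, adj)
loss = F.cross_entropy(logits, y) + 0.1 * recon_loss  # task + AE constraint
loss.backward()
```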
arXiv Detail & Related papers (2020-07-03T16:42:55Z)
- Learning to map source code to software vulnerability using code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective.
We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z)