GraphBPE: Molecular Graphs Meet Byte-Pair Encoding
- URL: http://arxiv.org/abs/2407.19039v1
- Date: Fri, 26 Jul 2024 18:45:09 GMT
- Title: GraphBPE: Molecular Graphs Meet Byte-Pair Encoding
- Authors: Yuchen Shen, Barnabás Póczos,
- Abstract summary: We propose GraphBPE, which tokenizes a molecular graph into different substructures and acts as a preprocessing schedule independent of the model architectures.
Our experiments on 3 graph-level classification and 3 graph-level regression datasets show that data preprocessing could boost the performance of models for molecular graphs.
- Score: 12.985482706851846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing attention to molecular machine learning, various innovations have been made in designing better models or proposing more comprehensive benchmarks. However, less is studied on the data preprocessing schedule for molecular graphs, where a different view of the molecular graph could potentially boost the model's performance. Inspired by the Byte-Pair Encoding (BPE) algorithm, a subword tokenization method popularly adopted in Natural Language Processing, we propose GraphBPE, which tokenizes a molecular graph into different substructures and acts as a preprocessing schedule independent of the model architectures. Our experiments on 3 graph-level classification and 3 graph-level regression datasets show that data preprocessing could boost the performance of models for molecular graphs, and GraphBPE is effective for small classification datasets and it performs on par with other tokenization methods across different model architectures.
Related papers
- TopER: Topological Embeddings in Graph Representation Learning [8.052380377159398]
Topological Evolution Rate (TopER) is a low-dimensional embedding approach grounded in topological data analysis.
TopER simplifies a key topological approach, Persistent Homology, by calculating the evolution rate of graph substructures.
Our models achieve or surpass state-of-the-art results across molecular, biological, and social network datasets in tasks such as classification, clustering, and visualization.
arXiv Detail & Related papers (2024-10-02T17:31:33Z) - Towards Graph Foundation Models: A Survey and Beyond [66.37994863159861]
Foundation models have emerged as critical components in a variety of artificial intelligence applications.
The capabilities of foundation models to generalize and adapt motivate graph machine learning researchers to discuss the potential of developing a new graph learning paradigm.
This article introduces the concept of Graph Foundation Models (GFMs), and offers an exhaustive explanation of their key characteristics and underlying technologies.
arXiv Detail & Related papers (2023-10-18T09:31:21Z) - MolGrapher: Graph-based Visual Recognition of Chemical Structures [50.13749978547401]
We introduce MolGrapher to recognize chemical structures visually.
We treat all candidate atoms and bonds as nodes and put them in a graph.
We classify atom and bond nodes in the graph with a Graph Neural Network.
arXiv Detail & Related papers (2023-08-23T16:16:11Z) - GraphGLOW: Universal and Generalizable Structure Learning for Graph
Neural Networks [72.01829954658889]
This paper introduces the mathematical definition of this novel problem setting.
We devise a general framework that coordinates a single graph-shared structure learner and multiple graph-specific GNNs.
The well-trained structure learner can directly produce adaptive structures for unseen target graphs without any fine-tuning.
arXiv Detail & Related papers (2023-06-20T03:33:22Z) - Bi-level Contrastive Learning for Knowledge-Enhanced Molecule
Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE enhancement of 35.1% for regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z) - Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions.
By finding a mean in this embedding space, we can recover a mean graph that preserves structural information.
We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z) - Conditional Diffusion Based on Discrete Graph Structures for Molecular
Graph Generation [32.66694406638287]
We propose a Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation.
Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through differential equations (SDE)
We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states.
arXiv Detail & Related papers (2023-01-01T15:24:15Z) - Extreme Acceleration of Graph Neural Network-based Prediction Models for
Quantum Chemistry [7.592530794455257]
We present a novel hardware-software co-design approach to scale up the training of graph neural networks for molecular property prediction.
We introduce an algorithm to coalesce the batches of molecular graphs into fixed size packs to eliminate redundant computation and memory.
We demonstrate that such a co-design approach can reduce the training time of such molecular property prediction models from days to less than two hours.
arXiv Detail & Related papers (2022-11-25T01:30:18Z) - Molecular Graph Representation Learning via Heterogeneous Motif Graph
Construction [19.64574177805823]
We propose a novel molecular graph representation learning method by constructing a heterogeneous motif graph.
In particular, we build a heterogeneous motif graph that contains motif nodes and molecular nodes.
We show that our model achieves similar performances with significantly less computational resources by using our edge sampler.
arXiv Detail & Related papers (2022-02-01T16:21:01Z) - Motif-based Graph Self-Supervised Learning forMolecular Property
Prediction [12.789013658551454]
Graph Neural Networks (GNNs) have demonstrated remarkable success in various molecular generation and prediction tasks.
Most existing self-supervised pre-training frameworks for GNNs only focus on node-level or graph-level tasks.
We propose a novel self-supervised motif generation framework for GNNs.
arXiv Detail & Related papers (2021-10-03T11:45:51Z) - Diversified Multiscale Graph Learning with Graph Self-Correction [55.43696999424127]
We propose a diversified multiscale graph learning model equipped with two core ingredients.
A graph self-correction (GSC) mechanism to generate informative embedded graphs, and a diversity boosting regularizer (DBR) to achieve a comprehensive characterization of the input graph.
Experiments on popular graph classification benchmarks show that the proposed GSC mechanism leads to significant improvements over state-of-the-art graph pooling methods.
arXiv Detail & Related papers (2021-03-17T16:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.