Data-Centric Learning from Unlabeled Graphs with Diffusion Model
- URL: http://arxiv.org/abs/2303.10108v2
- Date: Thu, 12 Oct 2023 15:24:28 GMT
- Title: Data-Centric Learning from Unlabeled Graphs with Diffusion Model
- Authors: Gang Liu, Eric Inae, Tong Zhao, Jiaxin Xu, Tengfei Luo, Meng Jiang
- Abstract summary: We propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points.
We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives to guide the model's denoising process.
Experiments demonstrate that our data-centric approach performs significantly better than fifteen existing methods on fifteen tasks.
- Score: 21.417410006246147
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph property prediction tasks are important and numerous. While each task
offers only a small number of labeled examples, unlabeled graphs have been collected
from various sources and at a large scale. A conventional approach is to train
a model with the unlabeled graphs on self-supervised tasks and then fine-tune
the model on the prediction tasks. However, the knowledge learned from the
self-supervised tasks may not align, or may even conflict, with what the predictions need.
In this paper, we propose to extract the knowledge underlying the large set of
unlabeled graphs as a specific set of useful data points to augment each
property prediction model. We use a diffusion model to fully utilize the
unlabeled graphs and design two new objectives to guide the model's denoising
process with each task's labeled data to generate task-specific graph examples
and their labels. Experiments demonstrate that our data-centric approach
performs significantly better than fifteen existing methods on fifteen
tasks. Unlike in self-supervised learning, the performance improvement brought
by unlabeled data is visible in the form of the generated labeled examples.
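The data-centric idea in the abstract, augmenting each task's small labeled set with generated labeled examples, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Example` representation, the `Generator` interface, `augment_task_data`, `placeholder_generator`, and the toy label values do not come from the paper. In particular, the placeholder generator merely recycles labeled examples and stands in for the guided diffusion model, which is not implemented.

```python
from typing import Callable, List, Sequence, Tuple

# A labeled example: (graph, label). The graph is kept as an opaque string
# (for instance a SMILES string); this representation is an assumption.
Example = Tuple[str, float]

# Hypothetical generator interface: given a task's labeled examples and a
# count, return that many synthetic (graph, label) pairs. In the abstract,
# this role is played by a diffusion model trained on unlabeled graphs whose
# denoising process is guided by the task's labeled data.
Generator = Callable[[Sequence[Example], int], List[Example]]


def augment_task_data(
    labeled: Sequence[Example], generate: Generator, n_generated: int
) -> List[Example]:
    """Return the original labeled examples followed by generated ones."""
    if n_generated < 0:
        raise ValueError("n_generated must be non-negative")
    generated = generate(labeled, n_generated)
    return list(labeled) + list(generated)


def placeholder_generator(labeled: Sequence[Example], n: int) -> List[Example]:
    """Stand-in only: cycles through labeled examples, no diffusion model."""
    if not labeled:
        return []
    return [labeled[i % len(labeled)] for i in range(n)]


if __name__ == "__main__":
    # Toy task data with made-up label values.
    task_data = [("CCO", 0.2), ("c1ccccc1", 1.3)]
    train_set = augment_task_data(task_data, placeholder_generator, n_generated=3)
    print(len(train_set))  # 2 original + 3 generated = 5
```

The property prediction model would then be trained on the returned list instead of on the labeled examples alone, which is what makes the contribution of the unlabeled data inspectable as concrete examples.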
Related papers
- GraphFM: A Scalable Framework for Multi-Graph Pretraining [2.882104808886318]
We introduce a scalable multi-graph multi-task pretraining approach specifically tailored for node classification tasks across diverse graph datasets from different domains.
We demonstrate the efficacy of our approach by training a model on 152 different graph datasets comprising over 7.4 million nodes and 189 million edges.
Our results show that pretraining on a diverse array of real and synthetic graphs improves the model's adaptability and stability, while performing competitively with state-of-the-art specialist models.
(2024-07-16T16:51:43Z)
- One for All: Towards Training One Graph Model for All Classification Tasks [61.656962278497225]
A unified model for various graph tasks remains underexplored, primarily due to the challenges unique to the graph learning domain.
We propose One for All (OFA), the first general framework that can use a single graph model to address the above challenges.
OFA performs well across different tasks, making it the first general-purpose cross-domain classification model on graphs.
(2023-09-29T21:15:26Z)
- DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification [1.0602247913671219]
We introduce a novel active graph learning approach called DiffusAL, showing significant robustness in diverse settings.
Most of our calculations for acquisition and training can be pre-processed, making DiffusAL more efficient compared to approaches combining diverse selection criteria.
Our experiments on various benchmark datasets show that, unlike previous methods, our approach significantly outperforms random selection on every dataset and labeling budget tested.
(2023-07-31T20:30:13Z)
- Few Shot Rationale Generation using Self-Training with Dual Teachers [4.91890875296663]
Self-rationalizing models that also generate a free-text explanation for their predicted labels are an important tool to build trustworthy AI applications.
We introduce a novel dual-teacher learning framework, which learns two specialized teacher models for task prediction and rationalization.
We formulate a new loss function, Masked Label Regularization (MLR), which encourages explanations to be strongly conditioned on predicted labels.
(2023-06-05T23:57:52Z)
- Graph Self-supervised Learning with Accurate Discrepancy Learning [64.69095775258164]
We propose a framework, Discrepancy-based Self-supervised LeArning (D-SLA), that aims to learn the exact discrepancy between the original and the perturbed graphs.
We validate our method on various graph-related downstream tasks, including molecular property prediction, protein function prediction, and link prediction tasks, on which our model largely outperforms relevant baselines.
(2022-02-07T08:04:59Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
(2021-04-26T16:29:32Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
(2021-03-29T06:35:24Z)
- Out-distribution aware Self-training in an Open World Setting [62.19882458285749]
We leverage unlabeled data in an open world setting to further improve prediction performance.
We introduce out-distribution aware self-training, which includes a careful sample selection strategy.
Our classifiers are by design out-distribution aware and can thus distinguish task-related inputs from unrelated ones.
(2020-12-21T12:25:04Z)
- Handling Missing Data with Graph Representation Learning [62.59831675688714]
We propose GRAPE, a graph-based framework for feature imputation as well as label prediction.
Under GRAPE, the feature imputation is formulated as an edge-level prediction task and the label prediction as a node-level prediction task.
Experimental results on nine benchmark datasets show that GRAPE yields 20% lower mean absolute error for imputation tasks and 10% lower for label prediction tasks.
(2020-10-30T17:59:13Z)
- Out-of-Sample Representation Learning for Multi-Relational Graphs [8.956321788625894]
We study the out-of-sample representation learning problem for non-attributed knowledge graphs.
We create benchmark datasets for this task, develop several models and baselines, and provide empirical analyses and comparisons of the proposed models and baselines.
(2020-04-28T00:53:01Z)