Data-Centric Learning from Unlabeled Graphs with Diffusion Model
- URL: http://arxiv.org/abs/2303.10108v2
- Date: Thu, 12 Oct 2023 15:24:28 GMT
- Title: Data-Centric Learning from Unlabeled Graphs with Diffusion Model
- Authors: Gang Liu, Eric Inae, Tong Zhao, Jiaxin Xu, Tengfei Luo, Meng Jiang
- Abstract summary: We propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points.
We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives to guide the model's denoising process.
Experiments demonstrate that our data-centric approach significantly outperforms fifteen existing methods on fifteen tasks.
- Score: 21.417410006246147
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph property prediction tasks are important and numerous. While each task
offers only a small set of labeled examples, unlabeled graphs have been collected
from various sources and at a large scale. A conventional approach is to train
a model on the unlabeled graphs with self-supervised tasks and then fine-tune
it on the prediction tasks. However, the self-supervised task knowledge may not
align with, and can even conflict with, what the prediction tasks need.
In this paper, we propose to extract the knowledge underlying the large set of
unlabeled graphs as a specific set of useful data points that augment each
property prediction model. We use a diffusion model to fully utilize the
unlabeled graphs, and we design two new objectives that guide the model's
denoising process with each task's labeled data to generate task-specific graph
examples and their labels. Experiments demonstrate that our data-centric
approach significantly outperforms fifteen existing methods on fifteen tasks.
Unlike self-supervised learning, the performance improvement brought by the
unlabeled data is directly visible in the generated labeled examples.
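As a rough illustration of the idea in the abstract, the sketch below runs a reverse-diffusion loop over dense adjacency matrices and nudges each denoising step with the gradient of a task predictor's loss, so each sampled graph comes paired with a target label. The `Denoiser` and `PropertyHead` networks and the single guidance term are hypothetical stand-ins, not the paper's actual architecture or its two objectives.

```python
# Minimal sketch: label-guided denoising over dense adjacency matrices.
# Everything below is an assumption-level illustration of the technique.
import torch
import torch.nn as nn

class Denoiser(nn.Module):                      # placeholder score network
    def __init__(self, n):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n * n + 1, 128), nn.ReLU(),
                                 nn.Linear(128, n * n))
    def forward(self, a, t):
        x = torch.cat([a.flatten(1), t[:, None]], dim=1)
        return self.net(x).view_as(a)

class PropertyHead(nn.Module):                  # placeholder task predictor
    def __init__(self, n):
        super().__init__()
        self.lin = nn.Linear(n * n, 1)
    def forward(self, a):
        return self.lin(a.flatten(1)).squeeze(1)

def generate_labeled_graph(denoiser, head, target_y, n=8, steps=50, guide=0.1):
    """Reverse diffusion, nudged toward graphs the task head labels as target_y."""
    a = torch.randn(1, n, n)                    # start from pure noise
    for step in reversed(range(steps)):
        t = torch.full((1,), step / steps)
        a = a.detach().requires_grad_(True)
        eps = denoiser(a, t)                    # predicted noise
        task_loss = (head(a) - target_y).pow(2).sum()
        grad = torch.autograd.grad(task_loss, a)[0]
        with torch.no_grad():
            a = a - eps / steps - guide * grad  # crude denoise step + task guidance
            a = (a + a.transpose(1, 2)) / 2     # keep adjacency symmetric
    adj = (a.sigmoid() > 0.5).float()           # discretize edges
    return adj, target_y                        # generated graph plus its label

adj, y = generate_labeled_graph(Denoiser(8), PropertyHead(8),
                                target_y=torch.tensor([1.0]))
```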
Related papers
- Replay-and-Forget-Free Graph Class-Incremental Learning: A Task Profiling and Prompting Approach [28.194940062243003]
Class-incremental learning (CIL) aims to continually learn a sequence of tasks, with each task consisting of a set of unique classes.
The key characteristic of CIL lies in the absence of task identifiers (IDs) during inference.
We show theoretically that accurate task ID prediction on graph data can be achieved by a Laplacian smoothing-based graph task profiling approach.
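The abstract does not spell out the profiling procedure, so the following is only one plausible reading: smooth node features with a few rounds of symmetric-normalized propagation, average them into a per-task profile vector, and predict a test graph's task ID by nearest profile. All function and variable names are illustrative.

```python
# Hedged sketch of Laplacian smoothing-based task profiling.
import numpy as np

def laplacian_smooth(adj, x, k=3):
    """k rounds of normalized feature propagation: X <- D^-1/2 (A+I) D^-1/2 X."""
    a = adj + np.eye(adj.shape[0])              # add self-loops
    d = a.sum(1)
    norm = a / np.sqrt(np.outer(d, d))
    for _ in range(k):
        x = norm @ x
    return x

def task_profile(adj, x):
    return laplacian_smooth(adj, x).mean(0)     # graph-level profile vector

def predict_task_id(profiles, adj, x):
    """Nearest-profile task ID prediction for a test graph (no ID given)."""
    q = task_profile(adj, x)
    return int(np.argmin([np.linalg.norm(q - p) for p in profiles]))

# toy usage: two "tasks", each profiled from one representative graph
rng = np.random.default_rng(0)
graphs = []
for _ in range(2):
    a = rng.integers(0, 2, (5, 5)); a = a | a.T  # symmetrize adjacency
    graphs.append((a, rng.normal(size=(5, 4))))
profiles = [task_profile(a, x) for a, x in graphs]
print(predict_task_id(profiles, *graphs[1]))     # -> 1 (matches its own profile)
```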
arXiv Detail & Related papers (2024-10-14T09:54:20Z)
- One for All: Towards Training One Graph Model for All Classification Tasks [61.656962278497225]
A unified model for various graph tasks remains underexplored, primarily due to the challenges unique to the graph learning domain.
We propose One for All (OFA), the first general framework that can use a single graph model to address the above challenges.
OFA performs well across different tasks, making it the first general-purpose across-domains classification model on graphs.
arXiv Detail & Related papers (2023-09-29T21:15:26Z)
- DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification [1.0602247913671219]
We introduce a novel active graph learning approach called DiffusAL, showing significant robustness in diverse settings.
Most of the acquisition and training computations can be precomputed, making DiffusAL more efficient than approaches that combine diverse selection criteria.
Our experiments on various benchmark datasets show that, unlike previous methods, our approach significantly outperforms random selection on every dataset and labeling budget tested.
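The acquisition criteria are not given in the abstract; the sketch below is an assumed combination of a precomputed diffusion matrix, diffused prediction entropy, and a coverage penalty for nodes already near the labeled set, just to make the "precompute once, then select" idea concrete.

```python
# Illustrative diffusion-driven active node selection (assumed criteria).
import numpy as np

def diffusion_matrix(adj, alpha=0.15, iters=20):
    """Approximate personalized-PageRank diffusion; precomputed once."""
    a = adj + np.eye(adj.shape[0])
    p = a / a.sum(1, keepdims=True)             # row-stochastic transition matrix
    s = np.eye(adj.shape[0])
    for _ in range(iters):
        s = alpha * np.eye(adj.shape[0]) + (1 - alpha) * s @ p
    return s

def acquire(diff, probs, labeled, budget=2):
    """Pick unlabeled nodes with high diffused uncertainty, far from labeled ones."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(1)
    diffused = diff @ entropy                   # spread uncertainty over the graph
    coverage = diff[:, labeled].sum(1)          # already "covered" by labeled nodes
    score = diffused - coverage
    score[labeled] = -np.inf                    # never re-pick labeled nodes
    return np.argsort(score)[-budget:]

rng = np.random.default_rng(0)
adj = rng.integers(0, 2, (6, 6)); adj = adj | adj.T
probs = rng.dirichlet(np.ones(3), size=6)      # mock classifier outputs
print(acquire(diffusion_matrix(adj), probs, labeled=[0]))
```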
arXiv Detail & Related papers (2023-07-31T20:30:13Z)
- Few Shot Rationale Generation using Self-Training with Dual Teachers [4.91890875296663]
Self-rationalizing models that also generate a free-text explanation for their predicted labels are an important tool to build trustworthy AI applications.
We introduce a novel dual-teacher learning framework, which learns two specialized teacher models for task prediction and rationalization.
We formulate a new loss function, Masked Label Regularization (MLR), which encourages explanations to be strongly conditioned on the predicted labels.
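The exact form of MLR is not in the abstract; one plausible reading, sketched below, is a margin loss that penalizes the model when masking the label out of the input barely changes the rationale's log-likelihood, i.e. when the explanation is not actually conditioned on the label.

```python
# Hedged sketch of a Masked Label Regularization-style loss (assumed form).
import torch

def mlr_loss(logp_with_label, logp_label_masked, margin=1.0):
    """Encourage the rationale's log-likelihood to drop when the label is masked.

    logp_with_label:   log p(rationale | input, predicted label), shape (batch,)
    logp_label_masked: log p(rationale | input, label masked),    shape (batch,)
    """
    gap = logp_with_label - logp_label_masked   # how much the label helps
    return torch.clamp(margin - gap, min=0).mean()

# toy usage with mock per-example log-likelihoods
with_label = torch.tensor([-2.0, -1.5])
masked = torch.tensor([-2.1, -4.0])
print(mlr_loss(with_label, masked))             # only the first example is penalized
```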
arXiv Detail & Related papers (2023-06-05T23:57:52Z)
- Graph Self-supervised Learning with Accurate Discrepancy Learning [64.69095775258164]
We propose a framework that aims to learn the exact discrepancy between the original and the perturbed graphs, coined Discrepancy-based Self-supervised LeArning (D-SLA).
We validate our method on various graph-related downstream tasks, including molecular property prediction, protein function prediction, and link prediction tasks, on which our model largely outperforms relevant baselines.
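A minimal sketch of the discrepancy idea, under the assumption that "exact discrepancy" can be proxied by the number of edge flips applied to a graph: instead of merely discriminating perturbed graphs, a small network regresses how many edits separate them from the original. The encoder and perturbation scheme are placeholders, not D-SLA's actual design.

```python
# Sketch: regress the amount of perturbation as a self-supervised signal.
import numpy as np
import torch
import torch.nn as nn

def perturb(adj, k, gen):
    """Flip k random edges; k is the graph-edit 'discrepancy'."""
    a = adj.clone()
    for _ in range(k):
        i, j = gen.choice(a.shape[0], size=2, replace=False)
        a[i, j] = a[j, i] = 1 - a[i, j]
    return a

class DiscrepancyHead(nn.Module):
    """Placeholder encoder + regressor over flattened adjacency matrices."""
    def __init__(self, n):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n * n, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, a):
        return self.net(a.flatten())

gen = np.random.default_rng(0)
adj = torch.tensor(gen.integers(0, 2, (6, 6)), dtype=torch.float32)
adj = torch.triu(adj, 1); adj = adj + adj.T     # symmetric, zero diagonal

model = DiscrepancyHead(6)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):                            # self-supervision: labels come free
    k = int(gen.integers(1, 5))                 # how many edges we flip
    pred = model(perturb(adj, k, gen))
    loss = (pred - k).pow(2).mean()             # regress the discrepancy itself
    opt.zero_grad(); loss.backward(); opt.step()
```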
arXiv Detail & Related papers (2022-02-07T08:04:59Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
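As a quick back-of-envelope check on that figure (simple arithmetic, not from the paper), 0.35 annotations per image over 125k images comes to roughly 43,750 human annotations, far below one label per image:

```python
# 0.35 annotations/image over the 125k-image subset (back-of-envelope check)
print(0.35 * 125_000)   # -> 43750.0, i.e. ~44k annotations for 80% top-1
```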
arXiv Detail & Related papers (2021-04-26T16:29:32Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
- Out-distribution aware Self-training in an Open World Setting [62.19882458285749]
We leverage unlabeled data in an open world setting to further improve prediction performance.
We introduce out-distribution aware self-training, which includes a careful sample selection strategy.
Our classifiers are by design out-distribution aware and can thus distinguish task-related inputs from unrelated ones.
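The abstract only says the selection strategy is "careful"; the sketch below assumes a common design in which the classifier has an extra background class for out-distribution inputs, and an example is pseudo-labeled only when it is both confidently classified and not flagged as background.

```python
# Illustrative sample selection for out-distribution aware self-training.
import numpy as np

def select_for_self_training(probs, conf_thresh=0.9, ood_thresh=0.5):
    """probs: (n, c+1) softmax outputs; last column is the background class."""
    in_dist = probs[:, -1] < ood_thresh         # not predicted out-distribution
    conf = probs[:, :-1].max(1)                 # confidence over task classes
    keep = in_dist & (conf > conf_thresh)
    pseudo = probs[:, :-1].argmax(1)
    return np.flatnonzero(keep), pseudo[keep]

probs = np.array([[0.95, 0.03, 0.02],           # confident, in-distribution -> kept
                  [0.50, 0.30, 0.20],           # not confident -> dropped
                  [0.05, 0.05, 0.90]])          # flagged out-distribution -> dropped
idx, labels = select_for_self_training(probs)
print(idx, labels)                              # -> [0] [0]
```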
arXiv Detail & Related papers (2020-12-21T12:25:04Z)
- Handling Missing Data with Graph Representation Learning [62.59831675688714]
We propose GRAPE, a graph-based framework for feature imputation as well as label prediction.
Under GRAPE, the feature imputation is formulated as an edge-level prediction task and the label prediction as a node-level prediction task.
Experimental results on nine benchmark datasets show that GRAPE yields 20% lower mean absolute error for imputation tasks and 10% lower for label prediction tasks.
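GRAPE's formulation is stated directly (imputation as edge-level prediction, labels as node-level prediction), so a small sketch of the data-to-bipartite-graph step is included below; the helper name and representation details are illustrative.

```python
# Sketch of GRAPE-style graph construction: rows (observations) and columns
# (features) become the two node sets of a bipartite graph, and each observed
# entry becomes an edge weighted by its value. Missing entries are the edges
# to predict (imputation); row labels are node-level targets.
import numpy as np

def to_bipartite(data):
    """data: (n, d) matrix with np.nan marking missing entries."""
    n, d = data.shape
    edges, weights, missing = [], [], []
    for i in range(n):
        for j in range(d):
            if np.isnan(data[i, j]):
                missing.append((i, n + j))      # edge to impute later
            else:
                edges.append((i, n + j))        # observation i <-> feature j
                weights.append(data[i, j])
    return edges, weights, missing              # rows: 0..n-1, columns: n..n+d-1

data = np.array([[1.0, np.nan],
                 [0.5, 2.0]])
edges, weights, missing = to_bipartite(data)
print(edges, weights, missing)                  # missing entry (0,1) -> edge (0, 3)
```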
arXiv Detail & Related papers (2020-10-30T17:59:13Z)
- Out-of-Sample Representation Learning for Multi-Relational Graphs [8.956321788625894]
We study the out-of-sample representation learning problem for non-attributed knowledge graphs.
We create benchmark datasets for this task, develop several models and baselines, and provide empirical analyses and comparisons of the proposed models and baselines.
arXiv Detail & Related papers (2020-04-28T00:53:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.