Condensing Graphs via One-Step Gradient Matching
- URL: http://arxiv.org/abs/2206.07746v1
- Date: Wed, 15 Jun 2022 18:20:01 GMT
- Title: Condensing Graphs via One-Step Gradient Matching
- Authors: Wei Jin, Xianfeng Tang, Haoming Jiang, Zheng Li, Danqing Zhang,
Jiliang Tang, Bing Yin
- Abstract summary: We propose a one-step gradient matching scheme, which performs gradient matching for only one single step without training the network weights.
Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs.
In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
- Score: 50.07587238142548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As training deep learning models on large datasets takes substantial
time and resources, it is desirable to construct a small synthetic dataset with
which we can train deep learning models sufficiently well. Recent works have
explored condensing image datasets through complex bi-level
optimization. For instance, dataset condensation (DC) matches network gradients
w.r.t. large-real data and small-synthetic data, where the network weights are
optimized for multiple steps at each outer iteration. However, existing
approaches have their inherent limitations: (1) they are not directly
applicable to graphs where the data is discrete; and (2) the condensation
process is computationally expensive due to the involved nested optimization.
To bridge the gap, we investigate efficient dataset condensation tailored for
graph datasets where we model the discrete graph structure as a probabilistic
model. We further propose a one-step gradient matching scheme, which performs
gradient matching for only one single step without training the network
weights. Our theoretical analysis shows this strategy can generate synthetic
graphs that lead to lower classification loss on real graphs. Extensive
experiments on various graph datasets demonstrate the effectiveness and
efficiency of the proposed method. In particular, we are able to reduce the
dataset size by 90% while approximating up to 98% of the original performance
and our method is significantly faster than multi-step gradient matching (e.g.
15x in CIFAR10 for synthesizing 500 graphs).
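The core idea above can be illustrated with a minimal NumPy sketch. This is an illustration under simplifying assumptions, not the paper's implementation: a plain linear regression model stands in for the GNN, synthetic features (not a graph structure) are learned, and the paper's probabilistic (Bernoulli-style) modeling of the discrete adjacency matrix is omitted. All names (`X_syn`, `matching_loss`, etc.) and the setup are hypothetical. The key property of one-step matching is visible, though: each outer step draws a fresh random initialization and matches gradients there, and the model weights are never trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: 200 points, 5 features, generated by a linear teacher.
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
X_real = rng.normal(size=(200, 5))
y_real = X_real @ w_true

# Synthetic data: only 10 points, learned from a random start.
X_syn = rng.normal(size=(10, 5))
y_syn = X_syn @ w_true  # labels held fixed; only the features are learned

def grad_w(X, y, w):
    """Gradient of the mean squared error with respect to model weights w."""
    return 2.0 * X.T @ (X @ w - y) / len(X)

def matching_loss(Xs, w):
    """Squared distance between real-data and synthetic-data weight gradients."""
    d = grad_w(X_real, y_real, w) - grad_w(Xs, y_syn, w)
    return float(d @ d)

def matching_grad(Xs, w):
    """Analytic gradient of matching_loss with respect to the synthetic features Xs."""
    d = grad_w(X_real, y_real, w) - grad_w(Xs, y_syn, w)
    r = Xs @ w - y_syn  # synthetic residual
    return -(4.0 / len(Xs)) * (np.outer(r, d) + np.outer(Xs @ d, w))

# Fixed probe weights to measure the gradient mismatch before and after.
probes = [rng.normal(size=5) for _ in range(20)]
def avg_loss(Xs):
    return sum(matching_loss(Xs, w) for w in probes) / len(probes)

loss_before = avg_loss(X_syn)

# One-step gradient matching: every outer step samples a fresh random
# initialization w0 and matches gradients at w0 only -- there is no inner
# loop that trains w0, unlike multi-step schemes.
lr = 0.02
for _ in range(500):
    w0 = rng.normal(size=5)
    g = matching_grad(X_syn, w0)
    X_syn = X_syn - lr * g / (np.linalg.norm(g) + 1e-12)  # normalized step

loss_after = avg_loss(X_syn)
```

After condensation, gradients computed on the 10 synthetic points approximate those on the 200 real points across random initializations (`loss_after` falls well below `loss_before`). The normalized step is a stabilizing choice for this sketch, since the matching loss is quartic in the synthetic features.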
Related papers
- Dataset Distillation as Pushforward Optimal Quantization [1.039189397779466]
We propose a simple extension of the state-of-the-art data distillation method D4M, achieving better performance on the ImageNet-1K dataset with trivial additional computation.
We demonstrate that when equipped with an encoder-decoder structure, the empirically successful disentangled methods can be reformulated as an optimal quantization problem.
In particular, we link existing disentangled dataset distillation methods to the classical optimal quantization and Wasserstein barycenter problems, demonstrating consistency of distilled datasets for diffusion-based generative priors.
arXiv Detail & Related papers (2025-01-13T20:41:52Z)
- Bi-Directional Multi-Scale Graph Dataset Condensation via Information Bottleneck [10.680304093708147]
We propose a novel GNN-centric Bi-directional Multi-Scale Graph dataset condensation framework.
In this paper, we explore unifying paradigms by operating on both large-to-small and small-to-large for multi-scale graph condensation.
arXiv Detail & Related papers (2024-12-23T07:32:02Z)
- Predictive Query-based Pipeline for Graph Data [0.0]
Graph embedding techniques simplify the analysis and processing of large-scale graphs.
Several approaches, such as GraphSAGE, Node2Vec, and FastRP, offer efficient methods for generating graph embeddings.
By storing embeddings as node properties, it is possible to compare different embedding techniques and evaluate their effectiveness.
arXiv Detail & Related papers (2024-12-13T08:03:57Z)
- Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching [74.75248610868685]
Teddy is a Taylor-approximated dataset distillation framework designed to handle large-scale datasets.
Teddy attains state-of-the-art efficiency and performance on the Tiny-ImageNet and original-sized ImageNet-1K dataset.
arXiv Detail & Related papers (2024-10-10T03:28:46Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Learnable Graph Matching: A Practical Paradigm for Data Association [74.28753343714858]
We propose a general learnable graph matching method to address these issues.
Our method achieves state-of-the-art performance on several MOT datasets.
For image matching, our method outperforms state-of-the-art methods on a popular indoor dataset, ScanNet.
arXiv Detail & Related papers (2023-03-27T17:39:00Z)
- Delving into Effective Gradient Matching for Dataset Condensation [13.75957901381024]
The gradient matching method directly targets the training dynamics by matching the gradients obtained when training on the original and synthetic datasets.
We propose to match the multi-level gradients to involve both intra-class and inter-class gradient information.
An overfitting-aware adaptive learning-step strategy is also proposed to trim unnecessary optimization steps and improve algorithmic efficiency.
arXiv Detail & Related papers (2022-07-30T21:31:10Z)
- Optimal Propagation for Graph Neural Networks [51.08426265813481]
We propose a bi-level optimization approach for learning the optimal graph structure.
We also explore a low-rank approximation model for further reducing the time complexity.
arXiv Detail & Related papers (2022-05-06T03:37:00Z)
- Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking [58.30147362745852]
Data association across frames is at the core of Multiple Object Tracking (MOT) task.
Existing methods mostly ignore the context information among tracklets and intra-frame detections.
We propose a novel learnable graph matching method to address these issues.
arXiv Detail & Related papers (2021-03-30T08:58:45Z)
This list is automatically generated from the titles and abstracts of the papers indexed on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.