Condensing Graphs via One-Step Gradient Matching
- URL: http://arxiv.org/abs/2206.07746v1
- Date: Wed, 15 Jun 2022 18:20:01 GMT
- Title: Condensing Graphs via One-Step Gradient Matching
- Authors: Wei Jin, Xianfeng Tang, Haoming Jiang, Zheng Li, Danqing Zhang,
Jiliang Tang, Bing Yin
- Abstract summary: We propose a one-step gradient matching scheme, which performs gradient matching for only one single step without training the network weights.
Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs.
In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
- Score: 50.07587238142548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As training deep learning models on large datasets takes substantial
time and resources, it is desirable to construct a small synthetic dataset with
which we can train deep learning models sufficiently well. Recent works have
explored condensing image datasets through complex bi-level
optimization. For instance, dataset condensation (DC) matches network gradients
w.r.t. large-real data and small-synthetic data, where the network weights are
optimized for multiple steps at each outer iteration. However, existing
approaches have their inherent limitations: (1) they are not directly
applicable to graphs where the data is discrete; and (2) the condensation
process is computationally expensive due to the involved nested optimization.
To bridge the gap, we investigate efficient dataset condensation tailored for
graph datasets where we model the discrete graph structure as a probabilistic
model. We further propose a one-step gradient matching scheme, which performs
gradient matching for only one single step without training the network
weights. Our theoretical analysis shows this strategy can generate synthetic
graphs that lead to lower classification loss on real graphs. Extensive
experiments on various graph datasets demonstrate the effectiveness and
efficiency of the proposed method. In particular, we are able to reduce the
dataset size by 90% while approximating up to 98% of the original performance
and our method is significantly faster than multi-step gradient matching (e.g.
15x in CIFAR10 for synthesizing 500 graphs).
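The core idea above can be illustrated with a minimal NumPy sketch. This is an illustration under simplifying assumptions, not the paper's implementation: a plain linear regression model stands in for the GNN, synthetic features (not a graph structure) are learned, and the paper's probabilistic (Bernoulli-style) modeling of the discrete adjacency matrix is omitted. All names (`X_syn`, `matching_loss`, etc.) and the setup are hypothetical. The key property of one-step matching is visible, though: each outer step draws a fresh random initialization and matches gradients there, and the model weights are never trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: 200 points, 5 features, generated by a linear teacher.
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
X_real = rng.normal(size=(200, 5))
y_real = X_real @ w_true

# Synthetic data: only 10 points, learned from a random start.
X_syn = rng.normal(size=(10, 5))
y_syn = X_syn @ w_true  # labels held fixed; only the features are learned

def grad_w(X, y, w):
    """Gradient of the mean squared error with respect to model weights w."""
    return 2.0 * X.T @ (X @ w - y) / len(X)

def matching_loss(Xs, w):
    """Squared distance between real-data and synthetic-data weight gradients."""
    d = grad_w(X_real, y_real, w) - grad_w(Xs, y_syn, w)
    return float(d @ d)

def matching_grad(Xs, w):
    """Analytic gradient of matching_loss with respect to the synthetic features Xs."""
    d = grad_w(X_real, y_real, w) - grad_w(Xs, y_syn, w)
    r = Xs @ w - y_syn  # synthetic residual
    return -(4.0 / len(Xs)) * (np.outer(r, d) + np.outer(Xs @ d, w))

# Fixed probe weights to measure the gradient mismatch before and after.
probes = [rng.normal(size=5) for _ in range(20)]
def avg_loss(Xs):
    return sum(matching_loss(Xs, w) for w in probes) / len(probes)

loss_before = avg_loss(X_syn)

# One-step gradient matching: every outer step samples a fresh random
# initialization w0 and matches gradients at w0 only -- there is no inner
# loop that trains w0, unlike multi-step schemes.
lr = 0.02
for _ in range(500):
    w0 = rng.normal(size=5)
    g = matching_grad(X_syn, w0)
    X_syn = X_syn - lr * g / (np.linalg.norm(g) + 1e-12)  # normalized step

loss_after = avg_loss(X_syn)
```

After condensation, gradients computed on the 10 synthetic points approximate those on the 200 real points across random initializations (`loss_after` falls well below `loss_before`). The normalized step is a stabilizing choice for this sketch, since the matching loss is quartic in the synthetic features.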
Related papers
- Dataset Distillation as Pushforward Optimal Quantization [1.039189397779466]
We propose a simple extension of the state-of-the-art data distillation method D4M, achieving better performance on the ImageNet-1K dataset with trivial additional computation.
We demonstrate that when equipped with an encoder-decoder structure, the empirically successful disentangled methods can be reformulated as an optimal quantization problem.
In particular, we link existing disentangled dataset distillation methods to the classical optimal quantization and Wasserstein barycenter problems, demonstrating consistency of distilled datasets for diffusion-based generative priors.
arXiv Detail & Related papers (2025-01-13T20:41:52Z)
- Bi-Directional Multi-Scale Graph Dataset Condensation via Information Bottleneck [10.680304093708147]
We propose a novel GNN-centric Bi-directional Multi-Scale Graph dataset condensation framework.
In this paper, we explore unifying paradigms by operating on both large-to-small and small-to-large for multi-scale graph condensation.
arXiv Detail & Related papers (2024-12-23T07:32:02Z)
- Predictive Query-based Pipeline for Graph Data [0.0]
Graph embedding techniques simplify the analysis and processing of large-scale graphs.
Several approaches, such as GraphSAGE, Node2Vec, and FastRP, offer efficient methods for generating graph embeddings.
By storing embeddings as node properties, it is possible to compare different embedding techniques and evaluate their effectiveness.
arXiv Detail & Related papers (2024-12-13T08:03:57Z)
- Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching [74.75248610868685]
Teddy is a Taylor-approximated dataset distillation framework designed to handle large-scale datasets.
Teddy attains state-of-the-art efficiency and performance on the Tiny-ImageNet and original-sized ImageNet-1K dataset.
arXiv Detail & Related papers (2024-10-10T03:28:46Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Learnable Graph Matching: A Practical Paradigm for Data Association [74.28753343714858]
We propose a general learnable graph matching method to address these issues.
Our method achieves state-of-the-art performance on several MOT datasets.
For image matching, our method outperforms state-of-the-art methods on a popular indoor dataset, ScanNet.
arXiv Detail & Related papers (2023-03-27T17:39:00Z)
- Delving into Effective Gradient Matching for Dataset Condensation [13.75957901381024]
The gradient matching method directly targets the training dynamics by matching the gradients obtained when training on the original and synthetic datasets.
We propose to match the multi-level gradients to involve both intra-class and inter-class gradient information.
An overfitting-aware adaptive learning-step strategy is also proposed to trim unnecessary optimization steps and improve algorithmic efficiency.
arXiv Detail & Related papers (2022-07-30T21:31:10Z)
- Optimal Propagation for Graph Neural Networks [51.08426265813481]
We propose a bi-level optimization approach for learning the optimal graph structure.
We also explore a low-rank approximation model for further reducing the time complexity.
arXiv Detail & Related papers (2022-05-06T03:37:00Z)
- Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking [58.30147362745852]
Data association across frames is at the core of Multiple Object Tracking (MOT) task.
Existing methods mostly ignore the context information among tracklets and intra-frame detections.
We propose a novel learnable graph matching method to address these issues.
arXiv Detail & Related papers (2021-03-30T08:58:45Z)
This list is automatically generated from the titles and abstracts of the papers indexed on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.