Online Adversarial Distillation for Graph Neural Networks
- URL: http://arxiv.org/abs/2112.13966v1
- Date: Tue, 28 Dec 2021 02:30:11 GMT
- Title: Online Adversarial Distillation for Graph Neural Networks
- Authors: Can Wang, Zhe Wang, Defang Chen, Sheng Zhou, Yan Feng, Chun Chen
- Abstract summary: Knowledge distillation is a technique for improving the generalization ability of convolutional neural networks.
In this paper, we propose an online adversarial distillation approach to train a group of graph neural networks.
- Score: 40.746598033413086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation has recently become a popular technique for improving the
generalization ability of convolutional neural networks. However, its
effect on graph neural networks is less than satisfactory, since the graph
topology and node attributes are likely to change dynamically, in which case a
static teacher model is insufficient for guiding student training. In
this paper, we tackle this challenge by simultaneously training a group of
graph neural networks in an online distillation fashion, where the group
knowledge plays a role as a dynamic virtual teacher and the structure changes
in graph neural networks are effectively captured. To improve the distillation
performance, two types of knowledge are transferred among the students to
enhance each other: local knowledge reflecting information in the graph
topology and node attributes, and global knowledge reflecting the prediction
over classes. We transfer the global knowledge with KL-divergence as the
vanilla knowledge distillation does, while exploiting the complicated structure
of the local knowledge with an efficient adversarial cyclic learning framework.
Extensive experiments verify the effectiveness of our proposed online
adversarial distillation approach.
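For the global knowledge described above, the transfer follows vanilla knowledge distillation with a KL-divergence term toward the group's soft predictions. Below is a minimal PyTorch sketch of that part only; the averaged "virtual teacher", the temperature, and the loss weighting are illustrative assumptions, and the adversarial cyclic transfer of local knowledge is not shown.

```python
import torch
import torch.nn.functional as F

def group_distillation_loss(logits_list, labels, temperature=2.0, alpha=0.5):
    """Online distillation for a group of student GNNs (illustrative sketch).

    Each student is trained with (i) the usual cross-entropy on labels and
    (ii) a KL term pulling it toward the averaged soft predictions of the
    group, which acts as a dynamic virtual teacher.
    """
    # Dynamic virtual teacher: average of all students' softened predictions.
    with torch.no_grad():
        teacher_probs = torch.stack(
            [F.softmax(l / temperature, dim=-1) for l in logits_list]
        ).mean(dim=0)

    losses = []
    for logits in logits_list:
        ce = F.cross_entropy(logits, labels)
        kd = F.kl_div(
            F.log_softmax(logits / temperature, dim=-1),
            teacher_probs,
            reduction="batchmean",
        ) * (temperature ** 2)
        losses.append((1 - alpha) * ce + alpha * kd)
    return torch.stack(losses).sum()

# Toy usage: three "students" predicting over 7 classes for 10 nodes.
logits_list = [torch.randn(10, 7, requires_grad=True) for _ in range(3)]
labels = torch.randint(0, 7, (10,))
loss = group_distillation_loss(logits_list, labels)
loss.backward()
```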
Related papers
- Semantic Enhanced Knowledge Graph for Large-Scale Zero-Shot Learning [74.6485604326913]
We provide a new semantic enhanced knowledge graph that contains both expert knowledge and the semantic correlation between categories.
To propagate information on the knowledge graph, we propose a novel Residual Graph Convolutional Network (ResGCN).
Experiments conducted on the widely used large-scale ImageNet-21K dataset and AWA2 dataset show the effectiveness of our method.
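As a rough illustration of the residual graph convolution idea named above (ResGCN), here is a minimal sketch; the layer form, normalization, and dimensions are assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResidualGCNLayer(nn.Module):
    """One graph-convolution layer with an additive residual connection."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.act = nn.ReLU()

    def forward(self, x, adj_norm):
        # adj_norm: symmetrically normalized adjacency (dense, for simplicity).
        h = self.act(self.linear(adj_norm @ x))
        return x + h  # residual connection

# Toy usage: 5 nodes on a ring, 16-dimensional features.
n, dim = 5, 16
adj = torch.zeros(n, n)
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
adj = adj + torch.eye(n)                         # add self-loops
deg_inv_sqrt = adj.sum(1).pow(-0.5)
adj_norm = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]

layer = ResidualGCNLayer(dim)
out = layer(torch.randn(n, dim), adj_norm)
```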
arXiv Detail & Related papers (2022-12-26T13:18:36Z)
- Dynamic Community Detection via Adversarial Temporal Graph Representation Learning [17.487265170798974]
In this work, an adversarial temporal graph representation learning framework is proposed to detect dynamic communities from a small sample of brain network data.
In addition, the framework employs adversarial training to guide the learning of the temporal graph representation and optimizes a measurable modularity loss to maximize community modularity.
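Modularity can be made differentiable when community assignments are soft; the sketch below shows one common relaxation that could serve as such a loss. The soft-assignment form and the sign convention are assumptions, not the paper's exact formulation.

```python
import torch

def soft_modularity_loss(adj, assign_probs):
    """Differentiable modularity Q = (1/2m) * Tr(C^T (A - d d^T / 2m) C).

    adj:          dense symmetric adjacency matrix (N x N)
    assign_probs: soft community assignments (N x K), rows sum to 1
    Returning -Q turns modularity maximization into a loss to minimize.
    """
    deg = adj.sum(dim=1, keepdim=True)           # node degrees d (N x 1)
    two_m = adj.sum()                            # 2m = total degree
    modularity_matrix = adj - (deg @ deg.t()) / two_m
    q = torch.trace(assign_probs.t() @ modularity_matrix @ assign_probs) / two_m
    return -q

# Toy usage: two small cliques joined by a single edge.
adj = torch.zeros(6, 6)
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
assign_logits = torch.randn(6, 2, requires_grad=True)
loss = soft_modularity_loss(adj, torch.softmax(assign_logits, dim=1))
loss.backward()
```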
arXiv Detail & Related papers (2022-06-29T08:44:22Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Being Friends Instead of Adversaries: Deep Networks Learn from Data Simplified by Other Networks [23.886422706697882]
A different idea, named Friendly Training, has recently been proposed: it alters the input data by adding an automatically estimated perturbation.
We revisit and extend this idea, inspired by the effectiveness of neural generators in the context of Adversarial Machine Learning.
We propose an auxiliary multi-layer network that is responsible for altering the input data so that the classifier can handle it more easily.
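A minimal sketch of the auxiliary-network idea summarized above: a small network estimates a perturbation of the input and is trained jointly with the classifier to lower the classification loss. The network sizes, the perturbation scale, and the joint objective are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Classifier plus an auxiliary network that learns to simplify its inputs.
classifier = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
simplifier = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 20))

opt = torch.optim.Adam(
    list(classifier.parameters()) + list(simplifier.parameters()), lr=1e-3
)

x = torch.randn(32, 20)
y = torch.randint(0, 3, (32,))

for step in range(100):
    # The auxiliary network estimates a perturbation that makes the data
    # easier for the classifier; epsilon bounds how much it can alter x.
    epsilon = 0.1
    x_easy = x + epsilon * torch.tanh(simplifier(x))
    loss = F.cross_entropy(classifier(x_easy), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```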
arXiv Detail & Related papers (2021-12-18T16:59:35Z)
- Learning through structure: towards deep neuromorphic knowledge graph embeddings [0.5906031288935515]
We propose a strategy to map deep graph learning architectures for knowledge graph reasoning to neuromorphic architectures.
Based on the insight that random, untrained graph neural networks are able to preserve local graph structures, we compose a frozen neural network with shallow knowledge graph embedding models.
We experimentally show that already on conventional computing hardware, this leads to a significant speedup and memory reduction while maintaining a competitive performance level.
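A minimal sketch of that composition under stated assumptions: a frozen, randomly initialized graph network provides fixed structural entity features, and only a shallow DistMult-style scorer on top is trained. The dimensions, the one-step message passing, and the scorer choice are all illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_entities, n_relations, dim = 50, 4, 32

# Frozen, untrained "GNN": one round of random-weight message passing over a
# random graph, used only as a fixed structural feature extractor.
adj = (torch.rand(n_entities, n_entities) < 0.05).float()
adj = ((adj + adj.t()) > 0).float() + torch.eye(n_entities)
random_proj = torch.randn(dim, dim) / dim ** 0.5     # frozen random weights
base_feats = torch.randn(n_entities, dim)
with torch.no_grad():
    entity_emb = torch.tanh((adj @ base_feats) @ random_proj)

# Shallow, trainable scorer (DistMult-style): score(h, r, t) = <e_h, w_r, e_t>.
relation_emb = nn.Parameter(torch.randn(n_relations, dim))
opt = torch.optim.Adam([relation_emb], lr=0.01)

# One toy positive triple (head, relation, tail) with a logistic loss.
h, r, t = 0, 1, 2
score = (entity_emb[h] * relation_emb[r] * entity_emb[t]).sum()
loss = torch.nn.functional.softplus(-score)
loss.backward()
opt.step()
```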
arXiv Detail & Related papers (2021-09-21T18:01:04Z)
- Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher [40.74624021934218]
Knowledge distillation is a strategy of training a student network under the guidance of the soft output from a teacher network.
Recent findings on the neural tangent kernel enable us to approximate a wide neural network with a linear model of the network's random features.
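The linearization referred to above takes the form f(x; θ) ≈ f(x; θ0) + ∇θ f(x; θ0) · (θ − θ0), so the parameter gradients at initialization act as a fixed random-feature map. The tiny scalar-output network below is an illustrative assumption used only to show the approximation numerically.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Scalar-output MLP standing in for a "wide" network.
net = nn.Sequential(nn.Linear(5, 512), nn.ReLU(), nn.Linear(512, 1))
params0 = [p.detach().clone() for p in net.parameters()]

x = torch.randn(1, 5)

# phi(x) = grad_theta f(x; theta0): the random-feature map at initialization.
out0 = net(x).squeeze()
phi = torch.cat([g.reshape(-1)
                 for g in torch.autograd.grad(out0, net.parameters())])

# Perturb the parameters slightly (standing in for a few training steps).
with torch.no_grad():
    for p in net.parameters():
        p.add_(1e-3 * torch.randn_like(p))
    delta = torch.cat([(p - p0).reshape(-1)
                       for p, p0 in zip(net.parameters(), params0)])
    f_true = net(x).squeeze()
    f_lin = out0 + phi @ delta   # f(x; theta0) + phi(x) . (theta - theta0)

print(float(f_true), float(f_lin))   # close for wide nets / small updates
```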
arXiv Detail & Related papers (2020-10-20T07:33:21Z)
- A Heterogeneous Graph with Factual, Temporal and Logical Knowledge for Question Answering Over Dynamic Contexts [81.4757750425247]
We study question answering over a dynamic textual environment.
We develop a graph neural network over the constructed graph, and train the model in an end-to-end manner.
arXiv Detail & Related papers (2020-04-25T04:53:54Z)
- Distilling Knowledge from Graph Convolutional Networks [146.71503336770886]
Existing knowledge distillation methods focus on convolutional neural networks (CNNs).
We propose the first dedicated approach to distilling knowledge from a pre-trained graph convolutional network (GCN) model.
We show that our method achieves the state-of-the-art knowledge distillation performance for GCN models.
arXiv Detail & Related papers (2020-03-23T18:23:11Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embeddings of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
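The Curriculum By Smoothing entry above describes low-pass filtering of a CNN's feature maps with a strength that decays during training. A minimal sketch of that mechanism follows; the Gaussian kernel, its annealing schedule, and the toy convolution are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_kernel(sigma, size=5):
    """Depthwise low-pass (Gaussian) kernel used to smooth feature maps."""
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = g / g.sum()
    return torch.outer(g, g)                     # separable 2-D Gaussian

def smooth_features(feat, sigma):
    """Blur each channel of an (N, C, H, W) feature map with the same kernel."""
    c = feat.shape[1]
    k = gaussian_kernel(sigma).to(feat).repeat(c, 1, 1, 1)   # (C, 1, 5, 5)
    return F.conv2d(feat, k, padding=2, groups=c)

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(4, 3, 32, 32)

# Curriculum: start with heavy smoothing (little high-frequency information),
# then anneal sigma so the network gradually sees sharper feature maps.
for sigma in [2.0, 1.0, 0.5, 0.25]:
    feat = smooth_features(conv(x), sigma)
    # ... rest of the forward pass / loss / optimizer step would go here
```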
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.