Distill2Vec: Dynamic Graph Representation Learning with Knowledge
Distillation
- URL: http://arxiv.org/abs/2011.05664v1
- Date: Wed, 11 Nov 2020 09:49:24 GMT
- Title: Distill2Vec: Dynamic Graph Representation Learning with Knowledge
Distillation
- Authors: Stefanos Antaris, Dimitrios Rafailidis
- Abstract summary: We propose Distill2Vec, a knowledge distillation strategy to train a compact model with a low number of trainable parameters.
Our experiments with publicly available datasets show the superiority of our proposed model over several state-of-the-art approaches.
- Score: 4.568777157687959
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic graph representation learning strategies are based on different
neural architectures to capture the graph evolution over time. However, the
underlying neural architectures require a large number of parameters to train
and suffer from high online inference latency, as several model parameters
have to be updated when new data arrive online. In this study we propose
Distill2Vec, a knowledge distillation strategy to train a compact model with a
low number of trainable parameters, so as to reduce the latency of online
inference while maintaining high model accuracy. We design a distillation loss
function based on Kullback-Leibler divergence to transfer the acquired
knowledge from a teacher model trained on offline data, to a small-size student
model for online data. Our experiments with publicly available datasets show
the superiority of our proposed model over several state-of-the-art approaches
with relative gains up to 5% in the link prediction task. In addition, we
demonstrate the effectiveness of our knowledge distillation strategy, in terms
of number of required parameters, where Distill2Vec achieves a compression
ratio up to 7:100 when compared with baseline approaches. For reproduction
purposes, our implementation is publicly available at
https://stefanosantaris.github.io/Distill2Vec.
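The core of the approach is the teacher-to-student transfer via a Kullback-Leibler divergence loss. Below is a minimal sketch of such a distillation loss in PyTorch; the temperature, tensor shapes, and function names are illustrative assumptions rather than the paper's exact formulation.
```python
import torch
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, temperature=2.0):
    """KL-divergence distillation loss: the student's softened output
    distribution is pulled towards the (detached) teacher's."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    # 'batchmean' averages per-sample KL terms; the T^2 factor keeps the
    # gradient scale comparable across temperatures (standard KD practice).
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Toy usage: teacher scores come from the large offline model, student scores
# from the compact online model (shapes are placeholders).
teacher_logits = torch.randn(32, 2)
student_logits = torch.randn(32, 2, requires_grad=True)
loss = kd_kl_loss(student_logits, teacher_logits)
loss.backward()
```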
Related papers
- BOOT: Data-free Distillation of Denoising Diffusion Models with
Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Learning to Jump: Thinning and Thickening Latent Counts for Generative
Modeling [69.60713300418467]
Learning to jump is a general recipe for generative modeling of various types of data.
We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better.
arXiv Detail & Related papers (2023-05-28T05:38:28Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information
Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
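As a rough illustration of transferring "relative geometry", one can align the student's query-document similarity matrix with the teacher's. The sketch below is an assumption-laden approximation, not EmbedDistill's actual objective; all names and shapes are placeholders.
```python
import torch
import torch.nn.functional as F

def geometry_alignment_loss(q_student, d_student, q_teacher, d_teacher):
    # Pairwise query-document similarity matrices in each embedding space.
    sims_student = q_student @ d_student.T
    sims_teacher = q_teacher @ d_teacher.T
    # Encourage the student to reproduce the teacher's relative ranking of
    # documents for every query (softmax over the document axis).
    return F.kl_div(
        F.log_softmax(sims_student, dim=-1),
        F.softmax(sims_teacher.detach(), dim=-1),
        reduction="batchmean",
    )
```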
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- Synthetic data generation method for data-free knowledge distillation in
regression neural networks [0.0]
Knowledge distillation is the technique of compressing a larger neural network, known as the teacher, into a smaller neural network, known as the student.
Previous work has proposed a data-free knowledge distillation method where synthetic data are generated using a generator model trained adversarially against the student model.
In this study, we investigate the behavior of various synthetic data generation methods and propose a new synthetic data generation strategy.
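A minimal sketch of the adversarial data-free setup described above: the generator is trained to maximise the teacher-student discrepancy on synthetic inputs, while the student is trained to minimise it. The architectures, loop length, and hyperparameters are placeholders, not the paper's configuration.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, input_dim = 16, 8
generator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, input_dim))
teacher = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, 1)).eval()
student = nn.Sequential(nn.Linear(input_dim, 32), nn.ReLU(), nn.Linear(32, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
s_opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):
    z = torch.randn(64, latent_dim)
    # Generator step: find synthetic inputs on which the student disagrees
    # with the teacher (maximise the regression discrepancy).
    x = generator(z)
    g_loss = -F.mse_loss(student(x), teacher(x).detach())
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    # Student step: imitate the teacher on freshly generated inputs.
    x = generator(z).detach()
    s_loss = F.mse_loss(student(x), teacher(x).detach())
    s_opt.zero_grad()
    s_loss.backward()
    s_opt.step()
```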
arXiv Detail & Related papers (2023-01-11T07:26:00Z)
- Directed Acyclic Graph Factorization Machines for CTR Prediction via
Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% of the FLOPs of the state-of-the-art method in both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight reparameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a weight distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
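The summary does not spell out the reparameterisation; the sketch below assumes the commonly described form w = theta * |theta|**(alpha - 1) with alpha > 1, which shrinks gradient updates for weights already close to zero and so pushes them towards exact zeros. Class and parameter names are illustrative assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerpropLinear(nn.Module):
    """Linear layer whose effective weight is theta * |theta|**(alpha - 1)."""
    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.alpha = alpha
        self.theta = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Magnitude-dependent scaling makes small weights learn more slowly,
        # concentrating the trained weight distribution around zero.
        weight = self.theta * self.theta.abs().pow(self.alpha - 1)
        return F.linear(x, weight, self.bias)
```
After training, the small-magnitude weights of such a layer can be pruned (e.g. by thresholding |theta|) with little accuracy loss, which is the behaviour the summary describes.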
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- EGAD: Evolving Graph Representation Learning with Self-Attention and
Knowledge Distillation for Live Video Streaming Events [4.332367445046418]
We present a dynamic graph representation learning model on weighted graphs to accurately predict the network capacity of connections between viewers in a live video streaming event.
We propose EGAD, a neural network architecture to capture the graph evolution by introducing a self-attention mechanism on the weights between consecutive graph convolutional networks.
arXiv Detail & Related papers (2020-11-11T11:16:52Z)
- An Efficient Method of Training Small Models for Regression Problems
with Knowledge Distillation [1.433758865948252]
We propose a new formalism of knowledge distillation for regression problems.
First, we propose a new loss function, the teacher outlier rejection loss, which rejects outliers in the training samples using teacher model predictions.
Second, by considering a multi-task network, training of the student models' feature extraction becomes more effective.
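A hedged sketch of an outlier-rejecting loss for regression distillation in the spirit of the first point above: samples on which the teacher itself is far from the label are masked out of the student's loss. The threshold and masking rule are assumptions for illustration, not the paper's exact definition.
```python
import torch

def outlier_rejecting_loss(student_pred, teacher_pred, target, threshold=1.0):
    # Treat samples where the teacher's own error is large as outliers.
    teacher_error = (teacher_pred - target).abs()
    keep = (teacher_error <= threshold).float()
    # Standard squared error for the student, computed only on kept samples.
    per_sample = (student_pred - target) ** 2
    return (per_sample * keep).sum() / keep.sum().clamp(min=1.0)
```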
arXiv Detail & Related papers (2020-02-28T08:46:12Z)