MulDE: Multi-teacher Knowledge Distillation for Low-dimensional
Knowledge Graph Embeddings
- URL: http://arxiv.org/abs/2010.07152v4
- Date: Thu, 1 Apr 2021 08:09:33 GMT
- Title: MulDE: Multi-teacher Knowledge Distillation for Low-dimensional
Knowledge Graph Embeddings
- Authors: Kai Wang, Yu Liu, Qian Ma, Quan Z. Sheng
- Abstract summary: Link prediction based on knowledge graph embeddings (KGE) aims to predict new triples to automatically construct knowledge graphs (KGs).
Recent KGE models achieve performance improvements by excessively increasing the embedding dimensions.
We propose MulDE, a novel knowledge distillation framework, which includes multiple low-dimensional hyperbolic KGE models as teachers and two student components.
- Score: 22.159452429209463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Link prediction based on knowledge graph embeddings (KGE) aims to predict new
triples to automatically construct knowledge graphs (KGs). However, recent KGE
models achieve performance improvements by excessively increasing the embedding
dimensions, which may cause enormous training costs and require more storage
space. In this paper, instead of training high-dimensional models, we propose
MulDE, a novel knowledge distillation framework, which includes multiple
low-dimensional hyperbolic KGE models as teachers and two student components,
namely Junior and Senior. Under a novel iterative distillation strategy, the
Junior component, a low-dimensional KGE model, asks teachers actively based on
its preliminary prediction results, and the Senior component integrates
teachers' knowledge adaptively to train the Junior component based on two
mechanisms: relation-specific scaling and contrast attention. The experimental
results show that MulDE can effectively improve the performance and training
speed of low-dimensional KGE models. The distilled 32-dimensional model is
competitive with state-of-the-art high-dimensional methods on several
widely used datasets.
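As a rough illustration of the iterative distillation loop described in the abstract, the Python sketch below shows one "ask the teachers" round: the Junior selects its top-K preliminary candidates, the teachers score only those candidates, and the Senior fuses the teacher scores into soft labels. The function names, the top-K querying, and the specific forms of relation-specific scaling and contrast attention are simplifying assumptions based only on the abstract, not the authors' released implementation.

```python
# Hedged sketch of one MulDE distillation round (illustrative, not the authors' code).
import numpy as np

def senior_soft_labels(teacher_scores, relation_scale, temperature=2.0):
    """Senior component: fuse teacher scores over top-K candidates into soft labels.

    teacher_scores : (num_teachers, K) candidate scores from each hyperbolic teacher.
    relation_scale : (num_teachers,) per-relation teacher weights (assumed form).
    """
    # Relation-specific scaling: re-weight every teacher for the current relation.
    scaled = teacher_scores * relation_scale[:, None]

    # Contrast attention (assumed form): teachers that separate the candidates more
    # sharply (larger score spread) receive larger attention weights.
    contrast = scaled.std(axis=1)
    attention = np.exp(contrast) / np.exp(contrast).sum()

    # Aggregate and soften into a distribution the Junior can be trained against.
    fused = (attention[:, None] * scaled).sum(axis=0)
    logits = fused / temperature
    z = np.exp(logits - logits.max())
    return z / z.sum()

def ask_teachers(junior_scores, teacher_fn, k=10):
    """Junior component: query the teachers only on its own top-K preliminary predictions."""
    top_k = np.argsort(junior_scores)[-k:]             # Junior's preliminary candidates
    teacher_scores = teacher_fn(top_k)                 # (num_teachers, K) teacher opinions
    relation_scale = np.ones(teacher_scores.shape[0])  # placeholder per-relation weights
    soft_labels = senior_soft_labels(teacher_scores, relation_scale)
    return top_k, soft_labels                          # targets for the Junior's KD loss

# Toy usage: 4 teachers scoring one query against 1,000 candidate entities.
rng = np.random.default_rng(0)
junior_scores = rng.normal(size=1000)
teachers = lambda idx: rng.normal(size=(4, idx.shape[0]))
candidates, soft_labels = ask_teachers(junior_scores, teachers)
```

Restricting the teachers to the Junior's top-K candidates is what keeps the distillation cost low compared with scoring every entity with every teacher at each step.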
Related papers
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD matches or exceeds the performance of leading methods across various model architectures and sizes, while reducing training time by up to a factor of four.
arXiv Detail & Related papers (2024-09-19T07:05:26Z) - CL4KGE: A Curriculum Learning Method for Knowledge Graph Embedding [36.47838597326351]
We define a metric Z-counts to measure the difficulty of training each triple in knowledge graphs.
Based on this metric, we propose CL4KGE, an efficient Curriculum Learning based training strategy.
arXiv Detail & Related papers (2024-08-27T07:51:26Z) - Low-Dimensional Federated Knowledge Graph Embedding via Knowledge Distillation [8.282123002815805]
Federated Knowledge Graph Embedding (FKGE) aims to facilitate collaborative learning of entity and relation embeddings from distributed Knowledge Graphs.
We propose a lightweight Knowledge Distillation (KD) component, titled FedKD, tailored specifically for FKGE methods.
arXiv Detail & Related papers (2024-08-11T11:15:41Z) - Croppable Knowledge Graph Embedding [34.154096023765916]
Knowledge Graph Embedding (KGE) is a common method for Knowledge Graphs (KGs) to serve various artificial intelligence tasks.
Once a new dimension is required, a new KGE model needs to be trained from scratch.
We propose MED, a novel KGE training framework that trains once to obtain a croppable KGE model applicable to multiple scenarios.
arXiv Detail & Related papers (2024-07-03T03:10:25Z) - EmbedDistill: A Geometric Knowledge Distillation for Information
Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z) - Distilling Knowledge from Self-Supervised Teacher by Embedding Graph
Alignment [52.704331909850026]
We formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network.
Inspired by the spirit of instance discrimination in self-supervised learning, we model the instance-instance relations by a graph formulation in the feature embedding space.
Our distillation scheme can be flexibly applied to transfer the self-supervised knowledge to enhance representation learning on various student networks; a minimal sketch of this graph-alignment idea appears after this list.
arXiv Detail & Related papers (2022-11-23T19:27:48Z) - Directed Acyclic Graph Factorization Machines for CTR Prediction via
Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) that learns high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% of the FLOPs of the state-of-the-art method in both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z) - Confidence-aware Self-Semantic Distillation on Knowledge Graph Embedding [20.49583906923656]
Confidence-aware Self-Knowledge Distillation learns from the model itself to enhance KGE in a low-dimensional space.
A specific semantic module is developed to filter reliable knowledge by estimating the confidence of previously learned embeddings.
arXiv Detail & Related papers (2022-06-07T01:49:22Z) - Swift and Sure: Hardness-aware Contrastive Learning for Low-dimensional
Knowledge Graph Embeddings [20.693275018860287]
We propose a novel KGE training framework called Hardness-aware Low-dimensional Embedding (HaLE).
Within limited training time, HaLE can effectively improve the performance and training speed of KGE models.
HaLE-trained models obtain high prediction accuracy after only a few minutes of training and are competitive with state-of-the-art models.
arXiv Detail & Related papers (2022-01-03T10:25:10Z) - RelWalk: A Latent Variable Model Approach to Knowledge Graph Embedding [50.010601631982425]
This paper extends the random walk model (Arora et al., 2016a) of word embeddings to Knowledge Graph Embeddings (KGEs).
We derive a scoring function that evaluates the strength of a relation R between two entities h (head) and t (tail).
We propose a learning objective motivated by the theoretical analysis to learn KGEs from a given knowledge graph.
arXiv Detail & Related papers (2021-01-25T13:31:29Z) - Dynamic Memory Induction Networks for Few-Shot Text Classification [84.88381813651971]
This paper proposes Dynamic Memory Induction Networks (DMIN) for few-shot text classification.
The proposed model achieves new state-of-the-art results on the miniRCV1 and ODIC datasets, improving the best performance (accuracy) by 2-4%.
arXiv Detail & Related papers (2020-05-12T12:41:14Z)
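The "Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment" entry above describes modeling instance-instance relations as a graph in feature space and transferring that graph from a self-supervised teacher to a student. The sketch below illustrates that general idea under stated assumptions: the cosine-similarity relation graph and the squared-error alignment loss are choices made for illustration, not the paper's exact objective.

```python
# Hedged sketch of embedding-graph alignment for distillation (illustrative only).
import numpy as np

def similarity_graph(features):
    """Instance-instance relation graph: pairwise cosine similarities within a batch."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed.T  # (batch, batch) relation graph

def graph_alignment_loss(teacher_feats, student_feats):
    """Align the student's relation graph with the teacher's (mean squared difference)."""
    g_teacher = similarity_graph(teacher_feats)
    g_student = similarity_graph(student_feats)
    return np.mean((g_teacher - g_student) ** 2)

# Toy usage: a frozen self-supervised teacher embeds a batch in 256-d, the student in 64-d;
# only the (batch x batch) graphs are compared, so the embedding dimensions need not match.
rng = np.random.default_rng(0)
loss = graph_alignment_loss(rng.normal(size=(32, 256)), rng.normal(size=(32, 64)))
```

Because the alignment operates on the relation graph rather than on raw features, it places no constraint on the student's embedding dimensionality, which is what makes this style of distillation attractive for low-dimensional students.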