Confidence-aware Self-Semantic Distillation on Knowledge Graph Embedding
- URL: http://arxiv.org/abs/2206.02963v3
- Date: Mon, 5 Aug 2024 07:55:34 GMT
- Title: Confidence-aware Self-Semantic Distillation on Knowledge Graph Embedding
- Authors: Yichen Liu, Jiawei Chen, Defang Chen, Zhehui Zhou, Yan Feng, Can Wang
- Abstract summary: Confidence-aware Self-Knowledge Distillation learns from the model itself to enhance KGE in a low-dimensional space.
A specific semantic module is developed to filter reliable knowledge by estimating the confidence of previously learned embeddings.
- Score: 20.49583906923656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge Graph Embedding (KGE), which projects entities and relations into continuous vector spaces, has garnered significant attention. Although high-dimensional KGE methods offer better performance, they come at the expense of significant computation and memory overheads. Decreasing embedding dimensions significantly deteriorates model performance. While several recent efforts utilize knowledge distillation or non-Euclidean representation learning to augment the effectiveness of low-dimensional KGE, they either necessitate a pre-trained high-dimensional teacher model or involve complex non-Euclidean operations, thereby incurring considerable additional computational costs. To address this, this work proposes Confidence-aware Self-Knowledge Distillation (CSD) that learns from the model itself to enhance KGE in a low-dimensional space. Specifically, CSD extracts knowledge from embeddings in previous iterations, which would be utilized to supervise the learning of the model in the next iterations. Moreover, a specific semantic module is developed to filter reliable knowledge by estimating the confidence of previously learned embeddings. This straightforward strategy bypasses the need for time-consuming pre-training of teacher models and can be integrated into various KGE methods to improve their performance. Our comprehensive experiments on six KGE backbones and four datasets underscore the effectiveness of the proposed CSD.
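The abstract describes CSD only at a high level (the model's own earlier embeddings act as the teacher, filtered by a confidence estimate). Below is a minimal PyTorch-style sketch of that general idea, assuming a snapshot of the model's previous-iteration scores serves as the self-teacher and a max-softmax peakedness proxy stands in for the paper's semantic confidence module; the function name, temperature, and confidence proxy are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of confidence-aware self-distillation for a KGE model.
# The "teacher" is a snapshot of the model's own scores from an earlier
# iteration; the confidence weighting is a hypothetical proxy, not the
# paper's exact semantic module.
import torch
import torch.nn.functional as F


def csd_loss(current_scores: torch.Tensor,
             previous_scores: torch.Tensor,
             temperature: float = 1.0) -> torch.Tensor:
    """current_scores / previous_scores: (batch, num_candidates) triple scores."""
    # Confidence proxy: how peaked the earlier prediction was (max softmax prob).
    with torch.no_grad():
        prev_prob = F.softmax(previous_scores / temperature, dim=-1)
        confidence = prev_prob.max(dim=-1).values          # (batch,)

    # Soft-label distillation from the model's own past predictions.
    log_cur = F.log_softmax(current_scores / temperature, dim=-1)
    kd = F.kl_div(log_cur, prev_prob, reduction="none").sum(dim=-1)

    # Down-weight unreliable past predictions.
    return (confidence * kd).mean()


# Usage inside a normal KGE training loop (scores from any backbone, e.g. TransE):
#   loss = ranking_loss + lambda_csd * csd_loss(scores_t, scores_t_minus_1)
```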
Related papers
- DistiLLM: Towards Streamlined Distillation for Large Language Models [53.46759297929675]
DistiLLM is a more effective and efficient KD framework for auto-regressive language models.
DistiLLM comprises two components: (1) a novel skew Kullback-Leibler divergence loss, where we unveil and leverage its theoretical properties, and (2) an adaptive off-policy approach designed to enhance the efficiency of utilizing student-generated outputs.
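As a rough illustration of component (1), here is a minimal sketch of a skew Kullback-Leibler divergence, assuming the common mixing convention KL(p || alpha*p + (1-alpha)*q); the value of alpha and the reduction are placeholders rather than DistiLLM's exact settings.

```python
# Sketch of a skew KL divergence: KL(p || alpha*p + (1-alpha)*q).
# The mixing convention and alpha value are assumptions for illustration.
import torch


def skew_kl(p: torch.Tensor, q: torch.Tensor, alpha: float = 0.1) -> torch.Tensor:
    """p, q: probability distributions, shape (batch, vocab)."""
    mix = alpha * p + (1.0 - alpha) * q          # skewed reference distribution
    return (p * (p.clamp_min(1e-12).log() - mix.clamp_min(1e-12).log())).sum(-1).mean()
```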
arXiv Detail & Related papers (2024-02-06T11:10:35Z)
- Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning [3.1423836318272773]
Knowledge distillation (KD) improves the performance of efficient and lightweight models.
Most existing KD techniques rely on Kullback-Leibler (KL) divergence.
We propose Robustness-Reinforced Knowledge Distillation (R2KD), which leverages correlation distance and network pruning.
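One common notion of correlation distance is one minus the Pearson correlation between teacher and student logits; the sketch below uses that generic form and is not claimed to be the exact R2KD loss.

```python
# Hypothetical correlation-distance loss for distillation: 1 - Pearson
# correlation between teacher and student logits, averaged over the batch.
import torch


def correlation_distance(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor,
                         eps: float = 1e-8) -> torch.Tensor:
    # Center each logit vector, then compute its cosine similarity, i.e. Pearson correlation.
    s = student_logits - student_logits.mean(dim=-1, keepdim=True)
    t = teacher_logits - teacher_logits.mean(dim=-1, keepdim=True)
    corr = (s * t).sum(-1) / (s.norm(dim=-1) * t.norm(dim=-1) + eps)
    return (1.0 - corr).mean()
```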
arXiv Detail & Related papers (2023-11-23T11:34:48Z)
- Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% of the FLOPs of the state-of-the-art method in both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z)
- Adversarial Robustness of Representation Learning for Knowledge Graphs [7.5765554531658665]
This thesis argues that state-of-the-art Knowledge Graph Embeddings (KGE) models are vulnerable to data poisoning attacks.
Two novel data poisoning attacks are proposed that craft input deletions or additions at training time to subvert the learned model's performance at inference time.
The evaluation shows that the simpler attacks are competitive with or outperform the computationally expensive ones.
arXiv Detail & Related papers (2022-09-30T22:41:22Z)
- A Closer Look at Knowledge Distillation with Features, Logits, and Gradients [81.39206923719455]
Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another.
This work provides a new perspective to motivate a set of knowledge distillation strategies by approximating the classical KL-divergence criteria with different knowledge sources.
Our analysis indicates that logits are generally a more efficient knowledge source and suggests that having sufficient feature dimensions is crucial for the model design.
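For reference, a standard logit-based distillation loss of the kind this comparison treats as a knowledge source is the temperature-softened KL divergence between teacher and student logits; a minimal sketch follows, with the temperature value chosen arbitrarily.

```python
# Standard logit-based distillation: KL between temperature-softened
# teacher and student distributions.
import torch
import torch.nn.functional as F


def logit_kd_loss(student_logits: torch.Tensor,
                  teacher_logits: torch.Tensor,
                  T: float = 4.0) -> torch.Tensor:
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    p_t = F.softmax(teacher_logits / T, dim=-1)
    # T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)
```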
arXiv Detail & Related papers (2022-03-18T21:26:55Z)
- Swift and Sure: Hardness-aware Contrastive Learning for Low-dimensional Knowledge Graph Embeddings [20.693275018860287]
We propose a novel KGE training framework called Hardness-aware Low-dimensional Embedding (HaLE).
Within a limited training time, HaLE can effectively improve the performance and training speed of KGE models.
HaLE-trained models obtain high prediction accuracy after only a few minutes of training and are competitive with state-of-the-art models.
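As a rough analogue of hardness-aware training, the sketch below weights negative triples by their current scores, in the spirit of self-adversarial negative sampling; it is not claimed to be HaLE's exact objective, and the weighting temperature is a placeholder.

```python
# Illustrative hardness-aware weighting of negatives for KGE training:
# higher-scoring (harder) negatives receive larger weights.
import torch
import torch.nn.functional as F


def hardness_weighted_loss(pos_score: torch.Tensor,
                           neg_scores: torch.Tensor,
                           alpha: float = 1.0) -> torch.Tensor:
    """pos_score: (batch,), neg_scores: (batch, num_neg) raw triple scores."""
    weights = F.softmax(alpha * neg_scores, dim=-1).detach()   # hardness weights
    pos_term = -F.logsigmoid(pos_score)
    neg_term = -(weights * F.logsigmoid(-neg_scores)).sum(dim=-1)
    return (pos_term + neg_term).mean()
```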
arXiv Detail & Related papers (2022-01-03T10:25:10Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
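A generic k-NN density estimate of the kind mentioned is p(x) ≈ k / (n · V_d(r_k)), where r_k is the distance to the k-th nearest neighbour and V_d the volume of the d-ball; the sketch below implements that textbook estimator with placeholder k, not OSAKD's exact procedure.

```python
# Generic k-NN density estimate in a feature space: p(x) ~= k / (n * V_d(r_k)).
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gammaln


def knn_density(features: np.ndarray, k: int = 10) -> np.ndarray:
    """features: (n, d) array of output-space feature vectors."""
    n, d = features.shape
    tree = cKDTree(features)
    # k + 1 because each point's nearest neighbour is the point itself.
    dist, _ = tree.query(features, k=k + 1)
    r_k = dist[:, -1]
    # log volume of a d-ball with radius r_k: log(pi^(d/2) / Gamma(d/2 + 1) * r_k^d)
    log_volume = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1) + d * np.log(r_k)
    return np.exp(np.log(k) - np.log(n) - log_volume)
```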
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- Highly Efficient Knowledge Graph Embedding Learning with Orthogonal Procrustes Analysis [10.154836127889487]
Knowledge Graph Embeddings (KGEs) have been intensively explored in recent years due to their promise for a wide range of applications.
This paper proposes a simple yet effective KGE framework which can reduce the training time and carbon footprint by orders of magnitude.
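The orthogonal Procrustes problem the title refers to has a closed-form SVD solution; the sketch below shows only that standard building block, not the paper's full training pipeline, and the variable names are illustrative.

```python
# Orthogonal Procrustes: find the orthogonal R minimising ||A R - B||_F.
import numpy as np


def orthogonal_procrustes(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """A, B: (n, d) matrices of embeddings to be aligned."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt            # orthogonal map R such that A @ R ~= B


# Usage: R = orthogonal_procrustes(A, B); aligned = A @ R
```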
arXiv Detail & Related papers (2021-04-10T03:55:45Z)
- RelWalk: A Latent Variable Model Approach to Knowledge Graph Embedding [50.010601631982425]
This paper extends the random walk model (Arora et al., 2016a) of word embeddings to Knowledge Graph Embeddings (KGEs).
We derive a scoring function that evaluates the strength of a relation R between two entities h (head) and t (tail).
We propose a learning objective motivated by the theoretical analysis to learn KGEs from a given knowledge graph.
arXiv Detail & Related papers (2021-01-25T13:31:29Z)
- MulDE: Multi-teacher Knowledge Distillation for Low-dimensional Knowledge Graph Embeddings [22.159452429209463]
Link prediction based on knowledge graph embeddings (KGE) aims to predict new triples to automatically construct knowledge graphs (KGs).
Recent KGE models achieve performance improvements by excessively increasing the embedding dimensions.
We propose MulDE, a novel knowledge distillation framework, which includes multiple low-dimensional hyperbolic KGE models as teachers and two student components.
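As an illustration of the multi-teacher idea, the sketch below averages several teachers' softened score distributions into a single distillation target; MulDE's actual weighting scheme and student components are more elaborate, so the uniform weights and temperature here are assumptions.

```python
# Sketch of a multi-teacher distillation target: a weighted average of the
# teachers' softened score distributions.
from typing import List, Optional

import torch
import torch.nn.functional as F


def multi_teacher_target(teacher_scores: List[torch.Tensor],
                         weights: Optional[torch.Tensor] = None,
                         T: float = 2.0) -> torch.Tensor:
    """teacher_scores: list of (batch, num_candidates) score tensors, one per teacher."""
    probs = torch.stack([F.softmax(s / T, dim=-1) for s in teacher_scores])  # (K, B, C)
    if weights is None:
        weights = torch.full((len(teacher_scores),), 1.0 / len(teacher_scores))
    return (weights.view(-1, 1, 1) * probs).sum(dim=0)        # soft target, (B, C)
```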
arXiv Detail & Related papers (2020-10-14T15:09:27Z)
- Residual Knowledge Distillation [96.18815134719975]
This work proposes Residual Knowledge Distillation (RKD), which further distills the knowledge by introducing an assistant (A).
In this way, S is trained to mimic the feature maps of T, and A aids this process by learning the residual error between them.
Experiments show that our approach achieves appealing results on popular classification datasets, CIFAR-100 and ImageNet.
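A minimal sketch of the residual setup described above, with the student S mimicking the teacher's feature maps and the assistant A regressing the leftover residual; the feature shapes and the choice of MSE losses are illustrative assumptions rather than the paper's exact objective.

```python
# Sketch of residual knowledge distillation: S mimics the teacher's features,
# A learns the residual error the student still makes.
import torch
import torch.nn.functional as F


def rkd_losses(f_student: torch.Tensor,
               f_assistant: torch.Tensor,
               f_teacher: torch.Tensor):
    residual = (f_teacher - f_student).detach()     # what the student still misses
    loss_student = F.mse_loss(f_student, f_teacher)
    loss_assistant = F.mse_loss(f_assistant, residual)
    return loss_student, loss_assistant
```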
arXiv Detail & Related papers (2020-02-21T07:49:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.