Optimal Embedding Guided Negative Sample Generation for Knowledge Graph Link Prediction
- URL: http://arxiv.org/abs/2504.03327v1
- Date: Fri, 04 Apr 2025 10:10:18 GMT
- Title: Optimal Embedding Guided Negative Sample Generation for Knowledge Graph Link Prediction
- Authors: Makoto Takamoto, Daniel Oñoro-Rubio, Wiem Ben Rim, Takashi Maruyama, Bhushan Kotnis,
- Abstract summary: This paper theoretically investigates the condition under which negative samples lead to optimal KG embedding. We propose EMU, a novel framework that generates negative samples satisfying this condition. The results consistently demonstrate significant improvements in link prediction performance across various KGE models and negative sampling methods.
- Score: 7.961256253228863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge graph embedding (KGE) models encode the structural information of knowledge graphs to predict new links. Effective training of these models requires distinguishing between positive and negative samples with high precision. Although prior research has shown that improving the quality of negative samples can significantly enhance model accuracy, identifying high-quality negative samples remains a challenging problem. This paper theoretically investigates the condition under which negative samples lead to optimal KG embedding and identifies a sufficient condition for an effective negative sample distribution. Based on this theoretical foundation, we propose Embedding MUtation (EMU), a novel framework that generates negative samples satisfying this condition, in contrast to conventional methods that focus on identifying challenging negative samples within the training data. Importantly, the simplicity of EMU ensures seamless integration with existing KGE models and negative sampling methods. To evaluate its efficacy, we conducted comprehensive experiments across multiple datasets. The results consistently demonstrate significant improvements in link prediction performance across various KGE models and negative sampling methods. Notably, EMU enables performance improvements comparable to those achieved by models with an embedding dimension five times larger. An implementation of the method and experiments are available at https://github.com/nec-research/EMU-KG.
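The contrast the abstract draws between identifying negatives and generating them can be illustrated with a small sketch. This is not the authors' implementation (that is in the repository linked above). The TransE-style score, the function names, and the component-mixing rule in `mutate_negatives` are all assumptions chosen for illustration; the abstract itself does not specify how the mutation is performed.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style plausibility score (assumed model): closer to 0 is more plausible."""
    return -np.linalg.norm(h + r - t, axis=-1)

def sample_negative_tails(num_entities, true_tail, k, rng):
    """Conventional negative sampling: pick k existing entity ids other than the true tail."""
    candidates = np.delete(np.arange(num_entities), true_tail)
    return rng.choice(candidates, size=k, replace=False)

def mutate_negatives(neg_emb, pos_emb, mutation_rate, rng):
    """Hypothetical mutation rule: overwrite a random subset of each negative
    embedding's components with the positive tail's components, producing
    vectors that need not correspond to any entity in the training data."""
    mask = rng.random(neg_emb.shape) < mutation_rate
    return np.where(mask, pos_emb, neg_emb)

rng = np.random.default_rng(0)
num_entities, dim = 50, 16
entity_emb = rng.normal(size=(num_entities, dim))   # toy entity embeddings
relation_emb = rng.normal(size=dim)                 # toy relation embedding

head, tail = 3, 7                                   # one toy positive triple
neg_ids = sample_negative_tails(num_entities, tail, k=5, rng=rng)
neg_emb = entity_emb[neg_ids]                       # identified negatives
mutated = mutate_negatives(neg_emb, entity_emb[tail], mutation_rate=0.5, rng=rng)

pos_score = transe_score(entity_emb[head], relation_emb, entity_emb[tail])
neg_scores = transe_score(entity_emb[head], relation_emb, mutated)
```

Under this reading, `mutation_rate` would control how close the generated negatives sit to the positive sample: 0 leaves the sampled negatives untouched, and 1 collapses them onto the positive embedding.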
Related papers
- Leveraging Text-to-Image Generation for Handling Spurious Correlation [24.940576844328408]
Deep neural networks trained with Empirical Risk Minimization (ERM) perform well when both training and test data come from the same domain. ERM models may rely on spurious correlations that often exist between labels and irrelevant features of images, making predictions unreliable when those features do not exist. We propose a technique to generate training samples with text-to-image (T2I) diffusion models for addressing the spurious correlation problem.
arXiv Detail & Related papers (2025-03-21T15:28:22Z)
- Enhancing Sample Selection by Cutting Mislabeled Easy Examples [62.13094877228772]
We show that mislabeled examples correctly predicted by the model early in the training process are particularly harmful to model performance. We propose Early Cutting, which employs the model's later training state to re-select the confident subset identified early in training.
arXiv Detail & Related papers (2025-02-12T09:12:45Z)
- From Overfitting to Robustness: Quantity, Quality, and Variety Oriented Negative Sample Selection in Graph Contrastive Learning [38.87932592059369]
Graph contrastive learning (GCL) aims to contrast positive-negative counterparts to learn the node embeddings.
The variation, quantity, and quality of negative samples compared to positive samples play crucial roles in learning meaningful embeddings for node classification downstream tasks.
This study proposes a novel Cumulative Sample Selection (CSS) algorithm by comprehensively considering negative samples' quality, variations, and quantity.
arXiv Detail & Related papers (2024-06-21T10:47:26Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Continual Test-time Domain Adaptation via Dynamic Sample Selection [38.82346845855512]
This paper proposes a Dynamic Sample Selection (DSS) method for Continual Test-time Domain Adaptation (CTDA).
We apply joint positive and negative learning on both high- and low-quality samples to reduce the risk of using wrong information.
Our approach is also evaluated in the 3D point cloud domain, showcasing its versatility and potential for broader applicability.
arXiv Detail & Related papers (2023-10-05T06:35:21Z)
- Graph Out-of-Distribution Generalization with Controllable Data Augmentation [51.17476258673232]
Graph Neural Network (GNN) has demonstrated extraordinary performance in classifying graph properties.
Due to the selection bias of training and testing data, distribution deviation is widespread.
We propose OOD calibration to measure the distribution deviation of virtual samples.
arXiv Detail & Related papers (2023-08-16T13:10:27Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning [54.61762276179205]
We propose a novel contrastive learning approach, MMBS, for building robust VQA models by Making the Most of Biased Samples.
Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlation from the original training samples.
We validate our contributions by achieving competitive performance on the OOD dataset VQA-CP v2 while preserving robust performance on the ID dataset VQA v2.
arXiv Detail & Related papers (2022-10-10T11:05:21Z)
- Generating Counterfactual Hard Negative Samples for Graph Contrastive Learning [22.200011046576716]
Graph contrastive learning is a powerful tool for unsupervised graph representation learning.
Recent works usually sample negative samples from the same training batch with the positive samples, or from an external irrelevant graph.
We propose a novel method that uses a Counterfactual mechanism to generate artificial hard negative samples for Contrastive learning.
arXiv Detail & Related papers (2022-07-01T02:19:59Z)
- MixKG: Mixing for harder negative samples in knowledge graph [33.4379457065033]
Knowledge graph embedding (KGE) aims to represent entities and relations as low-dimensional vectors for many real-world applications.
We introduce an inexpensive but effective method called MixKG to generate harder negative samples for knowledge graphs.
Experiments on two public datasets and four classical KGE methods show MixKG is superior to previous negative sampling algorithms.
arXiv Detail & Related papers (2022-02-19T13:31:06Z)
- Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
arXiv Detail & Related papers (2021-05-27T08:38:29Z)
- Efficient Non-Sampling Knowledge Graph Embedding [21.074002550338296]
We propose a new framework for KG embedding: Efficient Non-Sampling Knowledge Graph Embedding (NS-KGE).
The basic idea is to consider all of the negative instances in the KG for model learning, and thus to avoid negative sampling.
Experiments on benchmark datasets show that our NS-KGE framework achieves better efficiency and accuracy than traditional negative-sampling-based models.
arXiv Detail & Related papers (2021-04-21T23:36:39Z)
- Understanding Negative Sampling in Graph Representation Learning [87.35038268508414]
We show that negative sampling is as important as positive sampling in determining the optimization objective and the resulting variance.
We propose MCNS, which approximates the positive distribution with self-contrast approximation and accelerates negative sampling with Metropolis-Hastings.
We evaluate our method on 5 datasets that cover extensive downstream graph learning tasks, including link prediction, node classification and personalized recommendation.
arXiv Detail & Related papers (2020-05-20T06:25:21Z)
- Reinforced Negative Sampling over Knowledge Graph for Recommendation [106.07209348727564]
We develop a new negative sampling model, Knowledge Graph Policy Network (kgPolicy), which works as a reinforcement learning agent to explore high-quality negatives.
kgPolicy navigates from the target positive interaction, adaptively receives knowledge-aware negative signals, and ultimately yields a potential negative item to train the recommender.
arXiv Detail & Related papers (2020-03-12T12:44:30Z)