Start Small, Think Big: On Hyperparameter Optimization for Large-Scale
Knowledge Graph Embeddings
- URL: http://arxiv.org/abs/2207.04979v1
- Date: Mon, 11 Jul 2022 16:07:16 GMT
- Title: Start Small, Think Big: On Hyperparameter Optimization for Large-Scale
Knowledge Graph Embeddings
- Authors: Adrian Kochsiek, Fritz Niesel, Rainer Gemulla
- Abstract summary: We introduce GraSH, an efficient multi-fidelity HPO algorithm for large-scale knowledge graph embeddings.
GraSH obtains state-of-the-art results on large graphs at a low cost.
- Score: 4.3400407844815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge graph embedding (KGE) models are an effective and popular approach
to represent and reason with multi-relational data. Prior studies have shown
that KGE models are sensitive to hyperparameter settings, however, and that
suitable choices are dataset-dependent. In this paper, we explore
hyperparameter optimization (HPO) for very large knowledge graphs, where the
cost of evaluating individual hyperparameter configurations is excessive. Prior
studies often avoided this cost by using various heuristics; e.g., by training
on a subgraph or by using fewer epochs. We systematically discuss and evaluate
the quality and cost savings of such heuristics and other low-cost
approximation techniques. Based on our findings, we introduce GraSH, an
efficient multi-fidelity HPO algorithm for large-scale KGEs that combines both
graph and epoch reduction techniques and runs in multiple rounds of increasing
fidelities. We conducted an experimental study and found that GraSH obtains
state-of-the-art results on large graphs at a low cost (three complete training
runs in total).
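The round structure described above is close in spirit to successive halving: evaluate many configurations at low fidelity (a small subgraph, few epochs), keep the best, and re-evaluate at increased fidelity. Below is a minimal sketch of that control loop, assuming a hypothetical callback train_and_validate(config, fidelity) whose cost equals `fidelity` complete training runs and which returns validation MRR; the reduction factor of 4 is likewise an assumption, not the paper's setting.

```python
def multi_fidelity_hpo(configs, train_and_validate, n_rounds=3, budget=3.0):
    """Successive-halving-style loop in the spirit of GraSH (sketch only).

    The budget, measured in complete training runs, is split equally
    across rounds: early rounds try many configs cheaply, later rounds
    try few configs at high fidelity.
    """
    survivors = list(configs)
    per_round = budget / n_rounds
    for _ in range(n_rounds):
        fidelity = per_round / len(survivors)      # cheap while many survive
        scored = [(train_and_validate(cfg, fidelity), cfg) for cfg in survivors]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # best MRR first
        survivors = [cfg for _, cfg in scored[: max(1, len(survivors) // 4)]]
    return survivors[0]
```

Starting from 16 configurations, for example, the per-config fidelities come out to 1/16, 1/4, and 1 over the three rounds, i.e., three complete training runs in total, consistent with the budget quoted above.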
Related papers
- Scaling Laws for Sparsely-Connected Foundation Models [70.41266138010657]
We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets.
We identify the first scaling law describing the relationship between weight sparsity, number of non-zero parameters, and amount of training data.
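Purely as an illustration of what such a law can look like (a hypothetical form with made-up coefficients, not the paper's fitted law), one can combine a sparsity-dependent capacity term with the usual power laws in non-zero parameters N and data D:

```latex
% Hypothetical shape only; a, b, c, d, alpha, beta, L_inf are illustrative.
L(S, N, D) \;\approx\; \bigl(a\,(1-S)^{b} + c\bigr)\,N^{-\alpha} \;+\; d\,D^{-\beta} \;+\; L_{\infty}
```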
arXiv Detail & Related papers (2023-09-15T16:29:27Z) - A Generalized EigenGame with Extensions to Multiview Representation
Learning [0.28647133890966997]
Generalized Eigenvalue Problems (GEPs) encompass a range of interesting dimensionality reduction methods.
We develop an approach to solving GEPs in which all constraints are softly enforced by Lagrange multipliers.
We show that our approaches share much of the theoretical grounding of the previous Hebbian and game-theoretic approaches for the linear case.
We demonstrate the effectiveness of our method for solving GEPs in the setting of canonical multiview datasets.
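For reference, a GEP seeks pairs (lambda, w) with A w = lambda B w for symmetric A and positive-definite B. A minimal dense baseline via SciPy's standard solver (not the paper's Lagrange-multiplier method):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Symmetric A and symmetric positive-definite B define the GEP A w = lambda B w.
A = X.T @ X / len(X)                     # covariance-like matrix
B = np.eye(5) + 0.1 * np.diag(rng.random(5))

eigvals, eigvecs = eigh(A, B)            # generalized solver, ascending order
w = eigvecs[:, -1]                       # eigenvector of the largest eigenvalue
print(np.allclose(A @ w, eigvals[-1] * (B @ w)))   # True
```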
arXiv Detail & Related papers (2022-11-21T10:11:13Z) - A Comprehensive Study on Large-Scale Graph Training: Benchmarking and
Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z) - Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural
Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
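Gradual pruning frameworks generally ramp the target sparsity over training instead of pruning once. The cubic schedule of Zhu & Gupta (2017) is a common generic choice, sketched below as an illustration (not necessarily CGP's exact schedule):

```python
def cubic_sparsity(step, s_init=0.0, s_final=0.9, t_start=0, t_end=10_000):
    """Cubic sparsity ramp: prune aggressively early, taper off at the end."""
    if step < t_start:
        return s_init
    if step >= t_end:
        return s_final
    progress = (step - t_start) / (t_end - t_start)
    return s_final + (s_init - s_final) * (1.0 - progress) ** 3

# Target sparsity at a few points of a 10k-step run: 0.0, 0.52, 0.79, 0.9.
for step in (0, 2_500, 5_000, 10_000):
    print(step, round(cubic_sparsity(step), 2))
```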
arXiv Detail & Related papers (2022-07-18T14:23:31Z) - A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search.
We exploit both intra-session and inter-session information to address the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z) - KGTuner: Efficient Hyper-parameter Search for Knowledge Graph Learning [36.97957745114711]
We propose an efficient two-stage search algorithm, which explores HP configurations on a small subgraph.
Experiments show that our method can consistently find better HPs than the baseline algorithms within the same time budget.
arXiv Detail & Related papers (2022-05-05T06:09:14Z) - Sliced gradient-enhanced Kriging for high-dimensional function
approximation [2.8228516010000617]
Gradient-enhanced Kriging (GE-Kriging) is a well-established surrogate modelling technique for approximating expensive computational models.
It tends to become impractical for high-dimensional problems due to the size of the inherent correlation matrix.
A new method, called sliced GE-Kriging (SGE-Kriging), is developed in this paper to reduce the size of the correlation matrix.
The results show that SGE-Kriging achieves accuracy and robustness comparable to standard GE-Kriging at much lower training cost.
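The bottleneck is easy to quantify: with n samples in d dimensions, GE-Kriging correlates every function value with every partial derivative, so the full correlation matrix has the following size (slicing replaces this single large matrix with several much smaller ones; see the paper for the exact construction):

```latex
% One function value plus d partial derivatives per sample point.
R \in \mathbb{R}^{\,n(d+1)\times n(d+1)},
\qquad \text{storage } \mathcal{O}\!\bigl(n^{2}(d+1)^{2}\bigr),
\quad \text{factorization } \mathcal{O}\!\bigl(n^{3}(d+1)^{3}\bigr)
```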
arXiv Detail & Related papers (2022-04-05T07:27:14Z) - Gaussian Graphical Model Selection for Huge Data via Minipatch Learning [1.2891210250935146]
We propose the Minipatch Graph (MPGraph) estimator to solve the problem of graphical model selection.
MPGraph is a generalization of thresholded graph estimators fit to tiny, random subsets of both the observations and the nodes.
We prove that our algorithm achieves finite-sample graph selection consistency.
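A minimal sketch of the minipatch idea, using scikit-learn's graphical lasso as the base estimator and a simple selection-frequency threshold as the aggregation rule (both are assumptions for illustration, not necessarily MPGraph's exact choices):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def minipatch_graph(X, n_patches=50, n_rows=80, n_cols=10, freq=0.5, seed=0):
    """Aggregate sparse graph estimates fit to tiny row/column subsets."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros((p, p))            # times an edge was selected
    seen = np.zeros((p, p))              # times a node pair was co-sampled
    for _ in range(n_patches):
        rows = rng.choice(n, size=n_rows, replace=False)
        cols = rng.choice(p, size=n_cols, replace=False)
        gl = GraphicalLasso(alpha=0.1).fit(X[np.ix_(rows, cols)])
        selected = np.abs(gl.precision_) > 1e-8
        counts[np.ix_(cols, cols)] += selected
        seen[np.ix_(cols, cols)] += 1
    with np.errstate(invalid="ignore"):  # pairs never co-sampled stay 0/0
        graph = (counts / seen) >= freq  # keep edges selected often enough
    np.fill_diagonal(graph, False)
    return graph
```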
arXiv Detail & Related papers (2021-10-22T21:06:48Z) - Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Generative Adversarial Networks (GANs) are trained by solving nonconvex-concave min-max optimization problems.
Work in supervised learning has shown the importance of model overparameterization for the convergence of gradient descent (GD) to globally optimal solutions.
We show that in an overparameterized GAN with a one-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying nonconvex-concave min-max problem.
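Gradient descent-ascent (GDA) simply takes simultaneous descent steps for the min player and ascent steps for the max player. A toy sketch on a quadratic min-max objective with a known saddle point at the origin (illustrative only, not the paper's GAN setup):

```python
# Toy objective f(x, y) = 0.5*x**2 + x*y - 0.5*y**2: min over x, max over y.
def grad_x(x, y): return x + y           # df/dx
def grad_y(x, y): return x - y           # df/dy

x, y, lr = 1.0, -1.0, 0.1
for _ in range(200):
    x, y = x - lr * grad_x(x, y), y + lr * grad_y(x, y)   # simultaneous GDA
print(round(x, 4), round(y, 4))          # both approach the saddle point (0, 0)
```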
arXiv Detail & Related papers (2021-04-12T16:23:37Z) - Highly Efficient Knowledge Graph Embedding Learning with Orthogonal
Procrustes Analysis [10.154836127889487]
Knowledge Graph Embeddings (KGEs) have been intensively explored in recent years due to their promise for a wide range of applications.
This paper proposes a simple yet effective KGE framework which can reduce the training time and carbon footprint by orders of magnitude.
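The orthogonal Procrustes problem at the heart of such a framework has a closed-form SVD solution: the orthogonal W minimizing ||A W - B||_F is W = U V^T, where U S V^T is the SVD of A^T B. A minimal sketch (generic Procrustes alignment, not the paper's full training pipeline):

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Closed-form minimizer of ||A W - B||_F subject to W^T W = I."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 16))
W_true, _ = np.linalg.qr(rng.normal(size=(16, 16)))   # random orthogonal map
B = A @ W_true
print(np.allclose(orthogonal_procrustes(A, B), W_true))   # True: map recovered
```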
arXiv Detail & Related papers (2021-04-10T03:55:45Z) - Model-Agnostic Graph Regularization for Few-Shot Learning [60.64531995451357]
We present a comprehensive study on graph-embedded few-shot learning.
We introduce a graph regularization approach that allows a deeper understanding of the impact of incorporating graph information between labels.
Our approach improves the performance of strong base learners by up to 2% on Mini-ImageNet and 6.7% on ImageNet-FS.
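One generic way to incorporate a label graph is a Laplacian penalty that encourages adjacent labels to have similar classifier weights; the sketch below shows this standard regularizer for illustration (the paper's exact formulation may differ):

```python
import numpy as np

def laplacian_penalty(W, adj):
    """tr(W L W^T): small when weight columns of adjacent labels are close."""
    L = np.diag(adj.sum(axis=1)) - adj   # combinatorial graph Laplacian
    return np.trace(W @ L @ W.T)

# Two adjacent labels with nearly identical weight vectors: tiny penalty.
adj = np.array([[0.0, 1.0], [1.0, 0.0]])        # label adjacency matrix
W = np.array([[1.0, 1.1],                       # column i = weights of label i
              [0.5, 0.4]])
print(laplacian_penalty(W, adj))                # 0.02
```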
arXiv Detail & Related papers (2021-02-14T05:28:13Z)