Multiple Run Ensemble Learning with Low-Dimensional Knowledge Graph Embeddings
- URL: http://arxiv.org/abs/2104.05003v1
- Date: Sun, 11 Apr 2021 12:26:50 GMT
- Title: Multiple Run Ensemble Learning with Low-Dimensional Knowledge Graph Embeddings
- Authors: Chengjin Xu, Mojtaba Nayyeri, Sahar Vahdati, and Jens Lehmann
- Abstract summary: We propose a simple but effective performance boosting strategy for knowledge graph embedding (KGE) models.
We repeat the training of a model 6 times in parallel with an embedding size of 200 and then combine the 6 separate models for testing.
We show that our approach enables different models to better cope with their expressiveness issues in modeling various graph patterns.
- Score: 4.317340121054659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Among the top approaches of recent years, link prediction using knowledge
graph embedding (KGE) models has gained significant attention for knowledge
graph completion. Various embedding models have been proposed so far, and some
recent KGE models obtain state-of-the-art performance on link prediction by
using high-dimensional embeddings (e.g. 1000), which increases the cost of
training and evaluation given the large scale of KGs. In this paper, we propose
a simple but effective performance-boosting strategy for KGE models: training
multiple low-dimensional runs of the same model and combining them. For
example, instead of training a model once with a large embedding size of 1200,
we train the model 6 times in parallel with an embedding size of 200 and then
combine the 6 separate models for testing, so that the overall number of
adjustable parameters (6*200=1200) and the total memory footprint remain the
same. We show that our approach enables different models to better cope with
their expressiveness issues in modeling various graph patterns such as
symmetric, 1-n, n-1 and n-n relations. To justify our findings, we conduct
experiments on various KGE models. Experimental results on the standard
benchmark datasets FB15K, FB15K-237 and WN18RR show that, within a certain
dimension range, multiple low-dimensional models of the same kind outperform
the corresponding single high-dimensional models on link prediction and, with
parallel training, are more efficient to train while the overall number of
adjustable parameters stays the same.
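As a concrete illustration of the strategy described in the abstract, here is a minimal sketch (not the authors' code): it trains K independent low-dimensional TransE-style models on a toy knowledge graph and, at test time, ranks candidate tail entities by the sum of their per-model scores. The base model (TransE), the toy data, the hyperparameters, and score summation as the combination rule are assumptions made for illustration; the abstract does not fix these choices.

```python
# Minimal sketch (assumption-laden, not the paper's code): train K independent
# low-dimensional TransE-style KGE models and combine their scores for testing.
import torch
import torch.nn as nn

class TransE(nn.Module):
    def __init__(self, n_ent, n_rel, dim):
        super().__init__()
        self.ent = nn.Embedding(n_ent, dim)
        self.rel = nn.Embedding(n_rel, dim)
        nn.init.uniform_(self.ent.weight, -0.1, 0.1)
        nn.init.uniform_(self.rel.weight, -0.1, 0.1)

    def score(self, h, r, t):
        # Higher is better: negative L1 distance ||h + r - t||_1.
        return -(self.ent(h) + self.rel(r) - self.ent(t)).abs().sum(-1)

def train_one(model, triples, n_ent, epochs=200, lr=0.01, margin=1.0):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    h, r, t = triples.t()
    for _ in range(epochs):
        t_neg = torch.randint(0, n_ent, t.shape)      # corrupt tails as negatives
        loss = torch.relu(margin
                          - model.score(h, r, t)
                          + model.score(h, r, t_neg)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

torch.manual_seed(0)
n_ent, n_rel = 20, 3
K, low_dim = 6, 32          # e.g. 6 runs of dim 32 instead of one run of dim 192
                            # (the paper's example is 6 runs of 200 vs. one of 1200)
triples = torch.randint(0, n_ent, (200, 3))
triples[:, 1] = torch.randint(0, n_rel, (200,))       # column 1 holds relation ids

# The K runs are fully independent, so they can be trained in parallel processes.
models = [train_one(TransE(n_ent, n_rel, low_dim), triples, n_ent) for _ in range(K)]

def ensemble_rank_tails(h_id, r_id):
    # Rank all candidate tails by the sum of per-model scores
    # (score summation is an assumed combination rule for illustration).
    cands = torch.arange(n_ent)
    h = torch.full((n_ent,), h_id, dtype=torch.long)
    r = torch.full((n_ent,), r_id, dtype=torch.long)
    with torch.no_grad():
        total = sum(m.score(h, r, cands) for m in models)
    return cands[total.argsort(descending=True)]

print(ensemble_rank_tails(0, 1)[:5])   # top-5 predicted tails for (entity 0, relation 1, ?)
```

Because the K runs share nothing during training, they can be launched as separate processes, which is where the training-efficiency advantage of parallel training comes from.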
Related papers
- A Hitchhiker's Guide to Scaling Law Estimation [56.06982415792523]
Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets.
We estimate more than 1000 scaling laws, then derive a set of best practices for estimating scaling laws in new model families.
arXiv Detail & Related papers (2024-10-15T17:59:10Z)
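As a toy illustration of the kind of extrapolation the entry above refers to, the sketch below fits a saturating power law L(N) = a*N^(-b) + c to hypothetical losses of small models and predicts the loss of a larger one; the functional form, the data points, and the use of scipy's curve_fit are illustrative assumptions, not the paper's procedure.

```python
# Toy sketch: fit a saturating power law to small-model losses and extrapolate.
# The functional form and all data below are illustrative assumptions only.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n_params, a, b, c):
    # L(N) = a * N^(-b) + c : loss decays with parameter count toward a floor c.
    return a * n_params ** (-b) + c

# Hypothetical (parameter count, validation loss) pairs from cheap training runs.
n = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
loss = np.array([5.2, 4.2, 3.4, 3.0, 2.6])

(a, b, c), _ = curve_fit(power_law, n, loss, p0=[400.0, 0.35, 2.0], maxfev=10000)
print(f"predicted loss at 1B params: {power_law(1e9, a, b, c):.2f}")
```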
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, requiring no data or additional training, while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- A Lightweight Measure of Classification Difficulty from Application Dataset Characteristics [4.220363193932374]
We propose an efficient cosine similarity-based classification difficulty measure S.
It is calculated from the number of classes and intra- and inter-class similarity metrics of the dataset.
We show how a practitioner can use this measure to help select an efficient model 6 to 29x faster than through repeated training and testing.
arXiv Detail & Related papers (2024-04-09T03:27:09Z)
- PaCKD: Pattern-Clustered Knowledge Distillation for Compressing Memory Access Prediction Models [2.404163279345609]
PaCKD is a Pattern-Clustered Knowledge Distillation approach to compress memory access prediction (MAP) models.
PaCKD yields an 8.70% higher result compared to student models trained with standard knowledge distillation and an 8.88% higher result compared to student models trained without any form of knowledge distillation.
arXiv Detail & Related papers (2024-02-21T00:24:34Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Meta-Ensemble Parameter Learning [35.6391802164328]
In this paper, we study whether we can utilize a meta-learning strategy to directly predict the parameters of a single model with performance comparable to an ensemble.
We introduce WeightFormer, a Transformer-based model that can predict student network weights layer by layer in a forward pass.
arXiv Detail & Related papers (2022-10-05T00:47:24Z)
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [69.7693300927423]
We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness.
We show that the model soup approach extends to multiple image classification and natural language processing tasks.
arXiv Detail & Related papers (2022-03-10T17:03:49Z)
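To make the model-soup entry above concrete, here is a minimal sketch of the uniform-soup variant: element-wise averaging of the parameters of several fine-tuned models that share one architecture. The tiny classifier, the no-argument constructor, and plain uniform averaging are simplifying assumptions for illustration; the paper also studies more selective (greedy) soups.

```python
# Minimal sketch of a uniform "model soup": average the weights of several
# fine-tuned models with identical architecture into one model (illustrative only).
from collections import OrderedDict
import torch
import torch.nn as nn

def uniform_soup(models):
    # Element-wise mean of every parameter across the fine-tuned models.
    state_dicts = [m.state_dict() for m in models]
    avg = OrderedDict(
        (k, torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0))
        for k in state_dicts[0]
    )
    soup = type(models[0])()   # assumes a no-argument constructor for simplicity
    soup.load_state_dict(avg)
    return soup

# Stand-ins for models fine-tuned with different hyperparameter configurations.
class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    def forward(self, x):
        return self.net(x)

finetuned = [TinyClassifier() for _ in range(3)]   # pretend these were fine-tuned
soup = uniform_soup(finetuned)
print(soup(torch.randn(2, 16)).shape)              # inference cost of a single model
```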
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance while using far fewer trainable parameters, with high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- When Ensembling Smaller Models is More Efficient than Single Large Models [52.38997176317532]
We show that ensembles can outperform single models, achieving higher accuracy while requiring fewer total FLOPs to compute.
This presents the interesting observation that output diversity in ensembling can often be more efficient than training larger models.
arXiv Detail & Related papers (2020-05-01T18:56:18Z)
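To make the last entry concrete, here is a minimal sketch of ensembling several small classifiers by averaging their predicted class probabilities, so accuracy can be traded against the ensemble's total FLOPs; the toy models and probability averaging are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: average the output probabilities of several small classifiers
# (one common way to ensemble; the paper's exact models and setup will differ).
import torch
import torch.nn as nn

def make_small_model(width=32):
    # A deliberately small classifier; several of these stand in for "smaller models".
    return nn.Sequential(nn.Linear(64, width), nn.ReLU(), nn.Linear(width, 10))

small_models = [make_small_model() for _ in range(4)]

def ensemble_predict(x):
    # Average softmax probabilities across ensemble members, then take the argmax.
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=-1) for model in small_models])
    return probs.mean(dim=0).argmax(dim=-1)

x = torch.randn(8, 64)        # a batch of 8 feature vectors
print(ensemble_predict(x))    # ensemble class predictions for the batch
```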
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.