Knowledge Base Completion: Baseline strikes back (Again)
- URL: http://arxiv.org/abs/2005.00804v3
- Date: Mon, 25 Jul 2022 13:41:40 GMT
- Title: Knowledge Base Completion: Baseline strikes back (Again)
- Authors: Prachi Jain, Sushant Rathi, Mausam, Soumen Chakrabarti
- Abstract summary: Knowledge Base Completion (KBC) has been a very active area lately.
Recent developments allow us to use all available negative samples for training.
We show that ComplEx, when trained using all available negative samples, gives near state-of-the-art performance on all the datasets.
- Score: 36.52445566431404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge Base Completion (KBC) has been a very active area lately. Several recent KBC papers propose architectural changes, new training methods, or even new formulations. KBC systems are usually evaluated on standard benchmark datasets: FB15k, FB15k-237, WN18, WN18RR, and Yago3-10. To save computational costs, most existing methods train with only a small number of negative samples for each positive instance in these datasets. This paper discusses how recent developments allow us to use all available negative samples for training. We show that ComplEx, when trained using all available negative samples, gives near state-of-the-art performance on all the datasets. We call this approach COMPLEX-V2. We also highlight how various multiplicative KBC methods recently proposed in the literature benefit from this training regime and become indistinguishable in terms of performance on most datasets. Our work calls for a reassessment of their individual value, in light of these findings.
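To make the training regime concrete, below is a minimal PyTorch sketch of ComplEx scored against the full entity table, so that every non-gold entity acts as a negative sample. This is an illustration of the 1-vs-all idea under stated assumptions, not the authors' COMPLEX-V2 code; the class and function names are invented for this sketch.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ComplEx(nn.Module):
    def __init__(self, n_entities, n_relations, dim):
        super().__init__()
        # Real and imaginary parts of the complex entity/relation embeddings.
        self.ent_re = nn.Embedding(n_entities, dim)
        self.ent_im = nn.Embedding(n_entities, dim)
        self.rel_re = nn.Embedding(n_relations, dim)
        self.rel_im = nn.Embedding(n_relations, dim)

    def score_all(self, s, r):
        """Scores of (s, r, o) for every candidate object o at once."""
        s_re, s_im = self.ent_re(s), self.ent_im(s)  # (batch, dim)
        r_re, r_im = self.rel_re(r), self.rel_im(r)  # (batch, dim)
        # Re(<e_s, w_r, conj(e_o)>) expanded into real-valued parts,
        # scored against the full entity table (n_entities, dim).
        q_re = s_re * r_re - s_im * r_im             # pairs with Re(e_o)
        q_im = s_re * r_im + s_im * r_re             # pairs with Im(e_o)
        return q_re @ self.ent_re.weight.t() + q_im @ self.ent_im.weight.t()

def loss_all_negatives(model, s, r, o):
    # Cross-entropy over *all* entities: every non-gold entity in the
    # vocabulary serves as a negative sample for the query (s, r, ?).
    return F.cross_entropy(model.score_all(s, r), o)
```
Scoring against the whole entity table turns each positive triple into an |E|-way classification problem, which is what makes using all available negatives computationally feasible on modern hardware.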
Related papers
- TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks [30.922069185335246]
We find two common characteristics of tabular data in typical industrial applications that are underrepresented in the datasets usually used for evaluation in the literature.
A considerable portion of datasets in production settings stem from extensive data acquisition and feature engineering pipelines.
This can have an impact on the absolute and relative number of predictive, uninformative, and correlated features compared to academic datasets.
arXiv Detail & Related papers (2024-06-27T17:55:31Z)
- Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration [4.919017078893727]
In the underwater domain, most registration of multibeam echo-sounder (MBES) point cloud data is still performed using classical methods.
In this work, we benchmark the performance of two classical and four learning-based methods.
To the best of our knowledge, this is the first work to benchmark both learning-based and classical registration methods on an AUV-based MBES dataset.
arXiv Detail & Related papers (2024-05-10T07:23:33Z)
- Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking [66.83273589348758]
Link prediction attempts to predict whether an unseen edge exists based on only a portion of edges of a graph.
A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task.
New and diverse datasets have also been created to better evaluate the effectiveness of these new models.
arXiv Detail & Related papers (2023-06-18T01:58:59Z)
- DataComp: In search of the next generation of multimodal datasets [179.79323076587255]
DataComp is a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl.
Our benchmark consists of multiple compute scales spanning four orders of magnitude.
In particular, our best baseline, DataComp-1B, enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet.
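For context on the zero-shot evaluation mentioned above: a CLIP-style model classifies an image by matching its embedding against embeddings of text prompts for each class, with no task-specific fine-tuning. A minimal sketch, assuming the image and prompt embeddings have already been computed by the two encoders; the function name and arguments are illustrative and not part of DataComp.
```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_classify(image_emb, class_text_emb):
    """image_emb: (N, dim) image features; class_text_emb: (C, dim) features
    of encoded prompts like 'a photo of a {class}'. Returns class ids."""
    image_emb = F.normalize(image_emb, dim=-1)
    class_text_emb = F.normalize(class_text_emb, dim=-1)
    # Predict the class whose prompt embedding is most similar to the image.
    return (image_emb @ class_text_emb.t()).argmax(dim=-1)
```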
arXiv Detail & Related papers (2023-04-27T11:37:18Z)
- Real-Time Evaluation in Online Continual Learning: A New Hope [104.53052316526546]
We evaluate current Continual Learning (CL) methods with respect to their computational costs.
A simple baseline outperforms state-of-the-art CL methods under this evaluation.
Surprisingly, this suggests that most of the existing CL literature is tailored to a specific class of streams that is not practical.
arXiv Detail & Related papers (2023-02-02T12:21:10Z)
- A Novel Dataset for Evaluating and Alleviating Domain Shift for Human Detection in Agricultural Fields [59.035813796601055]
We evaluate the impact of domain shift on human detection models trained on well known object detection datasets when deployed on data outside the distribution of the training set.
We introduce the OpenDR Humans in Field dataset, collected in the context of agricultural robotics applications, using the Robotti platform.
arXiv Detail & Related papers (2022-09-27T07:04:28Z)
- SimKGC: Simple Contrastive Knowledge Graph Completion with Pre-trained Language Models [9.063614185765855]
In this paper, we introduce three types of negatives: in-batch negatives, pre-batch negatives, and self-negatives which act as a simple form of hard negatives.
Our proposed model SimKGC can substantially outperform embedding-based methods on several benchmark datasets.
In terms of mean reciprocal rank (MRR), we advance the state of the art by +19% on WN18RR, +6.8% on the Wikidata5M transductive setting, and +22% on the Wikidata5M inductive setting.
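The three negative types combine naturally as extra columns of one contrastive logit matrix. A simplified sketch, assuming PyTorch and cosine-normalized encoder outputs; the function name, the queue argument, and the temperature value are illustrative, and SimKGC details such as the additive margin are omitted.
```python
import torch
import torch.nn.functional as F

def simkgc_style_loss(q, t, head_emb, prev_queue, tau=0.05):
    """q: encoded (head, relation) queries, t: encoded gold tails, both (B, dim).
    prev_queue: tail embeddings cached from earlier batches (pre-batch negatives).
    head_emb: embeddings of the head entities themselves (self-negatives)."""
    q, t = F.normalize(q, dim=-1), F.normalize(t, dim=-1)
    in_batch = q @ t.t()                                 # (B, B): diagonal = gold
    pre_batch = q @ F.normalize(prev_queue, dim=-1).t()  # (B, Q)
    self_neg = (q * F.normalize(head_emb, dim=-1)).sum(-1, keepdim=True)  # (B, 1)
    logits = torch.cat([in_batch, pre_batch, self_neg], dim=1) / tau
    targets = torch.arange(q.size(0), device=q.device)   # gold tail is column i
    return F.cross_entropy(logits, targets)
```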
arXiv Detail & Related papers (2022-03-04T07:36:30Z)
- Exemplar-free Online Continual Learning [7.800379384628357]
Continual learning aims to learn new tasks from sequentially available data, under the condition that each data point is observed only once by the learner.
Recent works have made remarkable achievements by storing part of learned task data as exemplars for knowledge replay.
We propose a novel exemplar-free method by leveraging a nearest-class-mean (NCM) classifier.
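A nearest-class-mean classifier is compact enough to sketch in full: keep a running mean feature vector per class and predict the class whose mean is closest to the input feature. The sketch below illustrates the general technique, not the authors' exact method; the class name and streaming update are assumptions suited to the one-pass continual setting.
```python
import torch

class NCMClassifier:
    """Nearest-class-mean: one running mean feature per class."""
    def __init__(self, n_classes, dim):
        self.means = torch.zeros(n_classes, dim)
        self.counts = torch.zeros(n_classes)

    def update(self, feats, labels):
        # Streaming mean update: each example is seen exactly once.
        for f, y in zip(feats, labels):
            self.counts[y] += 1
            self.means[y] += (f - self.means[y]) / self.counts[y]

    def predict(self, feats):
        # Assign each feature to the class with the nearest mean.
        return torch.cdist(feats, self.means).argmin(dim=-1)
```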
arXiv Detail & Related papers (2022-02-11T08:03:22Z)
- A Closer Look at Temporal Sentence Grounding in Videos: Datasets and Metrics [70.45937234489044]
We re-organize two widely used TSGV datasets (Charades-STA and ActivityNet Captions) so that the test splits are distributed differently from the training splits.
We introduce a new evaluation metric "dR@$n$,IoU@$m$" to calibrate the basic IoU scores.
All the results demonstrate that the re-organized datasets and new metric can better monitor the progress in TSGV.
arXiv Detail & Related papers (2021-01-22T09:59:30Z)
- BLEURT: Learning Robust Metrics for Text Generation [17.40369189981227]
We propose BLEURT, a learned evaluation metric based on BERT.
A key aspect of our approach is a novel pre-training scheme that uses millions of synthetic examples to help the model generalize.
BLEURT provides state-of-the-art results on the last three years of the WMT Metrics shared task and the WebNLG Competition dataset.
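Independent of BLEURT's exact training details, the basic recipe is a BERT encoder with a scalar regression head over a (reference, candidate) pair. A minimal sketch using the Hugging Face transformers API; the class name and checkpoint are placeholders, and BLEURT's synthetic pre-training signals are not shown.
```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class LearnedMetric(nn.Module):
    """BERT encoder plus a scalar regression head, trained to predict a
    quality rating for a (reference, candidate) sentence pair."""
    def __init__(self, name="bert-base-uncased"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(name)
        self.encoder = AutoModel.from_pretrained(name)
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, references, candidates):
        batch = self.tokenizer(references, candidates, padding=True,
                               truncation=True, return_tensors="pt")
        out = self.encoder(**batch)
        # Score each pair from its [CLS] token representation.
        return self.head(out.last_hidden_state[:, 0]).squeeze(-1)
```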
arXiv Detail & Related papers (2020-04-09T17:26:52Z)