CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction
- URL: http://arxiv.org/abs/2205.08012v1
- Date: Mon, 16 May 2022 22:55:45 GMT
- Title: CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction
- Authors: Tara Safavi, Doug Downey, Tom Hope
- Abstract summary: We propose a tiered ranking architecture CascadER to maintain the ranking accuracy of full ensembling while improving efficiency considerably.
CascadER uses LMs to rerank the outputs of more efficient base KGEs, relying on an adaptive subset selection scheme aimed at invoking the LMs minimally while maximizing accuracy gain over the KGE.
Our empirical analyses reveal that diversity of models across modalities and preservation of individual models' confidence signals help explain the effectiveness of CascadER.
- Score: 22.96768147978534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge graph (KG) link prediction is a fundamental task in artificial
intelligence, with applications in natural language processing, information
retrieval, and biomedicine. Recently, promising results have been achieved by
leveraging cross-modal information in KGs, using ensembles that combine
knowledge graph embeddings (KGEs) and contextual language models (LMs).
However, existing ensembles are either (1) not consistently effective in terms
of ranking accuracy gains or (2) impractically inefficient on larger datasets
due to the combinatorial explosion problem of pairwise ranking with deep
language models. In this paper, we propose a novel tiered ranking architecture
CascadER to maintain the ranking accuracy of full ensembling while improving
efficiency considerably. CascadER uses LMs to rerank the outputs of more
efficient base KGEs, relying on an adaptive subset selection scheme aimed at
invoking the LMs minimally while maximizing accuracy gain over the KGE.
Extensive experiments demonstrate that CascadER improves MRR by up to 9 points
over KGE baselines, setting new state-of-the-art performance on four benchmarks
while improving efficiency by one or more orders of magnitude over competitive
cross-modal baselines. Our empirical analyses reveal that diversity of models
across modalities and preservation of individual models' confidence signals
help explain the effectiveness of CascadER, and suggest promising directions
for cross-modal cascaded architectures. Code and pretrained models are
available at https://github.com/tsafavi/cascader.
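To make the tiered design concrete, here is a minimal sketch of cascaded reranking in the spirit of the abstract: a cheap KGE scores every candidate, and an expensive LM rescores only a small, adaptively chosen subset of the top candidates. The helper functions kge_score and lm_score and the probability-mass budget heuristic are placeholder assumptions, not the authors' implementation; the actual code is in the linked repository.

    import numpy as np

    def cascade_rank(query, candidates, kge_score, lm_score, max_budget=50):
        """Two-tier ranking: KGE over all candidates, LM over a selected subset.

        kge_score(query, candidates) -> numpy array of scores, one per candidate.
        lm_score(query, subset)      -> scores for the chosen subset only.
        """
        base = kge_score(query, candidates)        # tier 1: cheap scores for everything
        probs = np.exp(base - base.max())
        probs /= probs.sum()
        order = np.argsort(-base)

        # Adaptive subset selection (illustrative heuristic): keep just enough top
        # candidates to cover 90% of the KGE probability mass, capped by a hard
        # budget, so the expensive LM is invoked as little as possible.
        cum = np.cumsum(probs[order])
        k = min(int(np.searchsorted(cum, 0.9)) + 1, max_budget)
        subset = order[:k]

        # Tier 2: rerank only the selected subset with the LM; keep KGE order elsewhere.
        lm = np.asarray(lm_score(query, [candidates[i] for i in subset]))
        reranked = subset[np.argsort(-lm)]
        rest = np.array([i for i in order if i not in set(subset.tolist())], dtype=int)
        return [candidates[i] for i in np.concatenate([reranked, rest])]

Capping the number of LM calls per query is what trades a small amount of ranking accuracy for the order-of-magnitude efficiency gains the abstract reports.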
Related papers
- Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation [9.844598565914055]
Large Language Models (LLMs) demonstrate strong reasoning abilities but face limitations such as hallucinations and outdated knowledge.
We introduce SubgraphRAG, which extends the Knowledge Graph (KG)-based Retrieval-Augmented Generation (RAG) framework by retrieving subgraphs.
Our approach innovatively integrates a lightweight multilayer perceptron with a parallel triple-scoring mechanism for efficient and flexible subgraph retrieval.
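As a rough illustration of the kind of lightweight MLP triple scoring described above (the dimensions, feature concatenation, and top-k retrieval below are assumptions for the sketch, not SubgraphRAG's exact design):

    import torch
    import torch.nn as nn

    class TripleScorer(nn.Module):
        """Scores KG triples against a query embedding with a small MLP."""
        def __init__(self, dim):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(4 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

        def forward(self, query, head, rel, tail):
            # query: [B, d]; head/rel/tail: [B, N, d] embeddings of N candidate triples.
            q = query.unsqueeze(1).expand(-1, head.size(1), -1)
            feats = torch.cat([q, head, rel, tail], dim=-1)
            return self.mlp(feats).squeeze(-1)      # [B, N] relevance scores

Because all candidate triples are scored in parallel by one small network, retrieval stays cheap: the top-k triples by score form the retrieved subgraph.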
arXiv Detail & Related papers (2024-10-28T04:39:32Z)
- Balancing Efficiency vs. Effectiveness and Providing Missing Label Robustness in Multi-Label Stream Classification [3.97048491084787]
We propose a neural network-based approach to high-dimensional multi-label classification.
Our model uses a selective concept drift adaptation mechanism that makes it suitable for a non-stationary environment.
We adapt our model to an environment with missing labels using a simple yet effective imputation strategy.
arXiv Detail & Related papers (2023-10-01T13:23:37Z)
- How to Turn Your Knowledge Graph Embeddings into Generative Models [10.466244652188777]
Some of the most successful knowledge graph embedding (KGE) models for link prediction can be interpreted as energy-based models.
This work re-interprets the score functions of these KGEs as circuits.
Our interpretation comes with little or no loss of performance for link prediction.
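For background on the energy-based reading, a typical KGE score function can be exponentiated and normalized into a distribution over candidate tails. DistMult is used below purely as a familiar example; it is not necessarily one of the models analyzed in that paper.

    import torch

    def distmult_score(h, r, t):
        # h, r, t: [d] embeddings; a higher score means a lower energy,
        # i.e. a more plausible triple.
        return (h * r * t).sum(-1)

    def tail_distribution(h, r, all_tails):
        # Energy-based view: p(t | h, r) is proportional to exp(score(h, r, t)).
        scores = (h * r * all_tails).sum(-1)    # all_tails: [num_entities, d]
        return torch.softmax(scores, dim=-1)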
arXiv Detail & Related papers (2023-05-25T11:30:27Z)
- Pretraining Without Attention [114.99187017618408]
This work explores pretraining without attention by using recent advances in sequence routing based on state-space models (SSMs).
BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4096 tokens without approximation.
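For readers unfamiliar with state-space sequence layers, the minimal discrete recurrence they compute is sketched below. This is the textbook linear SSM update, not BiGS's specific parameterization, which uses structured matrices and efficient convolutional or parallel-scan implementations.

    import numpy as np

    def ssm_scan(A, B, C, u):
        """Linear state-space recurrence: x_k = A x_{k-1} + B u_k, y_k = C x_k.

        A: [n, n], B: [n, 1], C: [1, n]; u: length-T input sequence.
        Returns the length-T output sequence.
        """
        x = np.zeros(A.shape[0])
        y = np.empty(len(u))
        for k, u_k in enumerate(u):
            x = A @ x + B[:, 0] * u_k
            y[k] = C[0] @ x
        return y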
arXiv Detail & Related papers (2022-12-20T18:50:08Z)
- Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation [65.62538699160085]
We propose KD-DAGFM, a Directed Acyclic Graph Factorization Machine that learns high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments.
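A generic distillation objective for transferring a complex interaction model into a lighter CTR student looks like the sketch below; the loss form and the alpha weighting are illustrative assumptions, not KD-DAGFM's exact objective.

    import torch
    import torch.nn.functional as F

    def ctr_distillation_loss(student_logit, teacher_logit, label, alpha=0.5):
        """Blend the hard-label CTR loss with a soft loss that matches the teacher.

        student_logit, teacher_logit: [B] raw logits; label: [B] clicks in {0, 1}.
        """
        hard = F.binary_cross_entropy_with_logits(student_logit, label.float())
        soft = F.mse_loss(student_logit, teacher_logit.detach())
        return alpha * soft + (1.0 - alpha) * hard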
arXiv Detail & Related papers (2022-11-21T03:09:42Z)
- Explainable Sparse Knowledge Graph Completion via High-order Graph Reasoning Network [111.67744771462873]
This paper proposes a novel explainable model for sparse Knowledge Graphs (KGs).
It incorporates high-order reasoning into a graph convolutional network, named HoGRN.
HoGRN not only improves generalization to mitigate the information insufficiency issue but also provides interpretability.
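The backbone such a model builds on is an ordinary graph convolution; one propagation step over a normalized adjacency is shown below for context. The high-order reasoning component is the paper's contribution and is not reproduced here.

    import torch

    def gcn_layer(A_norm, H, W):
        # A_norm: [N, N] normalized adjacency, e.g. D^-1/2 (A + I) D^-1/2
        # H: [N, d_in] node features; W: [d_in, d_out] layer weights
        return torch.relu(A_norm @ H @ W)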
arXiv Detail & Related papers (2022-07-14T10:16:56Z)
- SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval [11.38022203865326]
The SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches.
We modify the pooling mechanism, benchmark a model solely based on document expansion, and introduce models trained with distillation.
Overall, SPLADE is considerably improved with more than 9% gains on NDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
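As a rough sketch of SPLADE-style sparse expansion, each vocabulary term receives a weight by pooling a saturated transform of the masked-language-model logits over the input tokens; the max pooling shown here follows the v2 formulation, and other details are simplified.

    import torch

    def splade_term_weights(mlm_logits, attention_mask):
        """mlm_logits: [B, T, V] MLM head logits; attention_mask: [B, T], 1 for real tokens.

        Weight of vocab term j: max over tokens of log(1 + ReLU(logit)), giving a
        sparse, non-negative bag-of-words style representation usable with inverted indexes.
        """
        sat = torch.log1p(torch.relu(mlm_logits))        # saturate the logits
        sat = sat * attention_mask.unsqueeze(-1)         # zero out padding positions
        return sat.max(dim=1).values                     # [B, V] term weights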
arXiv Detail & Related papers (2021-09-21T10:43:42Z)
- AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z)
- Deepened Graph Auto-Encoders Help Stabilize and Enhance Link Prediction [11.927046591097623]
Link prediction is a relatively under-studied graph learning task, with current state-of-the-art models based on one- or two-layer shallow graph auto-encoder (GAE) architectures.
In this paper, we focus on addressing a limitation of current link prediction methods, which can only use shallow GAEs and variational GAEs.
Our proposed methods innovatively incorporate standard auto-encoders (AEs) into the architectures of GAEs, where standard AEs are leveraged to learn essential, low-dimensional representations by seamlessly integrating the adjacency information and node features.
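A minimal sketch in the spirit of that idea, pairing a graph auto-encoder with a standard auto-encoder branch on the raw node features; layer sizes and the way the two branches are fused here are assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class GAEWithFeatureAE(nn.Module):
        def __init__(self, in_dim, hid_dim, z_dim):
            super().__init__()
            self.gcn1 = nn.Linear(in_dim, hid_dim)   # weights of the GCN encoder layers
            self.gcn2 = nn.Linear(hid_dim, z_dim)
            self.ae_enc = nn.Linear(in_dim, z_dim)   # standard AE on node features alone
            self.ae_dec = nn.Linear(z_dim, in_dim)

        def forward(self, A_norm, X):
            h = torch.relu(A_norm @ self.gcn1(X))
            z_graph = A_norm @ self.gcn2(h)          # GAE node embeddings
            z_feat = torch.relu(self.ae_enc(X))      # low-dimensional feature embeddings
            x_rec = self.ae_dec(z_feat)              # feature reconstruction target
            z = z_graph + z_feat                     # simple fusion (assumed)
            adj_rec = torch.sigmoid(z @ z.t())       # predicted link probabilities
            return adj_rec, x_rec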
arXiv Detail & Related papers (2021-03-21T14:43:10Z)
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAEs) are a powerful and widely used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
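For reference, the Cauchy-Schwarz divergence between densities p and q, the quantity that admits a closed form when both are Gaussian mixtures, is

    D_{CS}(p, q) = -\log \frac{\int p(x)\, q(x)\, dx}{\sqrt{\int p(x)^2\, dx \,\int q(x)^2\, dx}}

For GMMs, each of the three integrals is a sum of pairwise products of Gaussian densities, which integrate analytically, so the objective needs no sampling or variational bound.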
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
- MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation [153.56211546576978]
In this work, we propose that better soft targets with higher compatibility can be generated by using a label generator.
We can employ the meta-learning technique to optimize this label generator.
The experiments are conducted on two standard classification benchmarks, namely CIFAR-100 and ILSVRC-2012.
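A generic distillation step in which the soft targets come from a learned label generator is sketched below; this shows the general mechanism only and omits MetaDistiller's meta-learning update, which additionally optimizes the generator through the student's performance. Temperature and weighting values are illustrative.

    import torch
    import torch.nn.functional as F

    def distill_step(student_logits, generated_targets, labels, T=4.0, alpha=0.7):
        """Combine hard labels with soft targets produced by a label generator.

        student_logits: [B, C]; generated_targets: [B, C] generator logits;
        labels: [B] class indices.
        """
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(generated_targets.detach() / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard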
arXiv Detail & Related papers (2020-08-27T13:04:27Z)