TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval
- URL: http://arxiv.org/abs/2204.08173v1
- Date: Mon, 18 Apr 2022 05:54:44 GMT
- Title: TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval
- Authors: Megan Leszczynski, Daniel Y. Fu, Mayee F. Chen, Christopher Ré
- Abstract summary: We introduce TABi, a method to jointly train bi-encoders on knowledge graph types and unstructured text for entity retrieval.
TABi improves retrieval of rare entities on the Ambiguous Entity Retrieval (AmbER) sets.
It is also robust to incomplete type systems, improving rare entity retrieval over baselines with only 5% type coverage of the training dataset.
- Score: 9.745472576444472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Entity retrieval--retrieving information about entity mentions in a query--is
a key step in open-domain tasks, such as question answering or fact checking.
However, state-of-the-art entity retrievers struggle to retrieve rare entities
for ambiguous mentions due to biases towards popular entities. Incorporating
knowledge graph types during training could help overcome popularity biases,
but there are several challenges: (1) existing type-based retrieval methods
require mention boundaries as input, but open-domain tasks run on unstructured
text, (2) type-based methods should not compromise overall performance, and (3)
type-based methods should be robust to noisy and missing types. In this work,
we introduce TABi, a method to jointly train bi-encoders on knowledge graph
types and unstructured text for entity retrieval for open-domain tasks. TABi
leverages a type-enforced contrastive loss to encourage entities and queries of
similar types to be close in the embedding space. TABi improves retrieval of
rare entities on the Ambiguous Entity Retrieval (AmbER) sets, while maintaining
strong overall retrieval performance on open-domain tasks in the KILT benchmark
compared to state-of-the-art retrievers. TABi is also robust to incomplete type
systems, improving rare entity retrieval over baselines with only 5% type
coverage of the training dataset. We make our code publicly available at
https://github.com/HazyResearch/tabi.
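To make the core idea concrete, the following is a minimal sketch of a type-enforced contrastive loss for a bi-encoder, written in PyTorch. It is an illustration under stated assumptions (L2-normalized embeddings, in-batch negatives, and a binary query-entity type-match mask), not the authors' implementation; the exact formulation is in the repository linked above.

```python
# Minimal sketch of a type-enforced contrastive loss for a bi-encoder.
# Assumptions (not taken from the TABi code): embeddings are L2-normalized,
# negatives come from the batch, and a pair is "positive" when the query and
# entity share a knowledge graph type or form a gold query-entity pair.

import torch
import torch.nn.functional as F

def type_enforced_contrastive_loss(query_emb, entity_emb, type_match, temperature=0.05):
    """
    query_emb:  (B, d) query encoder outputs
    entity_emb: (B, d) entity encoder outputs; entity_emb[i] is the gold entity for query i
    type_match: (B, B) bool; type_match[i, j] = True if query i and entity j share a KG type
    """
    q = F.normalize(query_emb, dim=-1)
    e = F.normalize(entity_emb, dim=-1)
    sim = q @ e.t() / temperature  # (B, B) similarity logits

    # Positives: the gold entity on the diagonal plus any same-type entity.
    positives = type_match | torch.eye(len(q), dtype=torch.bool, device=q.device)

    # Supervised-contrastive style objective: average log-likelihood over positives,
    # which pulls queries and entities of similar types together in embedding space.
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    loss = -(log_prob * positives).sum(dim=1) / positives.sum(dim=1)
    return loss.mean()

# Toy usage with random tensors (batch of 8, 128-dim embeddings).
B, d = 8, 128
q = torch.randn(B, d, requires_grad=True)
e = torch.randn(B, d, requires_grad=True)
same_type = torch.rand(B, B) > 0.8
type_enforced_contrastive_loss(q, e, same_type).backward()
```

Details such as how type positives are defined, the temperature, and hard-negative handling differ in the released code and should be checked against the repository.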
Related papers
- Entity Disambiguation via Fusion Entity Decoding [68.77265315142296]
We propose an encoder-decoder model to disambiguate entities with more detailed entity descriptions.
We observe +1.5% improvements in end-to-end entity linking in the GERBIL benchmark compared with EntQA.
arXiv Detail & Related papers (2024-04-02T04:27:54Z)
- Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains [51.02035914828596]
We study the task of seed-guided fine-grained entity typing in science and engineering domains.
We propose SEType which first enriches the weak supervision by finding more entities for each seen type from an unlabeled corpus.
It then matches the enriched entities to unlabeled text to get pseudo-labeled samples and trains a textual entailment model that can make inferences for both seen and unseen types.
arXiv Detail & Related papers (2024-01-23T22:36:03Z)
- CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query.
Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
arXiv Detail & Related papers (2022-12-18T15:57:46Z)
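The CAPSTONE entry above describes a curriculum over pseudo queries. The sketch below illustrates one plausible reading of that idea, in which a document is paired early in training with a pseudo query only loosely related to the real query, and later with increasingly relevant ones; the ranking and schedule are assumptions for illustration, not the paper's actual procedure.

```python
# Hypothetical curriculum-sampling sketch: as training progresses, sample pseudo
# queries that are more and more relevant to the real query for a document.
# The relevance scores and schedule here are illustrative assumptions only.

import random

def sample_pseudo_query(pseudo_queries, relevance_to_real_query, progress):
    """
    pseudo_queries:           list of generated queries for one document
    relevance_to_real_query:  list of floats, one score per pseudo query
    progress:                 float in [0, 1], fraction of training completed
    """
    # Sort pseudo queries from least to most relevant to the real query.
    ranked = [q for _, q in sorted(zip(relevance_to_real_query, pseudo_queries))]
    # Low progress -> low-relevance queries; high progress -> high-relevance queries,
    # with a little random jitter around the scheduled rank.
    idx = min(int(progress * len(ranked) + random.random()), len(ranked) - 1)
    return ranked[idx]

# Toy usage: at 10% and 90% of training, different pseudo queries tend to be chosen.
pqs = ["who wrote it", "who wrote the novel", "who wrote the novel Dune"]
rels = [0.2, 0.5, 0.9]
print(sample_pseudo_query(pqs, rels, progress=0.1))
print(sample_pseudo_query(pqs, rels, progress=0.9))
```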
- Entity Type Prediction Leveraging Graph Walks and Entity Descriptions [4.147346416230273]
GRAND is a novel approach for entity typing leveraging different graph walk strategies in RDF2vec together with textual entity descriptions.
The proposed approach outperforms the baseline approaches on the benchmark datasets DBpedia and FIGER for entity typing in KGs for both fine-grained and coarse-grained classes.
arXiv Detail & Related papers (2022-07-28T13:56:55Z)
- Few-Shot Fine-Grained Entity Typing with Automatic Label Interpretation and Instance Generation [36.541309948222306]
We study the problem of few-shot Fine-grained Entity Typing (FET), where only a few annotated entity mentions with contexts are given for each entity type.
We propose a novel framework for few-shot FET consisting of two modules: (1) an entity type label interpretation module automatically learns to relate type labels to the vocabulary by jointly leveraging few-shot instances and the label hierarchy, and (2) a type-based contextualized instance generator produces new instances based on given instances to enlarge the training set for better generalization.
arXiv Detail & Related papers (2022-06-28T04:05:40Z)
- Questions Are All You Need to Train a Dense Passage Retriever [123.13872383489172]
ART is a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question.
arXiv Detail & Related papers (2022-06-21T18:16:31Z)
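The ART entry above ranks retrieved passages by the probability of reconstructing the original question. The sketch below shows that scoring idea with an off-the-shelf T5 model from Hugging Face transformers; the choice of model and the zero-shot setup are assumptions for illustration, not the paper's training procedure.

```python
# Sketch of the question-reconstruction scoring idea from the ART summary above:
# rank passages by how likely a seq2seq LM finds the original question given the
# passage. Using a stock T5 checkpoint is an illustrative assumption.

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()

def question_reconstruction_score(passage: str, question: str) -> float:
    """Return the average log-probability of the question tokens given the passage."""
    inputs = tokenizer(passage, return_tensors="pt", truncation=True)
    labels = tokenizer(question, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        out = model(input_ids=inputs.input_ids,
                    attention_mask=inputs.attention_mask,
                    labels=labels)
    return -out.loss.item()  # loss is the mean token NLL, so its negative is a score

question = "Who discovered penicillin?"
passages = [
    "Alexander Fleming discovered penicillin in 1928.",
    "The Eiffel Tower is located in Paris.",
]
# Higher score = the passage makes the question easier to reconstruct.
print(sorted(passages, key=lambda p: question_reconstruction_score(p, question), reverse=True))
```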
- Parallel Instance Query Network for Named Entity Recognition [73.30174490672647]
Named entity recognition (NER) is a fundamental task in natural language processing.
Recent works treat named entity recognition as a reading comprehension task, constructing type-specific queries manually to extract entities.
We propose Parallel Instance Query Network (PIQN), which sets up global and learnable instance queries to extract entities in a parallel manner.
arXiv Detail & Related papers (2022-03-20T13:01:25Z)
- MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations [28.28940043641958]
We propose a novel approach for entity retrieval that constructs multi-view representations for entity descriptions and approximates the optimal view for mentions via a searching method.
Our method achieves the state-of-the-art performance on ZESHEL and improves the quality of candidates on three standard Entity Linking datasets.
arXiv Detail & Related papers (2021-09-13T05:51:45Z)
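The MuVER entry above scores a mention against multiple views of an entity description and keeps the best-matching view. Below is a minimal sketch of that max-over-views scoring, with placeholder embeddings standing in for the paper's encoders and view construction.

```python
# Sketch of max-over-views scoring in the spirit of the MuVER summary above: an
# entity is represented by several view embeddings (e.g., one per description
# sentence), and a query is scored against its best-matching view. The encoders
# and views here are placeholders, not the paper's implementation.

import torch
import torch.nn.functional as F

def best_view_score(query_emb, entity_view_embs):
    """
    query_emb:        (d,) embedding of the mention/query
    entity_view_embs: (V, d) one embedding per view of the entity description
    Returns the similarity of the query to its best-matching view.
    """
    q = F.normalize(query_emb, dim=-1)
    views = F.normalize(entity_view_embs, dim=-1)
    return (views @ q).max().item()

# Toy usage: the entity that has a view close to the query scores higher.
d = 64
query = torch.randn(d)
entity_a_views = torch.stack([torch.randn(d), query + 0.1 * torch.randn(d)])
entity_b_views = torch.randn(3, d)
print(best_view_score(query, entity_a_views) > best_view_score(query, entity_b_views))  # usually True
```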
- Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback [29.719150565643965]
This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval.
ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels.
Analysis shows that the PRF encoder effectively captures the relevant and complementary information from PRF documents, while ignoring the noise with its learned attention mechanism.
arXiv Detail & Related papers (2021-08-30T18:10:26Z)
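The ANCE-PRF entry above builds a query embedding from the query together with its top retrieved documents. The sketch below shows one such input layout with a stock BERT encoder from Hugging Face transformers; the concatenation format and [CLS] pooling are assumptions for illustration, not the released model.

```python
# Sketch of the PRF query-encoder idea from the ANCE-PRF summary above: feed the
# query plus its top retrieved passages through one BERT encoder and take the
# [CLS] vector as the new query embedding. Input layout and pooling are assumed.

import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
encoder.eval()

def prf_query_embedding(query: str, feedback_passages: list) -> torch.Tensor:
    """Encode `query [SEP] passage_1 [SEP] passage_2 ...` and return the [CLS] vector."""
    text = query + " [SEP] " + " [SEP] ".join(feedback_passages)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = encoder(**inputs)
    return out.last_hidden_state[:, 0]  # (1, hidden_size) [CLS] embedding

emb = prf_query_embedding(
    "what causes tides",
    ["Tides are caused by the gravitational pull of the Moon.",
     "The Sun also contributes to tidal forces on Earth."])
print(emb.shape)  # torch.Size([1, 768])
```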
- Autoregressive Entity Retrieval [55.38027440347138]
Entities are at the center of how we represent and aggregate knowledge.
The ability to retrieve such entities given a query is fundamental for knowledge-intensive tasks such as entity linking and open-domain question answering.
We propose GENRE, the first system that retrieves entities by generating their unique names, left to right, token-by-token in an autoregressive fashion.
arXiv Detail & Related papers (2020-10-02T10:13:31Z)
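The GENRE entry above generates an entity's unique name token by token; in the released system, decoding is constrained so that only valid entity names can be produced, typically with a prefix trie over names. Below is a word-level trie sketch of that constraint (illustrative only; the real system works over subword tokens).

```python
# Minimal prefix-trie sketch of constrained decoding for autoregressive entity
# retrieval: at each step, only tokens that keep the partial output a prefix of
# some valid entity name are allowed. Word-level here for readability.

class PrefixTrie:
    def __init__(self, names):
        self.root = {}
        for name in names:
            node = self.root
            for token in name.split():
                node = node.setdefault(token, {})
            node["<eos>"] = {}  # mark a complete entity name

    def allowed_next_tokens(self, prefix_tokens):
        node = self.root
        for token in prefix_tokens:
            if token not in node:
                return []          # prefix does not lead to any entity name
            node = node[token]
        return list(node.keys())   # tokens the decoder may generate next

trie = PrefixTrie(["Michael Jordan", "Michael Jordan (footballer)", "Michael Jackson"])
print(trie.allowed_next_tokens([]))                     # ['Michael']
print(trie.allowed_next_tokens(["Michael"]))            # ['Jordan', 'Jackson']
print(trie.allowed_next_tokens(["Michael", "Jordan"]))  # ['<eos>', '(footballer)']
```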
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.