The Fellowship of the Authors: Disambiguating Names from Social Network Context
- URL: http://arxiv.org/abs/2209.00133v1
- Date: Wed, 31 Aug 2022 21:51:55 GMT
- Title: The Fellowship of the Authors: Disambiguating Names from Social Network Context
- Authors: Ryan Muther, David Smith
- Abstract summary: Authority lists with extensive textual descriptions for each entity are often lacking, and ambiguous named entities mostly occur in the context of other named entities.
We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods.
We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora.
- Score: 2.3605348648054454
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Most NLP approaches to entity linking and coreference resolution focus on
retrieving similar mentions using sparse or dense text representations. The
common "Wikification" task, for instance, retrieves candidate Wikipedia
articles for each entity mention. For many domains, such as bibliographic
citations, authority lists with extensive textual descriptions for each entity
are lacking and ambiguous named entities mostly occur in the context of other
named entities. Unlike prior work, therefore, we seek to leverage the
information that can be gained from looking at association networks of
individuals derived from textual evidence in order to disambiguate names. We
combine BERT-based mention representations with a variety of graph induction
strategies and experiment with supervised and unsupervised cluster inference
methods. We experiment with data consisting of lists of names from two domains:
bibliographic citations from CrossRef and chains of transmission (isnads) from
classical Arabic histories. We find that in-domain language model pretraining
can significantly improve mention representations, especially for larger
corpora, and that the availability of bibliographic information, such as
publication venue or title, can also increase performance on this task. We also
present a novel supervised cluster inference model which gives competitive
performance for little computational effort, making it ideal for situations
where individuals must be identified without relying on an exhaustive authority
list.
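The abstract combines BERT-based mention representations with graph induction strategies and cluster inference. As a hedged illustration (not the paper's actual pipeline), the sketch below assumes mention embeddings have already been computed (e.g. by a BERT encoder), induces a similarity graph with a cosine threshold, and infers clusters as connected components — a minimal unsupervised stand-in for the inference methods the paper compares. The threshold value and toy embeddings are illustrative assumptions.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two dense mention embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def induce_graph(embeddings, threshold=0.9):
    """Graph induction: connect mention pairs whose similarity clears the threshold."""
    n = len(embeddings)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if cosine_sim(embeddings[i], embeddings[j]) >= threshold]

def cluster_mentions(n, edges):
    """Unsupervised cluster inference: connected components via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    roots, labels = {}, []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels

# Toy 2-d "embeddings"; in practice these would be BERT mention vectors.
embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
labels = cluster_mentions(len(embeddings), induce_graph(embeddings))
print(labels)  # -> [0, 0, 1]: the first two mentions resolve to one individual
```

Each cluster label then stands for one inferred individual, without consulting an external authority list.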
Related papers
- Large-Scale Label Interpretation Learning for Few-Shot Named Entity Recognition [5.262708162539423]
Few-shot named entity recognition (NER) detects named entities within text using only a few examples.
One promising line of research is to leverage natural language descriptions of each entity type.
In this paper, we explore the impact of a strong semantic prior for interpreting verbalizations of new entity types.
arXiv Detail & Related papers (2024-03-21T08:22:44Z)
- Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains [51.02035914828596]
We study the task of seed-guided fine-grained entity typing in science and engineering domains.
We propose SEType which first enriches the weak supervision by finding more entities for each seen type from an unlabeled corpus.
It then matches the enriched entities to unlabeled text to get pseudo-labeled samples and trains a textual entailment model that can make inferences for both seen and unseen types.
arXiv Detail & Related papers (2024-01-23T22:36:03Z)
- Disambiguation of Company names via Deep Recurrent Networks [101.90357454833845]
We propose a Siamese LSTM Network approach to extract -- via supervised learning -- an embedding of company name strings.
We analyse how an Active Learning approach to prioritise the samples to be labelled leads to a more efficient overall learning pipeline.
arXiv Detail & Related papers (2023-03-07T15:07:57Z)
- Author Name Disambiguation via Heterogeneous Network Embedding from Structural and Semantic Perspectives [13.266320447769564]
Name ambiguity is common in academic digital libraries, such as multiple authors having the same name.
The proposed method is mainly based on representation learning for heterogeneous networks and clustering.
The semantic representation is generated using NLP tools.
arXiv Detail & Related papers (2022-12-24T11:22:34Z)
- Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
arXiv Detail & Related papers (2022-02-27T17:38:53Z)
- Named entity recognition architecture combining contextual and global features [5.92351086183376]
Named entity recognition (NER) is an information extraction technique that aims to locate and classify named entities.
We propose the combination of contextual features from XLNet and global features from Graph Convolution Network (GCN) to enhance NER performance.
arXiv Detail & Related papers (2021-12-15T10:54:36Z)
- Named Entity Recognition and Linking Augmented with Large-Scale Structured Data [3.211619859724085]
We describe our submissions to the 2nd and 3rd SlavNER Shared Tasks held at BSNLP 2019 and BSNLP 2021.
The tasks focused on the analysis of Named Entities in multilingual Web documents in Slavic languages with rich inflection.
Our solution takes advantage of large collections of both unstructured and structured documents.
arXiv Detail & Related papers (2021-04-27T20:10:18Z)
- Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks [61.23408995934415]
We propose a novel framework for minimally supervised categorization by learning from the text-rich network.
Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.
Our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%.
arXiv Detail & Related papers (2021-02-23T04:14:34Z)
- Autoregressive Entity Retrieval [55.38027440347138]
Entities are at the center of how we represent and aggregate knowledge.
The ability to retrieve such entities given a query is fundamental for knowledge-intensive tasks such as entity linking and open-domain question answering.
We propose GENRE, the first system that retrieves entities by generating their unique names, left to right, token-by-token in an autoregressive fashion.
arXiv Detail & Related papers (2020-10-02T10:13:31Z)
- Pairwise Learning for Name Disambiguation in Large-Scale Heterogeneous Academic Networks [81.00481125272098]
We introduce Multi-view Attention-based Pairwise Recurrent Neural Network (MA-PairRNN) to solve the name disambiguation problem.
MA-PairRNN combines heterogeneous graph embedding learning and pairwise similarity learning into a framework.
Results on two real-world datasets demonstrate that our framework has a significant and consistent improvement of performance on the name disambiguation task.
arXiv Detail & Related papers (2020-08-30T06:08:20Z)
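The GENRE entry above retrieves entities by generating their unique names left to right, token by token. As a hedged toy illustration (not GENRE's actual implementation, which constrains a pretrained seq2seq model with beam search), the sketch below restricts greedy decoding to tokens that extend some valid name in a prefix trie; the catalog, query, and word-overlap scorer are all illustrative assumptions standing in for a learned model.

```python
def build_trie(names):
    """Prefix trie over whitespace-tokenized entity names; None marks a name's end."""
    root = {}
    for name in names:
        node = root
        for tok in name.split():
            node = node.setdefault(tok, {})
        node[None] = True
    return root

def constrained_decode(score, trie):
    """Greedy left-to-right decoding restricted to tokens that extend some
    valid entity name (a toy stand-in for constrained beam search)."""
    node, out = trie, []
    while True:
        options = [tok for tok in node if tok is not None]
        if not options:  # only the end marker remains: the name is complete
            break
        best = max(options, key=lambda tok: score(out, tok))
        out.append(best)
        node = node[best]
    return " ".join(out)

# Toy authority list and scorer: score a candidate token by query overlap.
catalog = ["David Smith", "David Jones", "Ryan Muther"]
query = "citation attributed to Smith, David"
def score(prefix, tok):
    return 1.0 if tok in query else 0.0

print(constrained_decode(score, build_trie(catalog)))  # -> David Smith
```

Because every decoding step is confined to the trie, the output is guaranteed to be a name from the catalog, which is the key property the autoregressive approach exploits.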
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.