Author Name Disambiguation via Heterogeneous Network Embedding from
Structural and Semantic Perspectives
- URL: http://arxiv.org/abs/2212.12715v1
- Date: Sat, 24 Dec 2022 11:22:34 GMT
- Title: Author Name Disambiguation via Heterogeneous Network Embedding from
Structural and Semantic Perspectives
- Authors: Wenjin Xie, Siyuan Liu, Xiaomeng Wang, Tao Jia
- Abstract summary: Name ambiguity, such as multiple authors sharing the same name, is common in academic digital libraries.
The proposed method is mainly based on representation learning for heterogeneous networks and clustering.
The semantic representation is generated using NLP tools.
- Score: 13.266320447769564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Name ambiguity, such as multiple authors sharing the same name, is
common in academic digital libraries. This creates challenges for academic data
management and analysis, making name disambiguation necessary. Name
disambiguation divides publications bearing the same author name into groups,
each belonging to a unique author. The large amount of attribute information in
publications traps traditional methods in a quagmire of feature selection:
these methods select attributes manually and weight them equally, which usually
harms accuracy.
The proposed method is based on representation learning for heterogeneous
networks and clustering, and exploits self-attention to address the problem.
The representation of a publication is a synthesis of structural and semantic
representations. The structural
representation is obtained by meta-path-based sampling and a skip-gram-based
embedding method, and meta-path level attention is introduced to automatically
learn the weight of each feature. The semantic representation is generated
using NLP tools. Our method achieves higher name-disambiguation accuracy than
the baselines, and ablation experiments confirm the gains contributed by
feature selection and meta-path-level attention. The experimental results show
that our method captures more of the attributes in publications while reducing
the impact of redundant information.
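The structural-representation step described in the abstract can be sketched in Python. This is a minimal illustration, not the authors' implementation: the node names, the toy publication/author/venue network, and the choice of the P-A-P (co-author) meta-path are all hypothetical. The sketch generates meta-path-constrained random walks and the (target, context) pairs that a skip-gram embedding model (e.g. word2vec) would be trained on.

```python
import random
from collections import defaultdict

random.seed(0)

# Toy heterogeneous bibliographic network (hypothetical data):
# publication (P), author (A), and venue (V) nodes.
node_type = {"p1": "P", "p2": "P", "p3": "P",
             "a1": "A", "a2": "A", "v1": "V"}
neighbors = defaultdict(list)
for u, v in [("p1", "a1"), ("p2", "a1"), ("p2", "a2"),
             ("p3", "a2"), ("p1", "v1"), ("p3", "v1")]:
    neighbors[u].append(v)
    neighbors[v].append(u)

def metapath_walk(start, metapath, length):
    """Random walk whose node types follow a repeating meta-path, e.g. P-A-P."""
    walk = [start]
    while len(walk) < length:
        # The meta-path dictates the type of the next node.
        want = metapath[len(walk) % (len(metapath) - 1)]
        cands = [n for n in neighbors[walk[-1]] if node_type[n] == want]
        if not cands:
            break
        walk.append(random.choice(cands))
    return walk

def skipgram_pairs(walk, window=2):
    """(target, context) pairs a skip-gram model would be trained on."""
    pairs = []
    for i, target in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((target, walk[j]))
    return pairs

walks = [metapath_walk(p, "PAP", 5) for p in ("p1", "p2", "p3")]
pairs = [pr for w in walks for pr in skipgram_pairs(w, window=1)]
print(walks)
print(pairs[:4])
```

Training skip-gram on such pairs yields one structural embedding per meta-path (e.g. P-A-P, P-V-P); the paper's meta-path-level attention then learns a weight for each of these embeddings before they are fused with the NLP-derived semantic representation and clustered into per-author groups.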
Related papers
- From Open-Vocabulary to Vocabulary-Free Semantic Segmentation [78.62232202171919]
Open-vocabulary semantic segmentation enables models to identify novel object categories beyond their training data.
Current approaches still rely on manually specified class names as input, creating an inherent bottleneck in real-world applications.
This work proposes a Vocabulary-Free Semantic pipeline, eliminating the need for predefined class vocabularies.
arXiv Detail & Related papers (2025-02-17T15:17:08Z)
- AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation [33.25304533086283]
Open-vocabulary semantic segmentation is a challenging task that requires segmenting novel object categories at inference time.
Recent studies have explored vision-language pre-training to handle this task, but suffer from unrealistic assumptions in practical scenarios.
This work proposes a novel attribute decomposition-aggregation framework, AttrSeg, inspired by human cognition in understanding new concepts.
arXiv Detail & Related papers (2023-08-31T19:34:09Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works explore the image-to-label correspondence in a vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- Disambiguation of Company names via Deep Recurrent Networks [101.90357454833845]
We propose a Siamese LSTM Network approach to extract -- via supervised learning -- an embedding of company name strings.
We analyse how an Active Learning approach to prioritise the samples to be labelled leads to a more efficient overall learning pipeline.
arXiv Detail & Related papers (2023-03-07T15:07:57Z)
- Nested Named Entity Recognition from Medical Texts: An Adaptive Shared Network Architecture with Attentive CRF [53.55504611255664]
We propose a novel method, referred to as ASAC, to resolve the difficulties caused by nested entities.
The proposed method contains two key modules: the adaptive shared (AS) part and the attentive conditional random field (ACRF) module.
Our model could learn better entity representations by capturing the implicit distinctions and relationships between different categories of entities.
arXiv Detail & Related papers (2022-11-09T09:23:56Z)
- The Fellowship of the Authors: Disambiguating Names from Social Network Context [2.3605348648054454]
Authority lists with extensive textual descriptions for each entity are lacking, leaving many named entities ambiguous.
We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods.
We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora.
arXiv Detail & Related papers (2022-08-31T21:51:55Z)
- Effect of forename string on author name disambiguation [8.160343645537106]
Author forenames are used to decide which name instances are disambiguated together and how much they are likely to refer to the same author.
This study assesses the contributions of forenames in author name disambiguation using multiple labeled datasets.
arXiv Detail & Related papers (2021-02-05T15:54:11Z)
- Pairwise Learning for Name Disambiguation in Large-Scale Heterogeneous Academic Networks [81.00481125272098]
We introduce Multi-view Attention-based Pairwise Recurrent Neural Network (MA-PairRNN) to solve the name disambiguation problem.
MA-PairRNN combines heterogeneous graph embedding learning and pairwise similarity learning into a framework.
Results on two real-world datasets demonstrate that our framework has a significant and consistent improvement of performance on the name disambiguation task.
arXiv Detail & Related papers (2020-08-30T06:08:20Z)
- Improving Domain-Adapted Sentiment Classification by Deep Adversarial Mutual Learning [51.742040588834996]
Domain-adapted sentiment classification refers to training on a labeled source domain to well infer document-level sentiment on an unlabeled target domain.
We propose a novel deep adversarial mutual learning approach involving two groups of feature extractors, domain discriminators, sentiment classifiers, and label probers.
arXiv Detail & Related papers (2020-02-01T01:22:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.