Related papers: Whois? Deep Author Name Disambiguation using Bibliographic Data

URL: http://arxiv.org/abs/2207.04772v1
Date: Mon, 11 Jul 2022 11:03:39 GMT
Title: Whois? Deep Author Name Disambiguation using Bibliographic Data
Authors: Zeyd Boukhers and Nagaraj Asundi Bahubali
Abstract summary: Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries. This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities.
Score: 7.081604594416337
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: As the number of authors grows exponentially over the years, the number of authors sharing the same name grows proportionally. This makes it challenging to assign newly published papers to the correct authors. Therefore, Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries. This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities by leveraging their co-authors and domain of research. To this end, we use a collection from the DBLP repository that contains more than 5 million bibliographic records authored by around 2.6 million co-authors. Our approach first groups authors who share the same last name and the same first-name initial. Each author within a group is then identified by capturing the relation with his/her co-authors and area of research, which is represented by the titles of the validated publications of the corresponding author. To this end, we train a neural network model that learns from the representations of the co-authors and titles. We validated the effectiveness of our approach by conducting extensive experiments on a large dataset.
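The grouping step described in the abstract (authors who share a last name and a first-name initial fall into one group) can be sketched roughly as below. This is a minimal illustration and not the paper's implementation: the helper names `blocking_key` and `group_by_block` are hypothetical, the sample names are made up, and the sketch assumes names are written "First ... Last" with whitespace separators.

```python
from collections import defaultdict


def blocking_key(full_name: str) -> str:
    """Build a grouping key: lowercase last name plus first-name initial.

    Assumption (not from the paper): the last whitespace-separated token
    is the last name and the first token is the first name.
    """
    parts = full_name.strip().split()
    if not parts:
        raise ValueError("empty author name")
    last_name = parts[-1].lower()
    first_initial = parts[0][0].lower()
    return f"{last_name} {first_initial}"


def group_by_block(names):
    """Group author name strings that share the same blocking key."""
    blocks = defaultdict(list)
    for name in names:
        blocks[blocking_key(name)].append(name)
    return dict(blocks)


# Hypothetical sample names, for illustration only.
authors = ["Wei Wang", "Wen Wang", "W. Wang", "Zeyd Boukhers", "Li Wang"]
blocks = group_by_block(authors)
for key, members in blocks.items():
    print(key, members)
```

Each resulting group is the candidate set within which a model would then have to tell individual authors apart, for example from co-author and title representations as the abstract describes.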

Related papers

A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document. Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative. Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z)
NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval [49.827932299460514]
We argue that capabilities provided by large language models are not the end of NER research, but rather an exciting beginning. We present three variants of the NER task, together with a dataset to support them. We provide a large, silver-annotated corpus of 4 million paragraphs covering 500 entity types.
arXiv Detail & Related papers (2023-10-22T12:23:00Z)
Deep Author Name Disambiguation using DBLP Data [7.081604594416337]
Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries. This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities.
arXiv Detail & Related papers (2023-03-17T15:50:00Z)
Cracking Double-Blind Review: Authorship Attribution with Deep Learning [43.483063713471935]
We propose a transformer-based, neural-network architecture to attribute an anonymous manuscript to an author. We leverage all research papers publicly available on arXiv amounting to over 2 million manuscripts. Our method achieves an unprecedented authorship attribution accuracy, where up to 73% of papers are attributed correctly.
arXiv Detail & Related papers (2022-11-14T15:50:24Z)
A Bayesian Learning, Greedy agglomerative clustering approach and evaluation techniques for Author Name Disambiguation Problem [0.0]
Author names often suffer from ambiguity owing to the same author appearing under different names and multiple authors possessing similar names. I try to focus on the research efforts targeted to disambiguate author names.
arXiv Detail & Related papers (2022-11-01T08:22:53Z)
PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage. Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors. We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
The Fellowship of the Authors: Disambiguating Names from Social Network Context [2.3605348648054454]
Authority lists with extensive textual descriptions for each entity are lacking, and named entities are ambiguous. We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods. We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora.
arXiv Detail & Related papers (2022-08-31T21:51:55Z)
Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer-id, which requires only a few trajectories of written letters as identification cues. A main challenge is that a person often writes a letter in different styles from time to time. We refer to this problem as the variance of online writing styles (Var-O-Styles).
arXiv Detail & Related papers (2021-12-06T07:21:53Z)
Bib2Auth: Deep Learning Approach for Author Disambiguation using Bibliographic Data [4.817368273632451]
We propose a novel approach to link author names to their real-world entities by relying on their co-authorship pattern and area of research. Our supervised deep learning model identifies an author by capturing his/her relationship with his/her co-authors and area of research. Bib2Auth has shown good performance on a relatively large dataset.
arXiv Detail & Related papers (2021-07-09T12:25:11Z)
Pairwise Learning for Name Disambiguation in Large-Scale Heterogeneous Academic Networks [81.00481125272098]
We introduce Multi-view Attention-based Pairwise Recurrent Neural Network (MA-PairRNN) to solve the name disambiguation problem. MA-PairRNN combines heterogeneous graph embedding learning and pairwise similarity learning into a framework. Results on two real-world datasets demonstrate that our framework has a significant and consistent improvement of performance on the name disambiguation task.
arXiv Detail & Related papers (2020-08-30T06:08:20Z)
Zero-Resource Cross-Domain Named Entity Recognition [68.83177074227598]
Existing models for cross-domain named entity recognition rely on large unlabeled corpora or labeled NER training data in target domains. We propose a cross-domain NER model that does not use any external resources.
arXiv Detail & Related papers (2020-02-14T09:04:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.