Deep Author Name Disambiguation using DBLP Data
- URL: http://arxiv.org/abs/2303.10067v1
- Date: Fri, 17 Mar 2023 15:50:00 GMT
- Title: Deep Author Name Disambiguation using DBLP Data
- Authors: Zeyd Boukhers and Nagaraj Bahubali Asundi
- Abstract summary: Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries.
This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities.
- Score: 7.081604594416337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the academic world, the number of scientists grows every year and so does
the number of authors sharing the same names. Consequently, it is challenging to
assign newly published papers to their respective authors. Therefore, Author
Name Ambiguity (ANA) is considered a critical open problem in digital
libraries. This paper proposes an Author Name Disambiguation (AND) approach
that links author names to their real-world entities by leveraging their
co-authors and domain of research. To this end, we use data collected from the
DBLP repository that contains more than 5 million bibliographic records
authored by around 2.6 million co-authors. Our approach first groups authors
who share the same last names and same first name initials. The author within
each group is identified by capturing the relation with his/her co-authors and
area of research, represented by the titles of the validated publications of
the corresponding author. To this end, we train a neural network model that
learns from the representations of the co-authors and titles. We validated the
effectiveness of our approach by conducting extensive experiments on a large
dataset.
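The grouping step described in the abstract (authors who share the same last name and the same first-name initial fall into one group) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `blocking_key` and `group_by_block`, the sample names, and the assumption that names are written in "First [Middle] Last" order are all hypothetical.

```python
from collections import defaultdict


def blocking_key(full_name: str) -> str:
    """Return '<first-name initial> <last name>' in lowercase.

    Assumes (hypothetically) that names are in 'First [Middle] Last' order.
    """
    parts = full_name.strip().split()
    if not parts:
        raise ValueError("empty author name")
    first_initial = parts[0][0].lower()
    last_name = parts[-1].lower()
    return f"{first_initial} {last_name}"


def group_by_block(names):
    """Group author name strings that share the same blocking key."""
    blocks = defaultdict(list)
    for name in names:
        blocks[blocking_key(name)].append(name)
    return dict(blocks)


# Illustrative names only; not taken from the DBLP data.
authors = ["John Smith", "Jane Smith", "J. Smith", "Wei Wang", "Wen Wang", "Anna Lee"]
blocks = group_by_block(authors)
for key, members in blocks.items():
    print(key, members)
```

Each resulting group is a candidate set of ambiguous names. Per the abstract, the author within a group is then identified by a neural network model that learns from co-author and title representations; that step is not shown here.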
Related papers
- Examining Different Research Communities: Authorship Network [0.0]
We collected Google Scholar data for two different research domains in computer science: Data Mining and Software Engineering.
The scholar database resources are powerful for network analysis, data mining, and identifying links between authors via authorship networks.
arXiv Detail & Related papers (2024-08-24T19:04:02Z)
- NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval [49.827932299460514]
We argue that capabilities provided by large language models are not the end of NER research, but rather an exciting beginning.
We present three variants of the NER task, together with a dataset to support them.
We provide a large, silver-annotated corpus of 4 million paragraphs covering 500 entity types.
arXiv Detail & Related papers (2023-10-22T12:23:00Z)
- Cracking Double-Blind Review: Authorship Attribution with Deep Learning [43.483063713471935]
We propose a transformer-based, neural-network architecture to attribute an anonymous manuscript to an author.
We leverage all research papers publicly available on arXiv amounting to over 2 million manuscripts.
Our method achieves an unprecedented authorship attribution accuracy, where up to 73% of papers are attributed correctly.
arXiv Detail & Related papers (2022-11-14T15:50:24Z)
- A Bayesian Learning, Greedy agglomerative clustering approach and evaluation techniques for Author Name Disambiguation Problem [0.0]
Author names often suffer from ambiguity owing to the same author appearing under different names and multiple authors possessing similar names.
I try to focus on the research efforts targeted to disambiguate author names.
arXiv Detail & Related papers (2022-11-01T08:22:53Z)
- PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
- Whois? Deep Author Name Disambiguation using Bibliographic Data [7.081604594416337]
Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries.
This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities.
arXiv Detail & Related papers (2022-07-11T11:03:39Z)
- Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
- Bib2Auth: Deep Learning Approach for Author Disambiguation using Bibliographic Data [4.817368273632451]
We propose a novel approach to link author names to their real-world entities by relying on their co-authorship pattern and area of research.
Our supervised deep learning model identifies an author by capturing his/her relationship with his/her co-authors and area of research.
Bib2Auth has shown good performance on a relatively large dataset.
arXiv Detail & Related papers (2021-07-09T12:25:11Z)
- Pairwise Learning for Name Disambiguation in Large-Scale Heterogeneous Academic Networks [81.00481125272098]
We introduce Multi-view Attention-based Pairwise Recurrent Neural Network (MA-PairRNN) to solve the name disambiguation problem.
MA-PairRNN combines heterogeneous graph embedding learning and pairwise similarity learning into a framework.
Results on two real-world datasets demonstrate that our framework has a significant and consistent improvement of performance on the name disambiguation task.
arXiv Detail & Related papers (2020-08-30T06:08:20Z)
- Domain Adaptive Ensemble Learning [141.98192460069765]
We propose a unified framework termed domain adaptive ensemble learning (DAEL) to address both problems.
Experiments on three multi-source UDA and two DG datasets show that DAEL improves the state of the art on both problems, often by significant margins.
arXiv Detail & Related papers (2020-03-16T16:54:15Z)