Bib2Auth: Deep Learning Approach for Author Disambiguation using
Bibliographic Data
- URL: http://arxiv.org/abs/2107.04382v1
- Date: Fri, 9 Jul 2021 12:25:11 GMT
- Title: Bib2Auth: Deep Learning Approach for Author Disambiguation using
Bibliographic Data
- Authors: Zeyd Boukhers, Nagaraj Bahubali, Abinaya Thulsi Chandrasekaran, Adarsh
Anand, Soniya Manchenahalli Gnanendra Prasad, Sriram Aralappa
- Abstract summary: We propose a novel approach to link author names to their real-world entities by relying on their co-authorship pattern and area of research.
Our supervised deep learning model identifies an author by capturing his/her relationship with his/her co-authors and area of research.
Bib2Auth has shown good performance on a relatively large dataset.
- Score: 4.817368273632451
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Author name ambiguity remains a critical open problem in digital libraries
due to synonymy and homonymy of names. In this paper, we propose a novel
approach to link author names to their real-world entities by relying on their
co-authorship pattern and area of research. Our supervised deep learning model
identifies an author by capturing his/her relationship with his/her co-authors
and area of research, which is represented by the titles and sources of the
target author's publications. These attributes are encoded by their semantic
and symbolic representations. To this end, Bib2Auth uses ~22K bibliographic
records from the DBLP repository and is trained with each pair of co-authors.
The extensive experiments have proved the capability of the approach to
distinguish between authors sharing the same name and recognize authors with
different name variations. Bib2Auth has shown good performance on a relatively
large dataset, which qualifies it to be directly integrated into bibliographic
indices.
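As a rough illustration of what being "trained with each pair of co-authors" could look like in practice, the sketch below expands bibliographic records into one training sample per ordered (target author, co-author) pair, carrying the title and source along as the area-of-research signal. The `Record` and `Sample` types, their field names, and the `pair_samples` helper are hypothetical and are not taken from the paper; the actual Bib2Auth preprocessing and encoding may differ.

```python
from itertools import permutations
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    """A bibliographic record (hypothetical layout, not the paper's schema)."""
    title: str
    source: str               # journal or venue name
    authors: Tuple[str, ...]  # author identifiers on this publication


class Sample(NamedTuple):
    """One pairwise training sample (assumed layout)."""
    target: str     # author to be identified; serves as the label
    co_author: str  # one co-author on the same publication
    title: str
    source: str


def pair_samples(records: List[Record]) -> List[Sample]:
    """Emit one sample per ordered (target, co-author) pair in each record."""
    samples = []
    for rec in records:
        # permutations(..., 2) yields every ordered pair of distinct positions,
        # so each author appears once as the target for each of their co-authors.
        for target, co_author in permutations(rec.authors, 2):
            samples.append(Sample(target, co_author, rec.title, rec.source))
    return samples


records = [
    Record("Paper A", "Venue X", ("a1", "a2", "a3")),
    Record("Paper B", "Venue Y", ("a1",)),  # single author: yields no pairs
]
samples = pair_samples(records)
```

Under this assumed layout, a publication with n authors contributes n * (n - 1) samples, and single-author publications contribute none, so they would need separate handling.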
Related papers
- A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z)
- Deep Author Name Disambiguation using DBLP Data [7.081604594416337]
Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries.
This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities.
arXiv Detail & Related papers (2023-03-17T15:50:00Z)
- Disambiguation of Company names via Deep Recurrent Networks [101.90357454833845]
We propose a Siamese LSTM Network approach to extract -- via supervised learning -- an embedding of company name strings.
We analyse how an Active Learning approach to prioritise the samples to be labelled leads to a more efficient overall learning pipeline.
arXiv Detail & Related papers (2023-03-07T15:07:57Z)
- Cracking Double-Blind Review: Authorship Attribution with Deep Learning [43.483063713471935]
We propose a transformer-based, neural-network architecture to attribute an anonymous manuscript to an author.
We leverage all research papers publicly available on arXiv amounting to over 2 million manuscripts.
Our method achieves an unprecedented authorship attribution accuracy, where up to 73% of papers are attributed correctly.
arXiv Detail & Related papers (2022-11-14T15:50:24Z)
- A Bayesian Learning, Greedy agglomerative clustering approach and evaluation techniques for Author Name Disambiguation Problem [0.0]
Author names often suffer from ambiguity owing to the same author appearing under different names and multiple authors possessing similar names.
I try to focus on the research efforts targeted to disambiguate author names.
arXiv Detail & Related papers (2022-11-01T08:22:53Z)
- PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
- The Fellowship of the Authors: Disambiguating Names from Social Network Context [2.3605348648054454]
Authority lists with extensive textual descriptions for each entity are lacking, and named entities are ambiguous.
We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods.
We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora.
arXiv Detail & Related papers (2022-08-31T21:51:55Z)
- Whois? Deep Author Name Disambiguation using Bibliographic Data [7.081604594416337]
Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries.
This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities.
arXiv Detail & Related papers (2022-07-11T11:03:39Z)
- Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer-id, which requires only a few trajectories of written letters as identification cues.
A main challenge is that a person often writes a letter in different styles from time to time.
We refer to this problem as the variance of online writing styles (Var-O-Styles).
arXiv Detail & Related papers (2021-12-06T07:21:53Z)
- Pairwise Learning for Name Disambiguation in Large-Scale Heterogeneous Academic Networks [81.00481125272098]
We introduce Multi-view Attention-based Pairwise Recurrent Neural Network (MA-PairRNN) to solve the name disambiguation problem.
MA-PairRNN combines heterogeneous graph embedding learning and pairwise similarity learning into a framework.
Results on two real-world datasets demonstrate that our framework has a significant and consistent improvement of performance on the name disambiguation task.
arXiv Detail & Related papers (2020-08-30T06:08:20Z)