Fine-tuning Pre-trained Contextual Embeddings for Citation Content
Analysis in Scholarly Publication
- URL: http://arxiv.org/abs/2009.05836v1
- Date: Sat, 12 Sep 2020 17:46:24 GMT
- Title: Fine-tuning Pre-trained Contextual Embeddings for Citation Content
Analysis in Scholarly Publication
- Authors: Haihua Chen and Huyen Nguyen
- Abstract summary: We propose to fine-tune pre-trained contextual embeddings ULMFiT, BERT, and XLNet for the task.
For citation function identification, the XLNet model achieves 87.2%, 86.90%, and 81.6% on DFKI, UMICH, and TKDE 2019 datasets respectively.
Our method can be used to enhance the influence analysis of scholars and scholarly publications.
- Score: 0.3997680012976965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Citation function and citation sentiment are two essential aspects of
citation content analysis (CCA), which are useful for influence analysis and the
recommendation of scientific publications. However, existing studies rely mostly
on traditional machine learning methods; although deep learning techniques have
also been explored, their performance gains appear limited by insufficient
training data, which hinders practical application.
In this paper, we propose to fine-tune pre-trained contextual embeddings
ULMFiT, BERT, and XLNet for the task. Experiments on three public datasets show
that our strategy outperforms all the baselines in terms of the F1 score. For
citation function identification, the XLNet model achieves 87.2%, 86.90%, and
81.6% on DFKI, UMICH, and TKDE2019 datasets respectively, while it achieves
91.72% and 91.56% on DFKI and UMICH in terms of citation sentiment
identification. Our method can be used to enhance the influence analysis of
scholars and scholarly publications.
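The fine-tuning strategy described in the abstract amounts to placing a classification head on top of a pre-trained encoder and training it on labeled citation contexts. The sketch below is a minimal, self-contained stand-in: it assumes the citation contexts have already been encoded into fixed-length vectors (the tiny hand-made 4-d vectors here replace real ULMFiT/BERT/XLNet embeddings, and the labels are invented for illustration) and trains a softmax classification head by stochastic gradient descent.

```python
import math
import random

# Toy stand-in for pre-trained contextual embeddings: each citation
# context is a fixed-length vector. In the paper these would come from
# ULMFiT/BERT/XLNet; here they are hand-made 4-d vectors.
# Labels: 0 = background, 1 = method, 2 = comparison (hypothetical scheme).
DATA = [
    ([0.9, 0.1, 0.0, 0.2], 0),
    ([0.8, 0.2, 0.1, 0.1], 0),
    ([0.1, 0.9, 0.2, 0.0], 1),
    ([0.2, 0.8, 0.1, 0.1], 1),
    ([0.0, 0.1, 0.9, 0.8], 2),
    ([0.1, 0.2, 0.8, 0.9], 2),
]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train(data, dim=4, n_classes=3, lr=0.5, epochs=300, seed=0):
    """Fit a softmax classification head (the layer added on top of the
    frozen embeddings in this simplified sketch) by cross-entropy SGD."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in data:
            logits = [sum(W[c][j] * x[j] for j in range(dim)) + b[c]
                      for c in range(n_classes)]
            p = softmax(logits)
            # Cross-entropy gradient: (p - onehot(y)) outer-product x
            for c in range(n_classes):
                g = p[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    W[c][j] -= lr * g * x[j]
                b[c] -= lr * g
    return W, b

def predict(W, b, x):
    logits = [sum(W[c][j] * x[j] for j in range(len(x))) + b[c]
              for c in range(len(b))]
    return max(range(len(logits)), key=logits.__getitem__)

W, b = train(DATA)
print([predict(W, b, x) for x, _ in DATA])  # should recover the training labels
```

In the actual paper the encoder weights are fine-tuned together with the head rather than kept frozen, and the inputs are citation sentences rather than toy vectors; the sketch only shows the shape of the supervised classification step.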
Related papers
- Why do you cite? An investigation on citation intents and decision-making classification processes [1.7812428873698407]
This study emphasizes the importance of reliably classifying citation intents.
We present a study utilizing advanced Ensemble Strategies for Citation Intent Classification (CIC)
One of our models sets a new state of the art (SOTA) with an 89.46% Macro-F1 score on the SciCite benchmark.
arXiv Detail & Related papers (2024-07-18T09:29:33Z) - Investigating Persuasion Techniques in Arabic: An Empirical Study Leveraging Large Language Models [0.13980986259786224]
This paper presents a comprehensive empirical study focused on identifying persuasive techniques in Arabic social media content.
We utilize Pre-trained Language Models (PLMs) and leverage the ArAlEval dataset.
Our study explores three different learning approaches by harnessing the power of PLMs.
arXiv Detail & Related papers (2024-05-21T15:55:09Z) - Enriched BERT Embeddings for Scholarly Publication Classification [0.13654846342364302]
The NSLP 2024 FoRC Task I addresses this challenge organized as a competition.
The goal is to develop a classifier capable of predicting one of 123 predefined classes from the Open Research Knowledge Graph (ORKG) taxonomy of research fields for a given article.
arXiv Detail & Related papers (2024-05-07T09:05:20Z) - An Anchor Learning Approach for Citation Field Learning [23.507104046870186]
We propose a novel algorithm, CIFAL, to boost the citation field learning performance.
Experiments demonstrate that CIFAL outperforms state-of-the-art methods in citation field learning.
arXiv Detail & Related papers (2023-09-07T08:42:40Z) - Analyzing Dataset Annotation Quality Management in the Wild [63.07224587146207]
Even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, biases, or artifacts.
While practices and guidelines regarding dataset creation projects exist, large-scale analysis has yet to be performed on how quality management is conducted.
arXiv Detail & Related papers (2023-07-16T21:22:40Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - A Large Scale Search Dataset for Unbiased Learning to Rank [51.97967284268577]
We introduce the Baidu-ULTR dataset for unbiased learning to rank.
It comprises 1.2 billion randomly sampled search sessions and 7,008 expert-annotated queries.
It provides: (1) the original semantic feature and a pre-trained language model for easy usage; (2) sufficient display information such as position, displayed height, and displayed abstract; and (3) rich user feedback on search result pages (SERPs), such as dwell time.
arXiv Detail & Related papers (2022-07-07T02:37:25Z) - Deep Graph Learning for Anomalous Citation Detection [55.81334139806342]
We propose a novel deep graph learning model, namely GLAD (Graph Learning for Anomaly Detection), to identify anomalies in citation networks.
Within the GLAD framework, we propose an algorithm called CPU (Citation PUrpose) to discover the purpose of citation based on citation texts.
arXiv Detail & Related papers (2022-02-23T09:05:28Z) - Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification [59.698811329287174]
We leverage GPT-2 for generating artificial training instances in order to improve classification performance.
Our results show that fine-tuning GPT-2 on a handful of labeled instances leads to consistent classification improvements.
arXiv Detail & Related papers (2021-11-17T12:10:03Z) - Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z) - ImpactCite: An XLNet-based method for Citation Impact Analysis [4.526582372434088]
Impact analysis enables us to quantify the quality of the citations.
XLNet-based solution ImpactCite achieves a new state-of-the-art performance for both citation intent and sentiment classification.
Additional effort went into building the CSC-Clean corpus, a clean and reliable dataset for citation sentiment classification.
arXiv Detail & Related papers (2020-05-05T08:31:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.