Comprehensive Manuscript Assessment with Text Summarization Using 69707 articles
- URL: http://arxiv.org/abs/2503.20835v1
- Date: Wed, 26 Mar 2025 07:56:15 GMT
- Title: Comprehensive Manuscript Assessment with Text Summarization Using 69707 articles
- Authors: Qichen Sun, Yuxing Lu, Kun Xia, Li Chen, He Sun, Jinzhuo Wang
- Abstract summary: We harness Scopus to curate a comprehensive, large-scale dataset of 69707 scientific articles. We propose a deep learning methodology for impact-based classification tasks, which leverages semantic features extracted from the manuscripts together with paper metadata.
- Score: 10.943765373420135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rapid and efficient assessment of the future impact of research articles is a significant concern for both authors and reviewers. The most common standard for measuring the impact of academic papers is the number of citations. In recent years, numerous efforts have been undertaken to predict citation counts within various citation windows. However, most of these studies focus solely on a specific academic field or require early citation counts for prediction, rendering them impractical for the early-stage evaluation of papers. In this work, we harness Scopus to curate a comprehensive, large-scale dataset of 69707 scientific articles drawn from 99 journals spanning multiple disciplines. We propose a deep learning methodology for impact-based classification tasks that leverages semantic features extracted from the manuscripts together with paper metadata. To summarize semantic features such as titles and abstracts, we employ a Transformer-based language model as the encoder and design a text fusion layer to capture shared information between titles and abstracts. We focus on the following impact-based prediction tasks, using only information available at the pre-publication stage: (1) the impact of the journals in which the manuscripts will be published, and (2) the future impact of the manuscripts themselves. Extensive experiments on our dataset demonstrate the superiority of the proposed model on these impact-based prediction tasks. We also demonstrate its potential for generating feedback and improvement suggestions for manuscripts.
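The abstract describes the architecture only at a high level. The snippet below is a minimal sketch of one plausible realization of a title-abstract fusion classifier of this kind, assuming a SciBERT-style encoder, a concatenation-based fusion layer, and a small number of impact classes; the encoder name, fusion design, and class count are illustrative assumptions, not details confirmed by the paper.

```python
# Minimal sketch (not the authors' released code): a shared Transformer encoder
# for title and abstract, a simple fusion layer, and a classification head for
# an impact-based label (e.g., a journal-impact tier). All specifics are assumed.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TitleAbstractFusionClassifier(nn.Module):
    def __init__(self, encoder_name="allenai/scibert_scivocab_uncased", num_classes=4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Fusion layer (assumption): concatenate the title and abstract [CLS]
        # vectors and project back to the hidden size, as a stand-in for the
        # paper's "text fusion layer" capturing shared information.
        self.fusion = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Tanh())
        self.classifier = nn.Linear(hidden, num_classes)

    def encode(self, batch):
        # Use the [CLS] token embedding as the sequence representation.
        return self.encoder(**batch).last_hidden_state[:, 0]

    def forward(self, title_batch, abstract_batch):
        fused = self.fusion(torch.cat(
            [self.encode(title_batch), self.encode(abstract_batch)], dim=-1))
        return self.classifier(fused)

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = TitleAbstractFusionClassifier()
titles = ["Comprehensive Manuscript Assessment with Text Summarization"]
abstracts = ["Rapid and efficient assessment of the future impact of research articles ..."]
title_batch = tokenizer(titles, return_tensors="pt", truncation=True, padding=True)
abstract_batch = tokenizer(abstracts, return_tensors="pt", truncation=True, padding=True, max_length=512)
with torch.no_grad():
    logits = model(title_batch, abstract_batch)
print(logits.shape)  # (1, num_classes); train with cross-entropy on impact labels
```

The sketch shares a single encoder between the two text fields for simplicity; separate encoders per field would be an equally plausible reading of the abstract.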
Related papers
- CiMaTe: Citation Count Prediction Effectively Leveraging the Main Text [14.279848166377667]
The main text is an important signal for citation count prediction, but it is difficult to handle in machine learning models because it is typically very long.
We propose a BERT-based citation count prediction model, called CiMaTe, that leverages the main text by explicitly capturing a paper's sectional structure; a minimal sketch of this section-wise encoding idea appears after this list.
arXiv Detail & Related papers (2024-10-06T08:39:13Z)
- From Words to Worth: Newborn Article Impact Prediction with LLM [69.41680520058418]
This paper introduces a promising approach that leverages the capabilities of LLMs to predict the future impact of newborn articles. The proposed method employs an LLM to discern the shared semantic features of highly impactful papers from a large collection of title-abstract pairs. The quantitative results, with an MAE of 0.216 and an NDCG@20 of 0.901, demonstrate that the proposed approach achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-07T17:52:02Z)
- CiteFusion: An Ensemble Framework for Citation Intent Classification Harnessing Dual-Model Binary Couples and SHAP Analyses [1.7812428873698407]
This study introduces CiteFusion, an ensemble framework designed to address the multiclass citation intent classification task. CiteFusion achieves state-of-the-art performance, with Macro-F1 scores of 89.60% on SciCite and 76.24% on ACL-ARC. We also release a web-based application that classifies citation intents using the CiteFusion models developed on SciCite.
arXiv Detail & Related papers (2024-07-18T09:29:33Z)
- The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study [29.965010251365946]
We systematically study the effect of metadata on scientific literature tagging across 19 fields.
We observe some ubiquitous patterns of metadata's effects across all fields.
arXiv Detail & Related papers (2023-02-07T09:34:41Z)
- CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z)
- Automatic Analysis of Linguistic Features in Journal Articles of Different Academic Impacts with Feature Engineering Techniques [0.975434908987426]
This study attempts to extract micro-level linguistic features in high- and moderate-impact journal research articles (RAs), using feature engineering methods.
We extracted 25 highly relevant features from the Corpus of English Journal Articles through feature selection methods.
Results showed that 24 linguistic features, such as content-word overlap between adjacent sentences and the use of third-person pronouns, auxiliary verbs, tense, and emotional words, provide consistent and accurate predictions for journal articles of different academic impact.
arXiv Detail & Related papers (2021-11-15T03:56:50Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that this informational approach to representing the meaning of a text offers an effective way to predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific paper summarization by utilizing the citation graph.
We construct a novel scientific paper summarization dataset, Semantic Scholar Network (SSN), which contains 141K research papers in different domains.
Our model achieves competitive performance compared with pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in text classification due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
- Machine Identification of High Impact Research through Text and Image Analysis [0.4737991126491218]
We present a system that automatically separates papers with a high likelihood of gaining citations from those with a low likelihood.
Our system uses both a visual classifier, useful for surmising a document's overall appearance, and a text classifier, for making content-informed decisions.
arXiv Detail & Related papers (2020-05-20T19:12:24Z)
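As noted in the CiMaTe entry above, a common way to make a long main text tractable for a BERT-sized encoder is to encode each section independently and aggregate the section embeddings before prediction. The sketch below illustrates that general section-wise encoding idea under assumed choices (a bert-base encoder, mean pooling over sections, and a regression head for a log-scaled citation count); it is not the CiMaTe implementation.

```python
# Minimal sketch of section-wise encoding for long scientific documents, in the
# spirit of the CiMaTe entry above (not the authors' implementation). Each section
# is encoded independently with a BERT-style model; the section embeddings are
# mean-pooled and fed to a regression head that predicts a citation-count score.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SectionalCitationRegressor(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, section_batches):
        # section_batches: one tokenized batch per section of a single paper.
        section_vecs = [self.encoder(**b).last_hidden_state[:, 0] for b in section_batches]
        # Aggregate section embeddings by mean pooling (an assumed choice).
        doc_vec = torch.cat(section_vecs, dim=0).mean(dim=0, keepdim=True)
        return self.head(doc_vec).squeeze(-1)  # predicted log-scaled citation count

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SectionalCitationRegressor()
sections = ["Introduction ...", "Methods ...", "Results ...", "Discussion ..."]
batches = [tokenizer(s, return_tensors="pt", truncation=True, max_length=512) for s in sections]
with torch.no_grad():
    print(model(batches))  # tensor with one predicted value
```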
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information shown and is not responsible for any consequences arising from its use.