SciNoBo : A Hierarchical Multi-Label Classifier of Scientific Publications
- URL: http://arxiv.org/abs/2204.00880v1
- Date: Sat, 2 Apr 2022 15:09:33 GMT
- Title: SciNoBo : A Hierarchical Multi-Label Classifier of Scientific Publications
- Authors: Nikolaos Gialitsis, Sotiris Kotitsas, Haris Papageorgiou
- Abstract summary: Classifying scientific publications according to Field-of-Science (FoS) taxonomies is of crucial importance.
We present SciNoBo, a novel classification system of publications to predefined FoS taxonomies.
In contrast to other works, our system supports assignments of publications to multiple fields by considering their multidisciplinarity potential.
- Score: 0.7305019142196583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classifying scientific publications according to Field-of-Science (FoS)
taxonomies is of crucial importance, allowing funders, publishers, scholars,
companies and other stakeholders to organize scientific literature more
effectively. Most existing works address classification either at venue level
or solely based on the textual content of a research publication. We present
SciNoBo, a novel classification system of publications to predefined FoS
taxonomies, leveraging the structural properties of a publication and its
citations and references organised in a multilayer network. In contrast to
other works, our system supports assignments of publications to multiple fields
by considering their multidisciplinarity potential. By unifying publications
and venues under a common multilayer network structure made up of citing and
publishing relationships, classifications at the venue-level can be augmented
with publication-level classifications. We evaluate SciNoBo on a dataset of
publications extracted from Microsoft Academic Graph and perform a comparative
analysis against a state-of-the-art neural-network baseline. The results reveal
that our proposed system is capable of producing high-quality classifications
of publications.
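The abstract describes assigning FoS labels through a multilayer network in which venue-level classifications are carried over to individual publications via citing and publishing relationships. The sketch below is not SciNoBo's algorithm; it is a minimal illustration under assumed toy data. It assumes venues already carry venue-level FoS labels, lets a publication collect one vote per label from the venue of each paper it cites or is cited by, and keeps every label whose vote share reaches a threshold, so one publication can end up in several fields. All names (`VENUE_FOS`, `PUB_VENUE`, `CITES`, `classify`, `rollup`, `threshold`) and the two-level "field/subfield" label format are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical venue layer: venue-level FoS labels, written as "field/subfield".
VENUE_FOS = {
    "ACL": {"computer science/nlp"},
    "NeurIPS": {"computer science/machine learning"},
    "Scientometrics": {"information science/bibliometrics"},
}

# Hypothetical publishing relationships: publication -> venue (None = unknown venue).
PUB_VENUE = {"p1": "ACL", "p2": "NeurIPS", "p3": "Scientometrics",
             "p4": "Scientometrics", "target": None}

# Hypothetical citation layer as (citing, cited) pairs.
CITES = [("target", "p1"), ("target", "p3"), ("target", "p4"), ("p2", "target")]


def classify(pub, threshold=0.25):
    """Score FoS labels for `pub` from the venues of its references and citations.

    Each neighbouring publication casts one vote per FoS label of its venue;
    every label whose vote share reaches `threshold` is kept, so the result
    may contain several fields (multi-label).
    """
    references = [cited for citing, cited in CITES if citing == pub]
    citations = [citing for citing, cited in CITES if cited == pub]
    votes = Counter()
    for neighbour in references + citations:
        venue = PUB_VENUE.get(neighbour)
        for label in VENUE_FOS.get(venue, ()):
            votes[label] += 1
    total = sum(votes.values())
    if total == 0:
        return {}  # no labelled neighbours: leave the publication unclassified
    return {label: n / total for label, n in votes.items() if n / total >= threshold}


def rollup(scores):
    """Sum fine-grained shares into their top-level field (text before '/')."""
    coarse = defaultdict(float)
    for label, share in scores.items():
        coarse[label.split("/", 1)[0]] += share
    return dict(coarse)


if __name__ == "__main__":
    fine = classify("target")
    print(fine)
    print(rollup(fine))
```

In this toy graph, `target` cites two Scientometrics papers and one ACL paper and is cited by a NeurIPS paper, so half of its votes point to information science and half to computer science; rolling up keeps both top-level fields, a multidisciplinary assignment of the kind the abstract describes. A real system would also need to decide how to weight citations against references, how to handle venues with several fields, and how to calibrate the threshold; none of those choices are specified here.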
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv (2024-10-02T20:48:28Z)
- An Instance-based Plus Ensemble Learning Method for Classification of Scientific Papers [2.0794749869068005]
This paper introduces a novel approach that combines instance-based learning and ensemble learning techniques for classifying scientific papers.
Experiments show that the proposed classification method is effective and efficient in categorizing papers into various research areas.
arXiv (2024-09-21T19:42:15Z)
- Incremental hierarchical text clustering methods: a review [49.32130498861987]
This study aims to analyze various hierarchical and incremental clustering techniques.
The main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that addressed text document clustering.
arXiv (2023-12-12T22:27:29Z)
- Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers [29.295941972777978]
We propose FUTEX, a framework that uses the cross-paper network structure and the in-paper hierarchy structure to classify full-text scientific papers under weak supervision.
A network-aware contrastive fine-tuning module and a hierarchy-aware aggregation module are designed to leverage the two types of structural signals.
arXiv (2023-06-24T15:27:55Z)
- Hierarchical Classification of Research Fields in the "Web of Science" Using Deep Learning [15.915719490494876]
This paper presents a hierarchical classification system that automatically categorizes a scholarly publication using its abstract.
It distinguishes 44 disciplines, 718 fields and 1,485 subfields among 160 million abstract snippets in Microsoft Academic Graph.
The classification accuracy is > 90% in 77.13% and 78.19% of the single-label and multi-label classifications, respectively.
arXiv (2023-02-01T11:59:17Z)
- Hierarchical Multi-Label Classification of Scientific Documents [47.293189105900524]
We introduce a new dataset for hierarchical multi-label text classification of scientific papers called SciHTC.
This dataset contains 186,160 papers and 1,233 categories from the ACM CCS tree.
Our best model achieves a Macro-F1 score of 34.57%, which shows that this dataset offers significant research opportunities.
arXiv (2022-11-05T04:12:57Z)
- Taxonomy Enrichment with Text and Graph Vector Representations [61.814256012166794]
We address the problem of taxonomy enrichment which aims at adding new words to the existing taxonomy.
We present a new method that achieves strong results on this task with little effort.
We achieve state-of-the-art results across different datasets and provide an in-depth error analysis.
arXiv (2022-01-21T09:01:12Z)
- TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv (2022-01-18T07:07:38Z)
- Analyzing Scientific Publications using Domain-Specific Word Embedding and Topic Modelling [0.6308539010172307]
This paper presents a framework for conducting scientific analyses of academic publications.
It combines various techniques of Natural Language Processing, such as word embedding and topic modelling.
We propose two novel scientific publication embeddings, PUB-G and PUB-W, which can learn the semantic meanings of general as well as domain-specific words.
arXiv (2021-12-24T04:25:34Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific paper summarization by utilizing the papers' citation graph.
We construct a novel scientific paper summarization dataset, Semantic Scholar Network (SSN), which contains 141K research papers in different domains.
Our model achieves competitive performance compared with pretrained models.
arXiv (2021-04-07T11:13:35Z)
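Several entries in this list evaluate multi-label classifiers; SciHTC, for example, reports Macro-F1. As a minimal sketch (not the evaluation code of any paper listed here), macro-averaged F1 computes F1 per label from true positives, false positives and false negatives counted over all documents, then averages the per-label scores with equal weight. The label universe is assumed to be the union of gold and predicted labels; evaluation scripts differ on this choice. The function name `macro_f1` is hypothetical.

```python
def macro_f1(gold, pred):
    """Macro-averaged F1 over label sets.

    gold, pred: equal-length lists of sets, one set of labels per document.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Assumed label universe: every label seen in either gold or predictions.
    labels = set().union(*gold, *pred)
    if not labels:
        return 0.0
    total = 0.0
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 for any label in the union.
        total += 2 * tp / (2 * tp + fp + fn)
    # Each label contributes equally, regardless of how often it occurs.
    return total / len(labels)
```

For gold sets `[{"a", "b"}, {"a"}]` and predictions `[{"a"}, {"a", "c"}]`, label `a` is predicted perfectly (F1 = 1), `b` is missed and `c` is spurious (both F1 = 0), so the macro average is 1/3: rare labels weigh as much as frequent ones.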