An Instance-based Plus Ensemble Learning Method for Classification of Scientific Papers
- URL: http://arxiv.org/abs/2409.14237v1
- Date: Sat, 21 Sep 2024 19:42:15 GMT
- Title: An Instance-based Plus Ensemble Learning Method for Classification of Scientific Papers
- Authors: Fang Zhang, Shengli Wu
- Abstract summary: This paper introduces a novel approach that combines instance-based learning and ensemble learning techniques for classifying scientific papers.
Experiments show that the proposed classification method is effective and efficient in categorizing papers into various research areas.
- Score: 2.0794749869068005
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The exponential growth of scientific publications in recent years has posed a significant challenge for effective and efficient categorization. This paper introduces a novel approach that combines instance-based learning and ensemble learning techniques for classifying scientific papers into relevant research fields. Given a classification system with a group of research fields, a number of typical seed papers are first allocated to each of the fields manually. Then each paper that needs to be classified is compared with all the seed papers in every field. Contents and citations are considered separately. An ensemble-based method is then employed to make the final decision. In experiments with datasets from DBLP, the results demonstrate that the proposed classification method is effective and efficient in categorizing papers into various research areas. We also find that both content and citation features are useful for the classification of scientific papers.
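The pipeline described in the abstract (seed papers per field, separate content and citation comparisons, then an ensemble decision) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: the abstract does not name the similarity measures or the ensemble rule, so this sketch uses bag-of-words cosine similarity for content, Jaccard overlap of reference lists for citations, a mean over seeds, and a weighted sum as the ensemble. The names `field_scores`, `classify`, `weight_content`, `SEEDS`, and `PAPER`, along with the toy data, are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term counts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap between two sets of cited reference ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def field_scores(paper: dict, seeds: dict, weight_content: float = 0.5) -> dict:
    """Score each field by comparing the paper with every seed paper in it.

    Content and citations are scored separately, then combined with a
    weighted sum (an assumed ensemble rule, chosen for simplicity).
    """
    words = Counter(paper["text"].lower().split())
    refs = set(paper["refs"])
    scores = {}
    for field, seed_papers in seeds.items():
        if not seed_papers:  # a field without seeds cannot be scored
            continue
        content = [cosine(words, Counter(s["text"].lower().split())) for s in seed_papers]
        citation = [jaccard(refs, set(s["refs"])) for s in seed_papers]
        content_score = sum(content) / len(content)
        citation_score = sum(citation) / len(citation)
        scores[field] = weight_content * content_score + (1 - weight_content) * citation_score
    return scores


def classify(paper: dict, seeds: dict, weight_content: float = 0.5) -> str:
    """Return the field with the highest combined score."""
    scores = field_scores(paper, seeds, weight_content)
    return max(scores, key=scores.get)


# Toy, made-up data: two fields with two manually chosen seed papers each.
SEEDS = {
    "databases": [
        {"text": "query optimization for relational database systems", "refs": ["r1", "r2"]},
        {"text": "transaction processing and indexing in database engines", "refs": ["r2", "r3"]},
    ],
    "machine learning": [
        {"text": "neural network training with gradient descent", "refs": ["r7", "r8"]},
        {"text": "ensemble learning methods for classification", "refs": ["r8", "r9"]},
    ],
}
PAPER = {"text": "cost based query optimization in a relational database", "refs": ["r2", "r4"]}

print(classify(PAPER, SEEDS))
```

Keeping the content and citation scores separate until the final step mirrors the abstract's statement that the two feature types are considered separately; setting `weight_content` to 1.0 or 0.0 in this sketch isolates either signal.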
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell).
It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains.
In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets.
arXiv Detail & Related papers (2024-02-20T11:50:27Z)
- Text Classification: A Review, Empirical, and Experimental Evaluation [2.341806147715478]
Existing survey papers categorize algorithms for text classification into broad classes.
We introduce a novel methodological taxonomy that classifies algorithms hierarchically into fine-grained classes and specific techniques.
Our study is the first survey to utilize this methodological taxonomy for classifying algorithms for text classification.
arXiv Detail & Related papers (2024-01-11T08:17:42Z)
- Incremental hierarchical text clustering methods: a review [49.32130498861987]
This study aims to analyze various hierarchical and incremental clustering techniques.
The main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that addressed the clustering of text documents.
arXiv Detail & Related papers (2023-12-12T22:27:29Z)
- Document Provenance and Authentication through Authorship Classification [5.2545206693029884]
We propose an ensemble-based text-processing framework for the classification of single and multi-authored documents.
The proposed framework incorporates several state-of-the-art text classification algorithms.
The framework is evaluated on a large-scale benchmark dataset.
arXiv Detail & Related papers (2023-03-02T12:26:03Z)
- The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study [29.965010251365946]
We systematically study the effect of metadata on scientific literature tagging across 19 fields.
We observe some ubiquitous patterns of metadata's effects across all fields.
arXiv Detail & Related papers (2023-02-07T09:34:41Z)
- Tag-Aware Document Representation for Research Paper Recommendation [68.8204255655161]
We propose a hybrid approach that leverages deep semantic representation of research papers based on social tags assigned by users.
The proposed model is effective in recommending research papers even when the rating data is very sparse.
arXiv Detail & Related papers (2022-09-08T09:13:07Z)
- Topic Segmentation of Research Article Collections [4.0810783261728565]
We perform topic segmentation of a paper data collection that we crawled and produce a multitopic dataset of roughly seven million paper data records.
We construct a taxonomy of topics extracted from the data records and then annotate each document with its corresponding topic from that taxonomy.
It is possible to use this newly proposed dataset in two modalities: as a heterogeneous collection of documents from various disciplines or as a set of homogeneous collections, each from a single research topic.
arXiv Detail & Related papers (2022-05-18T15:19:42Z)
- Using Full-text Content of Academic Articles to Build a Methodology Taxonomy of Information Science in China [10.949304105928286]
This study provides new concepts for constructing a methodology taxonomy of information science.
The proposed methodology taxonomy is more detailed than conventional schemes and the speed of taxonomy renewal has been enhanced.
arXiv Detail & Related papers (2021-01-20T01:56:43Z)
- Cooperative Bi-path Metric for Few-shot Learning [50.98891758059389]
We make two contributions to investigate the few-shot classification problem.
We report a simple and effective baseline trained on base classes in the way of traditional supervised learning.
We propose a cooperative bi-path metric for classification, which leverages the correlations between base classes and novel classes to further improve the accuracy.
arXiv Detail & Related papers (2020-08-10T11:28:52Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.