Hierarchical thematic classification of major conference proceedings
- URL: http://arxiv.org/abs/2406.14983v1
- Date: Fri, 21 Jun 2024 08:48:57 GMT
- Title: Hierarchical thematic classification of major conference proceedings
- Authors: Arsentii Kuzmin, Alexander Aduenko, Vadim Strijov
- Abstract summary: We consider text collections with a fixed hierarchical structure of topics given by experts in the form of a tree.
The system sorts the topics by relevance to a given document.
We propose a weighted hierarchical similarity function to calculate topic relevance.
- Score: 44.99833362998488
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we develop a decision support system for hierarchical text classification. We consider text collections with a fixed hierarchical structure of topics given by experts in the form of a tree. The system sorts the topics by relevance to a given document. The experts choose one of the most relevant topics to finish the classification. We propose a weighted hierarchical similarity function to calculate topic relevance. The function calculates the similarity of a document and a tree branch. The weights in this function determine word importance. We use the entropy of words to estimate the weights. The proposed hierarchical similarity function is used to formulate a joint probabilistic model of hierarchical thematic classification over the document topics, parameters, and hyperparameters. Variational Bayesian inference gives a closed-form EM algorithm. The EM algorithm estimates the parameters and calculates the probability of a topic for a given document. Compared to hierarchical multiclass SVM, hierarchical PLSA with adaptive regularization, and hierarchical naive Bayes, the weighted hierarchical similarity function achieves higher ranking accuracy on a collection of abstracts from the major conference EURO and on a collection of industrial company websites.
Related papers
- ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval [64.44265315244579]
We propose a tree-based method for organizing and representing reference documents at various granular levels.
Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches.
Our evaluations show that ReTreever generally preserves full representation accuracy.
arXiv Detail & Related papers (2025-02-11T21:35:13Z)
- Hierarchical Multi-Label Classification of Scientific Documents [47.293189105900524]
We introduce a new dataset for hierarchical multi-label text classification of scientific papers called SciHTC.
This dataset contains 186,160 papers and 1,233 categories from the ACM CCS tree.
Our best model achieves a Macro-F1 score of 34.57%, which shows that this dataset provides significant research opportunities.
arXiv Detail & Related papers (2022-11-05T04:12:57Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- Conical Classification For Computationally Efficient One-Class Topic Determination [0.0]
We propose a Conical classification approach to identify documents that relate to a particular topic.
We show in our analysis that our approach has higher predictive power on our datasets, and is also faster to compute.
arXiv Detail & Related papers (2021-10-31T01:27:12Z)
- TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z)
- TagRec: Automated Tagging of Questions with Hierarchical Learning Taxonomy [0.0]
Online educational platforms organize academic questions based on a hierarchical learning taxonomy (subject-chapter-topic).
This paper formulates the problem as a similarity-based retrieval task where we optimize the semantic relatedness between the taxonomy and the questions.
We demonstrate that our method helps to handle the unseen labels and hence can be used for taxonomy tagging in the wild.
arXiv Detail & Related papers (2021-07-03T11:50:55Z)
- Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network [49.458250193768826]
We propose sawtooth factorial topic embedding guided GBN, a deep generative model of documents.
Both the words and topics are represented as embedding vectors of the same dimension.
Our models outperform other neural topic models on extracting deeper interpretable topics.
arXiv Detail & Related papers (2021-06-30T10:14:57Z)
- Pitfalls of Assessing Extracted Hierarchies for Multi-Class Classification [4.89253144446913]
We identify some common pitfalls that may lead practitioners to make misleading conclusions about their methods.
We show how the hierarchy's quality can become irrelevant depending on the experimental setup.
Our results confirm that datasets with a high number of classes generally present complex structures in how these classes relate to each other.
arXiv Detail & Related papers (2021-01-26T21:50:57Z)
- Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50.
arXiv Detail & Related papers (2020-09-12T17:36:53Z)
- Deep Hierarchical Classification for Category Prediction in E-commerce System [16.6932395109085]
In e-commerce systems, category prediction automatically predicts the categories of given texts.
We propose a Deep Hierarchical Classification framework, which incorporates the multi-scale hierarchical information in neural networks.
We also define a novel combined loss function to punish hierarchical prediction losses.
arXiv Detail & Related papers (2020-05-14T02:29:14Z)
- Efficient strategies for hierarchical text classification: External knowledge and auxiliary tasks [3.5557219875516655]
We perform a sequence of inference steps to predict the category of a document from top to bottom of a given class taxonomy.
With our efficient approaches, we outperform previous studies on two well-known English datasets while using a drastically reduced number of parameters.
arXiv Detail & Related papers (2020-05-05T20:22:18Z)