Blurb-Refined Inference from Crowdsourced Book Reviews using Hierarchical Genre Mining with Dual-Path Graph Convolutions
- URL: http://arxiv.org/abs/2512.21076v1
- Date: Wed, 24 Dec 2025 09:49:56 GMT
- Title: Blurb-Refined Inference from Crowdsourced Book Reviews using Hierarchical Genre Mining with Dual-Path Graph Convolutions
- Authors: Suraj Kumar, Utsav Kumar Nareti, Soumi Chattopadhyay, Chandranath Adak, Prolay Mallick,
- Abstract summary: HiGeMine is a hierarchical genre mining framework that robustly integrates user reviews with authoritative book blurbs.<n>In the first phase, HiGeMine employs a zero-shot semantic alignment strategy to filter reviews, retaining only those semantically consistent with the corresponding blurb.<n>In the second phase, we introduce a dual-path, two-level graph-based classification architecture.
- Score: 0.09851812512860349
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate book genre classification is fundamental to digital library organization, content discovery, and personalized recommendation. Existing approaches typically model genre prediction as a flat, single-label task, ignoring hierarchical genre structure and relying heavily on noisy, subjective user reviews, which often degrade classification reliability. We propose HiGeMine, a two-phase hierarchical genre mining framework that robustly integrates user reviews with authoritative book blurbs. In the first phase, HiGeMine employs a zero-shot semantic alignment strategy to filter reviews, retaining only those semantically consistent with the corresponding blurb, thereby mitigating noise, bias, and irrelevance. In the second phase, we introduce a dual-path, two-level graph-based classification architecture: a coarse-grained Level-1 binary classifier distinguishes fiction from non-fiction, followed by Level-2 multi-label classifiers for fine-grained genre prediction. Inter-genre dependencies are explicitly modeled using a label co-occurrence graph, while contextual representations are derived from pretrained language models applied to the filtered textual content. To facilitate systematic evaluation, we curate a new hierarchical book genre dataset. Extensive experiments demonstrate that HiGeMine consistently outperformed strong baselines across hierarchical genre classification tasks. The proposed framework offers a principled and effective solution for leveraging both structured and unstructured textual data in hierarchical book genre analysis.
Related papers
- ProtoSiTex: Learning Semi-Interpretable Prototypes for Multi-label Text Classification [0.7534418099163723]
ProtoSiTex is a semi-interpretable framework designed for fine-grained multi-label text classification.<n>It captures overlapping and conflicting semantics using adaptive prototypes and multi-head attention.<n>ProtoSiTex achieves state-of-the-art performance while delivering faithful, human-aligned explanations.
arXiv Detail & Related papers (2025-10-14T13:59:28Z) - Bidirectional Logits Tree: Pursuing Granularity Reconcilement in Fine-Grained Classification [89.20477310885731]
This paper addresses the challenge of Granularity Competition in fine-grained classification tasks.<n>Existing approaches typically develop independent hierarchy-aware models based on shared features extracted from a common base encoder.<n>We propose a novel framework called the Bidirectional Logits Tree (BiLT) for Granularity Reconcilement.
arXiv Detail & Related papers (2024-12-17T10:42:19Z) - Label-template based Few-Shot Text Classification with Contrastive Learning [7.964862748983985]
We propose a simple and effective few-shot text classification framework.<n>Label templates are embedded into input sentences to fully utilize the potential value of class labels.<n> supervised contrastive learning is utilized to model the interaction information between support samples and query samples.
arXiv Detail & Related papers (2024-12-13T12:51:50Z) - Hierarchical Query Classification in E-commerce Search [38.67034103433015]
E-commerce platforms typically store and structure product information and search data in a hierarchy.
Efficiently categorizing user search queries into a similar hierarchical structure is paramount in enhancing user experience on e-commerce platforms as well as news curation and academic research.
The inherent complexity of hierarchical query classification is compounded by two primary challenges: (1) the pronounced class imbalance that skews towards dominant categories, and (2) the inherent brevity and ambiguity of search queries that hinder accurate classification.
arXiv Detail & Related papers (2024-03-09T21:55:55Z) - Is Medieval Distant Viewing Possible? : Extending and Enriching Annotation of Legacy Image Collections using Visual Analytics [3.89394670917253]
We describe working with two pre-annotated sets of medieval manuscript images that exhibit conflicting and overlapping metadata.
We aim to create a more uniform set of labels to serve as a "bridge" in the combined dataset.
Visual interfaces provide experts an overview of relationships in the data going beyond the sum total of the metadata.
arXiv Detail & Related papers (2022-08-20T10:59:33Z) - Weakly-supervised Action Localization via Hierarchical Mining [76.00021423700497]
Weakly-supervised action localization aims to localize and classify action instances in the given videos temporally with only video-level categorical labels.
We propose a hierarchical mining strategy under video-level and snippet-level manners, i.e., hierarchical supervision and hierarchical consistency mining.
We show that HiM-Net outperforms existing methods on THUMOS14 and ActivityNet1.3 datasets with large margins by hierarchically mining the supervision and consistency.
arXiv Detail & Related papers (2022-06-22T12:19:09Z) - Semi-Supervised Domain Generalization with Stochastic StyleMatch [90.98288822165482]
In real-world applications, we might have only a few labels available from each source domain due to high annotation cost.
In this work, we investigate semi-supervised domain generalization, a more realistic and practical setting.
Our proposed approach, StyleMatch, is inspired by FixMatch, a state-of-the-art semi-supervised learning method based on pseudo-labeling.
arXiv Detail & Related papers (2021-06-01T16:00:08Z) - Hierarchical Bi-Directional Self-Attention Networks for Paper Review
Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: sentence encoder (level one), intra-review encoder (level two) and inter-review encoder (level three)
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z) - Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
The proposed method can improve several state-of-the-art baselines by a large margin (up to $33%$ relative gain) in terms of Recall@50.
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin.
arXiv Detail & Related papers (2020-09-12T17:36:53Z) - Rank over Class: The Untapped Potential of Ranking in Natural Language
Processing [8.637110868126546]
We argue that many tasks which are currently addressed using classification are in fact being shoehorned into a classification mould.
We propose a novel end-to-end ranking approach consisting of a Transformer network responsible for producing representations for a pair of text sequences.
In an experiment on a heavily-skewed sentiment analysis dataset, converting ranking results to classification labels yields an approximately 22% improvement over state-of-the-art text classification.
arXiv Detail & Related papers (2020-09-10T22:18:57Z) - A Unified Dual-view Model for Review Summarization and Sentiment
Classification with Inconsistency Loss [51.448615489097236]
Acquiring accurate summarization and sentiment from user reviews is an essential component of modern e-commerce platforms.
We propose a novel dual-view model that jointly improves the performance of these two tasks.
Experiment results on four real-world datasets from different domains demonstrate the effectiveness of our model.
arXiv Detail & Related papers (2020-06-02T13:34:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.