Learning Consistent Taxonomic Classification through Hierarchical Reasoning
- URL: http://arxiv.org/abs/2601.14610v1
- Date: Wed, 21 Jan 2026 03:00:00 GMT
- Title: Learning Consistent Taxonomic Classification through Hierarchical Reasoning
- Authors: Zhenghong Li, Kecheng Zheng, Haibin Ling
- Abstract summary: We propose a two-stage, hierarchy-based reasoning framework designed to improve leaf-level accuracy and hierarchical consistency in taxonomic classification. Our framework, implemented on the Qwen2.5-VL-7B model, outperforms its original 72B counterpart by over 10% in both leaf-level and hierarchical consistency accuracy.
- Score: 61.372270953201955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Vision-Language Models (VLMs) excel at visual understanding, they often fail to grasp hierarchical knowledge. This leads to common errors where VLMs misclassify coarser taxonomic levels even when correctly identifying the most specific level (leaf level). Existing approaches largely overlook this issue by failing to model hierarchical reasoning. To address this gap, we propose VL-Taxon, a two-stage, hierarchy-based reasoning framework designed to improve both leaf-level accuracy and hierarchical consistency in taxonomic classification. The first stage employs a top-down process to enhance leaf-level classification accuracy. The second stage then leverages this accurate leaf-level output to ensure consistency throughout the entire taxonomic hierarchy. Each stage is initially trained with supervised fine-tuning to instill taxonomy knowledge, followed by reinforcement learning to refine the model's reasoning and generalization capabilities. Extensive experiments reveal a remarkable result: our VL-Taxon framework, implemented on the Qwen2.5-VL-7B model, outperforms its original 72B counterpart by over 10% in both leaf-level and hierarchical consistency accuracy on average on the iNaturalist-2021 dataset. Notably, this significant gain was achieved by fine-tuning on just a small subset of data, without relying on any examples generated by other VLMs.
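The hierarchical-consistency failure the abstract describes (a correct leaf prediction paired with a wrong coarse label) can be made concrete with a small check. The sketch below is purely illustrative, not VL-Taxon's implementation: the toy taxonomy, level names, and labels are hypothetical, and the check simply verifies that every coarser prediction lies on the predicted leaf's ancestor path.

```python
# Illustrative sketch (not the paper's method): verify that per-level
# predictions are hierarchically consistent, i.e. each coarser label is an
# ancestor of the predicted leaf. The taxonomy below is a made-up example.

PARENT = {
    "Green hermit": "Hummingbird",
    "Hummingbird": "Bird",
    "Bird": "Animal",
}

def ancestors(leaf: str) -> list[str]:
    """Walk parent links from a leaf up to the root."""
    chain = []
    node = leaf
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain

def is_consistent(predictions: dict[str, str]) -> bool:
    """predictions maps a level name to its predicted label; 'leaf' is required."""
    leaf = predictions["leaf"]
    allowed = set(ancestors(leaf)) | {leaf}
    return all(label in allowed for label in predictions.values())

# Consistent: 'Hummingbird' and 'Bird' both lie on the leaf's ancestor path.
print(is_consistent({"leaf": "Green hermit", "mid": "Hummingbird", "coarse": "Bird"}))
# Inconsistent: 'Mammal' is not an ancestor of the predicted leaf.
print(is_consistent({"leaf": "Green hermit", "mid": "Hummingbird", "coarse": "Mammal"}))
```

A check like this is what "hierarchical consistency accuracy" metrics count: a prediction scores only when every level agrees with the leaf's lineage.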
Related papers
- Hierarchy-Aware Fine-Tuning of Vision-Language Models [18.244518940229202]
Vision-Language Models learn powerful multimodal representations through large-scale image-text pretraining. Standard approaches treat labels as flat categories and require full fine-tuning, which is expensive and produces inconsistent predictions. We propose an efficient hierarchy-aware fine-tuning framework that updates a few parameters while enforcing structural consistency.
arXiv Detail & Related papers (2025-12-25T06:44:33Z)
- Feature Identification for Hierarchical Contrastive Learning [7.655211354400059]
We propose two novel hierarchical contrastive learning (HMLC) methods. Our approach explicitly models inter-class relationships and imbalanced class distribution at higher hierarchy levels. Our method achieves state-of-the-art performance in linear evaluation, outperforming existing hierarchical contrastive learning methods by 2 percentage points in accuracy.
arXiv Detail & Related papers (2025-10-01T12:46:47Z) - Enforcing Consistency and Fairness in Multi-level Hierarchical Classification with a Mask-based Output Layer [25.819440955594736]
We introduce a fair, model-agnostic layer designed to enforce taxonomy and optimize objectives, including consistency, fairness, and exact match. Our evaluations demonstrate that the proposed layer not only improves the fairness of predictions but also enforces the taxonomy, resulting in consistent predictions and superior performance.
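The masking idea can be sketched in a few lines. This is not the paper's layer; the two-level taxonomy, names, and logit values below are hypothetical, and the sketch only shows how masking fine-level logits to the children of the winning coarse class makes predictions consistent by construction.

```python
# Illustrative sketch of a mask-based output layer (all names hypothetical):
# fine-level logits are masked so only children of the winning coarse class
# can be selected, enforcing taxonomy consistency by construction.
import math

# Hypothetical two-level taxonomy: coarse class index -> fine class indices.
CHILDREN = {0: [0, 1], 1: [2, 3, 4]}

def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])

def masked_fine_prediction(coarse_logits, fine_logits):
    """Pick the coarse class first, then restrict fine logits to its children."""
    coarse = argmax(coarse_logits)
    allowed = set(CHILDREN[coarse])
    masked = [x if i in allowed else -math.inf for i, x in enumerate(fine_logits)]
    fine = argmax(masked)
    return coarse, fine

# Coarse class 0 wins; unmasked, fine class 2 would win, but the mask
# restricts the choice to {0, 1}, so fine class 1 is selected instead.
print(masked_fine_prediction([2.0, 1.0], [0.1, 0.2, 5.0, 0.3, 0.4]))
```

Because inconsistent (coarse, fine) pairs are assigned a logit of negative infinity, no post-hoc correction step is needed: every output is consistent with the taxonomy.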
arXiv Detail & Related papers (2025-03-19T06:30:04Z)
- Visually Consistent Hierarchical Image Classification [37.80849457554078]
Hierarchical classification predicts labels across multiple levels of a taxonomy, e.g., from coarse-level 'Bird' to mid-level 'Hummingbird' to fine-level 'Green hermit'.
arXiv Detail & Related papers (2024-06-17T14:56:51Z)
- CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection [42.33618249731874]
We show that minimizing the magnitude of energy scores on training data leads to domain-consistent Hessians of classification loss.
We have developed a unified fine-tuning framework that allows for concurrent optimization of both tasks.
arXiv Detail & Related papers (2024-05-26T03:28:59Z)
- Deep Imbalanced Regression via Hierarchical Classification Adjustment [50.19438850112964]
Regression tasks in computer vision are often formulated into classification by quantizing the target space into classes.
The majority of training samples lie in a head range of target values, while a minority of samples span a usually larger tail range.
We propose to construct hierarchical classifiers for solving imbalanced regression tasks.
Our novel hierarchical classification adjustment (HCA) for imbalanced regression shows superior results on three diverse tasks.
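The quantization step that turns regression into hierarchical classification can be illustrated briefly. The sketch below is not the paper's HCA method; the target range, bin counts, and function name are made up. It only shows how a scalar target maps to a (coarse bin, fine bin) class pair, with fine bins nested inside coarse ones.

```python
# Minimal sketch of quantizing a regression target into hierarchical classes
# (range and bin counts are hypothetical, not from the paper): the target range
# is split into coarse bins, each subdivided into fine bins, so a scalar target
# maps to a (coarse, fine) class pair.

def to_hierarchical_classes(y, lo=0.0, hi=100.0, n_coarse=4, n_fine=5):
    """Map a scalar target y in [lo, hi) to (coarse_bin, fine_bin) indices."""
    coarse_width = (hi - lo) / n_coarse
    coarse = min(int((y - lo) / coarse_width), n_coarse - 1)
    within = (y - lo) - coarse * coarse_width   # offset inside the coarse bin
    fine = min(int(within / (coarse_width / n_fine)), n_fine - 1)
    return coarse, fine

# 37.0 falls in coarse bin 1 (range [25, 50)) and fine bin 2 within it.
print(to_hierarchical_classes(37.0))
```

Coarse bins pool many samples, so a classifier at the coarse level sees a much less imbalanced distribution than one over the fine bins alone, which is the intuition behind adjusting predictions across hierarchy levels.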
arXiv Detail & Related papers (2023-10-26T04:54:39Z)
- ProTeCt: Prompt Tuning for Taxonomic Open Set Classification [59.59442518849203]
Few-shot adaptation methods do not fare well in the taxonomic open set (TOS) setting.
We propose a prompt tuning technique that calibrates the hierarchical consistency of model predictions.
A new Prompt Tuning for Hierarchical Consistency (ProTeCt) technique is then proposed to calibrate classification across label set granularities.
arXiv Detail & Related papers (2023-06-04T02:55:25Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormaliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Hierarchical classification at multiple operating points [1.520694326234112]
We present an efficient algorithm to produce operating characteristic curves for any method that assigns a score to every class in the hierarchy.
We propose two novel loss functions and show that a soft variant of the structured hinge loss is able to significantly outperform the flat baseline.
arXiv Detail & Related papers (2022-10-19T23:36:16Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
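The core mechanism, sampling "virtual" features from per-class Gaussian statistics and re-fitting the classifier on them, can be sketched simply. This is not CCVR's implementation; the 1-D features, statistics, and function name below are made up for illustration.

```python
# Rough sketch of calibration with virtual representations (all values are
# hypothetical, not from the paper): sample virtual features from per-class
# Gaussian statistics, then re-fit a simple classifier on those samples.
import random

def sample_virtual_features(class_stats, n_per_class=100):
    """class_stats: {label: (mean, std)} over 1-D features, for simplicity."""
    data = []
    for label, (mean, std) in class_stats.items():
        for _ in range(n_per_class):
            data.append((random.gauss(mean, std), label))
    return data

random.seed(0)
virtual = sample_virtual_features({0: (0.0, 1.0), 1: (5.0, 1.0)})
# A nearest-mean rule re-fit on the virtual data recovers the class means,
# without ever sharing the clients' raw features.
means = {c: sum(x for x, y in virtual if y == c) / 100 for c in (0, 1)}
```

The appeal in the federated setting is that only per-class statistics, not raw data, need to leave the clients, and the server calibrates the global classifier on samples drawn from them.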
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
- Making CNNs Interpretable by Building Dynamic Sequential Decision Forests with Top-down Hierarchy Learning [62.82046926149371]
We propose a generic model transfer scheme to make Convolutional Neural Networks (CNNs) interpretable.
We achieve this by building a differentiable decision forest on top of CNNs.
We name the transferred model deep Dynamic Sequential Decision Forest (dDSDF).
arXiv Detail & Related papers (2021-06-05T07:41:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.