hYOLO Model: Enhancing Object Classification with Hierarchical Context in YOLOv8
- URL: http://arxiv.org/abs/2510.23278v1
- Date: Mon, 27 Oct 2025 12:39:50 GMT
- Title: hYOLO Model: Enhancing Object Classification with Hierarchical Context in YOLOv8
- Authors: Veska Tsenkova, Peter Stanchev, Daniel Petrov, Deyan Lazarov
- Abstract summary: This paper proposes an end-to-end hierarchical model for image detection and classification built upon the YOLO model family. A novel hierarchical architecture, a modified loss function, and a performance metric tailored to the hierarchical nature of the model are introduced.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current convolutional neural network (CNN) classification methods are predominantly focused on flat classification, which aims solely to identify a specified object within an image. However, real-world objects often possess a natural hierarchical organization that can significantly aid classification tasks. Capturing the relations between objects enables better contextual understanding as well as control over the severity of mistakes. Considering these aspects, this paper proposes an end-to-end hierarchical model for image detection and classification built upon the YOLO model family. A novel hierarchical architecture, a modified loss function, and a performance metric tailored to the hierarchical nature of the model are introduced. The proposed model is trained and evaluated on two different hierarchical categorizations of the same dataset: a systematic categorization that disregards visual similarities between objects and a categorization accounting for common visual characteristics across classes. The results illustrate how the suggested methodology addresses the inherent hierarchical structure present in real-world objects, which conventional flat classification algorithms often overlook.
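The abstract's central idea of "control over the severity of mistakes" can be illustrated with a toy hierarchy-aware loss. The sketch below is an assumption for illustration only, not the paper's actual loss: it uses a hypothetical two-level class tree and scales cross-entropy by the tree distance between the predicted and true leaf, so that confusing siblings (car vs. truck) costs less than confusing classes across branches (car vs. cat).

```python
import numpy as np

# Hypothetical two-level class tree: leaves mapped to their parent category.
HIERARCHY = {"car": "vehicle", "truck": "vehicle", "cat": "animal", "dog": "animal"}
CLASSES = list(HIERARCHY)  # index order: car, truck, cat, dog

def tree_distance(a, b):
    """0 for the same leaf, 1 for siblings under one parent, 2 across branches."""
    if a == b:
        return 0
    return 1 if HIERARCHY[a] == HIERARCHY[b] else 2

def hierarchical_loss(probs, true_idx):
    """Cross-entropy scaled up by the tree distance of the top-1 prediction."""
    eps = 1e-12
    ce = -np.log(probs[true_idx] + eps)
    pred_idx = int(np.argmax(probs))
    penalty = tree_distance(CLASSES[pred_idx], CLASSES[true_idx])
    return ce * (1.0 + 0.5 * penalty)
```

With the true class `car`, a classifier that confidently predicts `truck` incurs a smaller loss than one that predicts `cat` with the same confidence, which is the "severity of mistakes" behavior a flat cross-entropy cannot express.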
Related papers
- A class-driven hierarchical ResNet for classification of multispectral remote sensing images [12.282079123411947]
We present a class-driven hierarchical Residual Neural Network (ResNet) for modelling the classification of Time Series (TS) of multispectral images at different semantic class levels. We leverage hierarchy-penalty maps to discourage incoherent hierarchical transitions within the classification. The experimental results, obtained on two tiles of the Amazonian Forest on 12 monthly composites of Sentinel 2 images, demonstrate the effectiveness of the hierarchical approach.
arXiv Detail & Related papers (2025-10-09T10:47:52Z) - A Semantics-Aware Hierarchical Self-Supervised Approach to Classification of Remote Sensing Images [12.282079123411947]
We present a novel Semantics-Aware Hierarchical Consensus (SAHC) method for learning hierarchical features and relationships. The SAHC method is evaluated on three benchmark datasets with different degrees of hierarchical complexity. Experimental results show both the effectiveness of the proposed approach in guiding network learning and the robustness of the hierarchical consensus for remote sensing image classification tasks.
arXiv Detail & Related papers (2025-10-06T15:30:39Z) - Feature Identification for Hierarchical Contrastive Learning [7.655211354400059]
We propose two novel hierarchical contrastive learning (HMLC) methods. Our approach explicitly models inter-class relationships and imbalanced class distributions at higher hierarchy levels. Our method achieves state-of-the-art performance in linear evaluation, outperforming existing hierarchical contrastive learning methods by 2 percentage points in accuracy.
arXiv Detail & Related papers (2025-10-01T12:46:47Z) - How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model [47.617093812158366]
We introduce the Random Hierarchy Model: a family of synthetic tasks inspired by the hierarchical structure of language and images.
We find that deep networks learn the task by developing internal representations invariant to exchanging equivalent groups.
Our results indicate how deep networks overcome the curse of dimensionality by building invariant representations.
arXiv Detail & Related papers (2023-07-05T09:11:09Z) - Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z) - The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming the state of the art that requires object detection and human pose by a clear margin.
arXiv Detail & Related papers (2022-03-10T23:35:00Z) - Visual Ground Truth Construction as Faceted Classification [4.7590051176368915]
The key novelty of our approach lies in the fact that we construct the classification hierarchies from visual properties, exploiting visual genus-differentiae.
The proposed approach is validated by a set of experiments on the ImageNet hierarchy of musical instruments.
arXiv Detail & Related papers (2022-02-17T08:35:23Z) - Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z) - Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z) - Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z) - Learn Class Hierarchy using Convolutional Neural Networks [0.9569316316728905]
We propose a new architecture for hierarchical classification of images, introducing a stack of deep linear layers that combines cross-entropy and center loss functions.
We experimentally show that our hierarchical classifier presents advantages over traditional classification approaches, finding application in computer vision tasks.
arXiv Detail & Related papers (2020-05-18T12:06:43Z) - Hierarchical Image Classification using Entailment Cone Embeddings [68.82490011036263]
We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier.
We empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
arXiv Detail & Related papers (2020-04-02T10:22:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.