Related papers: Learning Hierarchical Semantic Classification by Grounding on Consistent Image Segmentations

URL: http://arxiv.org/abs/2406.11608v1
Date: Mon, 17 Jun 2024 14:56:51 GMT
Title: Learning Hierarchical Semantic Classification by Grounding on Consistent Image Segmentations
Authors: Seulki Park, Youren Zhang, Stella X. Yu, Sara Beery, Jonathan Huang
Abstract summary: Hierarchical semantic classification requires the prediction of a taxonomy tree instead of a single flat level of the tree. We build upon recent work on learning hierarchical segmentation for flat-level recognition. We introduce a Tree-path KL Divergence loss to enforce consistent accurate predictions across levels.
Score: 37.80849457554078
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hierarchical semantic classification requires the prediction of a taxonomy tree instead of a single flat level of the tree, where both accuracies at individual levels and consistency across levels matter. We can train classifiers for individual levels, which has accuracy but not consistency, or we can train only the finest level classification and infer higher levels, which has consistency but not accuracy. Our key insight is that hierarchical recognition should not be treated as multi-task classification, as each level is essentially a different task and they would have to compromise with each other, but be grounded on image segmentations that are consistent across semantic granularities. Consistency can in fact improve accuracy. We build upon recent work on learning hierarchical segmentation for flat-level recognition, and extend it to hierarchical recognition. It naturally captures the intuition that fine-grained recognition requires fine image segmentation whereas coarse-grained recognition requires coarse segmentation; they can all be integrated into one recognition model that drives fine-to-coarse internal visual parsing. Additionally, we introduce a Tree-path KL Divergence loss to enforce consistent accurate predictions across levels. Our extensive experimentation and analysis demonstrate our significant gains on predicting an accurate and consistent taxonomy tree.
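
Illustrative sketch (not from the paper): the abstract contrasts inferring coarse levels from a fine-level classifier with enforcing consistency through a tree-path KL divergence loss. The Python below is a minimal sketch of both ideas under stated assumptions. The two-level taxonomy `FINE_TO_COARSE`, the helper names, and the exact form of the loss (a softmax over the concatenated per-level logits compared against a target that places equal mass on the true label at each level) are hypothetical choices made for illustration and are not confirmed by the abstract.

```python
import math

# Hypothetical two-level taxonomy: fine class index -> coarse class index.
FINE_TO_COARSE = [0, 0, 1, 1, 1]  # 5 fine classes, 2 coarse classes
NUM_COARSE = 2


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def infer_coarse_probs(fine_probs):
    """Infer coarse probabilities by summing fine probabilities per parent.

    This is the "train the finest level, infer higher levels" option:
    the coarse prediction is consistent with the fine one by construction.
    """
    coarse = [0.0] * NUM_COARSE
    for fine_idx, p in enumerate(fine_probs):
        coarse[FINE_TO_COARSE[fine_idx]] += p
    return coarse


def is_consistent(fine_pred, coarse_pred):
    """True if the predicted coarse class is the parent of the fine class."""
    return FINE_TO_COARSE[fine_pred] == coarse_pred


def tree_path_kl(coarse_logits, fine_logits, coarse_label, fine_label, eps=1e-12):
    """KL(target || prediction) over the concatenated levels.

    Assumption: the prediction is a softmax over the concatenated logits,
    and the target puts mass 1/2 on the true label at each of the 2 levels.
    """
    pred = softmax(list(coarse_logits) + list(fine_logits))
    target = [0.0] * len(pred)
    target[coarse_label] = 0.5
    target[NUM_COARSE + fine_label] = 0.5
    return sum(
        t * math.log(t / max(p, eps)) for t, p in zip(target, pred) if t > 0
    )


if __name__ == "__main__":
    fine_probs = softmax([0.0, 0.0, 5.0, 0.0, 0.0])
    print("coarse probs:", infer_coarse_probs(fine_probs))
    # Same fine logits, with a matching and a mismatching coarse head.
    print("loss, consistent heads:", tree_path_kl([0.0, 5.0], [0.0, 0.0, 5.0, 0.0, 0.0], 1, 2))
    print("loss, inconsistent heads:", tree_path_kl([5.0, 0.0], [0.0, 0.0, 5.0, 0.0, 0.0], 1, 2))
```

In this sketch the loss is small only when both levels place their mass on labels that lie on the same root-to-leaf path, which is one plausible reading of how such a loss could couple accuracy with cross-level consistency.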

Related papers

Dual-level Fuzzy Learning with Patch Guidance for Image Ordinal Regression [8.538034422744005]
Ordinal regression bridges regression and classification by assigning objects to ordered classes. Current approaches are limited by the availability of only image-level ordinal labels. We propose a Dual-level Fuzzy Learning with Patch Guidance framework, named DFPG.
arXiv Detail & Related papers (2025-05-09T07:01:14Z)
Harnessing Superclasses for Learning from Hierarchical Databases [1.835004446596942]
In many large-scale classification problems, classes are organized in a known hierarchy, typically represented as a tree. We introduce a loss for this type of supervised hierarchical classification. Our approach does not entail any significant additional computational cost compared with the cross-entropy loss.
arXiv Detail & Related papers (2024-11-25T14:39:52Z)
TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training [29.431698321195814]
Contrastive Language-Image Pre-training (CLIP) has demonstrated impressive capabilities in open-vocabulary classification. CLIP shows poor performance on multi-label datasets because the global feature tends to be dominated by the most prominent class. We propose a local-to-global framework to obtain image tags.
arXiv Detail & Related papers (2023-12-20T08:15:40Z)
Semantic Guided Level-Category Hybrid Prediction Network for Hierarchical Image Classification [8.456482280676884]
Hierarchical classification (HC) assigns each object multiple labels organized into a hierarchical structure. We propose a novel semantic guided level-category hybrid prediction network (SGLCHPN) that can jointly perform the level and category prediction in an end-to-end manner.
arXiv Detail & Related papers (2022-11-22T13:49:10Z)
Hierarchical classification at multiple operating points [1.520694326234112]
We present an efficient algorithm to produce operating characteristic curves for any method that assigns a score to every class in the hierarchy. We propose two novel loss functions and show that a soft variant of the structured hinge loss is able to significantly outperform the flat baseline.
arXiv Detail & Related papers (2022-10-19T23:36:16Z)
Weakly-supervised Action Localization via Hierarchical Mining [76.00021423700497]
Weakly-supervised action localization aims to localize and classify action instances in the given videos temporally with only video-level categorical labels. We propose a hierarchical mining strategy under video-level and snippet-level manners, i.e., hierarchical supervision and hierarchical consistency mining. We show that HiM-Net outperforms existing methods on THUMOS14 and ActivityNet1.3 datasets with large margins by hierarchically mining the supervision and consistency.
arXiv Detail & Related papers (2022-06-22T12:19:09Z)
Deep Hierarchical Semantic Segmentation [76.40565872257709]
Hierarchical semantic segmentation (HSS) aims at structured, pixel-wise description of visual observation in terms of a class hierarchy. HSSN casts HSS as a pixel-wise multi-label classification task, only bringing minimal architecture change to current segmentation models. With hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space, so as to generate well-structured pixel representations.
arXiv Detail & Related papers (2022-03-27T15:47:44Z)
United We Learn Better: Harvesting Learning Improvements From Class Hierarchies Across Tasks [9.687531080021813]
We present a theoretical framework based on probability and set theory for extracting parent predictions and a hierarchical loss. Results are reported across classification and detection benchmarks, opening up the possibility of hierarchical learning for sigmoid-based detection architectures.
arXiv Detail & Related papers (2021-07-28T20:25:37Z)
Re-rank Coarse Classification with Local Region Enhanced Features for Fine-Grained Image Recognition [22.83821575990778]
We re-rank the TopN classification results by using the local region enhanced embedding features to improve the Top1 accuracy. To learn more effective semantic global features, we design a multi-level loss over an automatically constructed hierarchical category structure. Our method achieves state-of-the-art performance on three benchmarks: CUB-200-2011, Stanford Cars, and FGVC Aircraft.
arXiv Detail & Related papers (2021-02-19T11:30:25Z)
Grafit: Learning fine-grained image representations with coarse labels [114.17782143848315]
This paper tackles the problem of learning a finer representation than the one provided by training labels. By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods.
arXiv Detail & Related papers (2020-11-25T19:06:26Z)
Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50.
arXiv Detail & Related papers (2020-09-12T17:36:53Z)
Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification [64.37745443119942]
This paper jointly enforces visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification. Experimental results on three large-scale ReID datasets demonstrate the superiority of the proposed method in both unsupervised and unsupervised domain adaptive ReID tasks.
arXiv Detail & Related papers (2020-07-21T14:31:27Z)
Fine-Grained Visual Classification with Efficient End-to-end Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup. We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
Self-Supervised Tuning for Few-Shot Segmentation [82.32143982269892]
Few-shot segmentation aims at assigning a category label to each image pixel with few annotated samples. Existing meta-learning methods tend to fail to generate category-specific discriminative descriptors when the visual features extracted from support images are marginalized in embedding space. This paper presents an adaptive tuning framework, in which the distribution of latent features across different episodes is dynamically adjusted based on a self-segmentation scheme.
arXiv Detail & Related papers (2020-04-12T03:53:53Z)
Phase Consistent Ecological Domain Adaptation [76.75730500201536]
We focus on the task of semantic segmentation, where annotated synthetic data are aplenty, but annotating real data is laborious. The first criterion, inspired by visual psychophysics, is that the map between the two image domains be phase-preserving. The second criterion aims to leverage ecological statistics, or regularities in the scene which are manifest in any image of it, regardless of the characteristics of the illuminant or the imaging sensor.
arXiv Detail & Related papers (2020-04-10T06:58:03Z)
Hierarchical Entity Typing via Multi-level Learning to Rank [38.509244927293715]
We propose a novel method for hierarchical entity classification that embraces ontological structure both at training and during prediction. At training, our novel multi-level learning-to-rank loss compares positive types against negative siblings according to the type tree. During prediction, we define a coarse-to-fine decoder that restricts viable candidates at each level of the ontology based on already predicted parent type(s).
arXiv Detail & Related papers (2020-04-05T19:27:18Z)
Hierarchical Image Classification using Entailment Cone Embeddings [68.82490011036263]
We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier. We empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
arXiv Detail & Related papers (2020-04-02T10:22:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.