Learning Hierarchical Semantic Classification by Grounding on Consistent Image Segmentations
- URL: http://arxiv.org/abs/2406.11608v1
- Date: Mon, 17 Jun 2024 14:56:51 GMT
- Title: Learning Hierarchical Semantic Classification by Grounding on Consistent Image Segmentations
- Authors: Seulki Park, Youren Zhang, Stella X. Yu, Sara Beery, Jonathan Huang,
- Abstract summary: Hierarchical semantic classification requires the prediction of a taxonomy tree instead of a single flat level of the tree.
We build upon recent work on learning hierarchical segmentation for flat-level recognition.
We introduce a Tree-path KL Divergence loss to enforce consistent accurate predictions across levels.
- Score: 37.80849457554078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hierarchical semantic classification requires the prediction of a taxonomy tree instead of a single flat level of the tree, where both accuracies at individual levels and consistency across levels matter. We can train classifiers for individual levels, which has accuracy but not consistency, or we can train only the finest level classification and infer higher levels, which has consistency but not accuracy. Our key insight is that hierarchical recognition should not be treated as multi-task classification, as each level is essentially a different task and they would have to compromise with each other, but be grounded on image segmentations that are consistent across semantic granularities. Consistency can in fact improve accuracy. We build upon recent work on learning hierarchical segmentation for flat-level recognition, and extend it to hierarchical recognition. It naturally captures the intuition that fine-grained recognition requires fine image segmentation whereas coarse-grained recognition requires coarse segmentation; they can all be integrated into one recognition model that drives fine-to-coarse internal visual parsing.Additionally, we introduce a Tree-path KL Divergence loss to enforce consistent accurate predictions across levels. Our extensive experimentation and analysis demonstrate our significant gains on predicting an accurate and consistent taxonomy tree.
Related papers
- Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball [39.76366192826905]
We show that a flat (non-hierarchical) segmentation network, in which the parents are inferred from the children, has superior segmentation accuracy to the hierarchical approach across the board.
We also study a more principled approach to hierarchical segmentation using the Poincar'e ball model.
arXiv Detail & Related papers (2024-04-04T19:50:57Z) - Semantic Guided Level-Category Hybrid Prediction Network for
Hierarchical Image Classification [8.456482280676884]
Hierarchical classification (HC) assigns each object with multiple labels organized into a hierarchical structure.
We propose a novel semantic guided level-category hybrid prediction network (SGLCHPN) that can jointly perform the level and category prediction in an end-to-end manner.
arXiv Detail & Related papers (2022-11-22T13:49:10Z) - Hierarchical classification at multiple operating points [1.520694326234112]
We present an efficient algorithm to produce operating characteristic curves for any method that assigns a score to every class in the hierarchy.
We propose two novel loss functions and show that a soft variant of the structured hinge loss is able to significantly outperform the flat baseline.
arXiv Detail & Related papers (2022-10-19T23:36:16Z) - Weakly-supervised Action Localization via Hierarchical Mining [76.00021423700497]
Weakly-supervised action localization aims to localize and classify action instances in the given videos temporally with only video-level categorical labels.
We propose a hierarchical mining strategy under video-level and snippet-level manners, i.e., hierarchical supervision and hierarchical consistency mining.
We show that HiM-Net outperforms existing methods on THUMOS14 and ActivityNet1.3 datasets with large margins by hierarchically mining the supervision and consistency.
arXiv Detail & Related papers (2022-06-22T12:19:09Z) - Deep Hierarchical Semantic Segmentation [76.40565872257709]
hierarchical semantic segmentation (HSS) aims at structured, pixel-wise description of visual observation in terms of a class hierarchy.
HSSN casts HSS as a pixel-wise multi-label classification task, only bringing minimal architecture change to current segmentation models.
With hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space, so as to generate well-structured pixel representations.
arXiv Detail & Related papers (2022-03-27T15:47:44Z) - United We Learn Better: Harvesting Learning Improvements From Class
Hierarchies Across Tasks [9.687531080021813]
We present a theoretical framework based on probability and set theory for extracting parent predictions and a hierarchical loss.
Results show results across classification and detection benchmarks and opening up the possibility of hierarchical learning for sigmoid-based detection architectures.
arXiv Detail & Related papers (2021-07-28T20:25:37Z) - Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
The proposed method can improve several state-of-the-art baselines by a large margin (up to $33%$ relative gain) in terms of Recall@50.
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin.
arXiv Detail & Related papers (2020-09-12T17:36:53Z) - Fine-Grained Visual Classification with Efficient End-to-end
Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z) - Phase Consistent Ecological Domain Adaptation [76.75730500201536]
We focus on the task of semantic segmentation, where annotated synthetic data are aplenty, but annotating real data is laborious.
The first criterion, inspired by visual psychophysics, is that the map between the two image domains be phase-preserving.
The second criterion aims to leverage ecological statistics, or regularities in the scene which are manifest in any image of it, regardless of the characteristics of the illuminant or the imaging sensor.
arXiv Detail & Related papers (2020-04-10T06:58:03Z) - Hierarchical Entity Typing via Multi-level Learning to Rank [38.509244927293715]
We propose a novel method for hierarchical entity classification that embraces ontological structure at both training and during prediction.
At training, our novel multi-level learning-to-rank loss compares positive types against negative siblings according to the type tree.
During prediction, we define a coarse-to-fine decoder that restricts viable candidates at each level of the ontology based on already predicted parent type(s)
arXiv Detail & Related papers (2020-04-05T19:27:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.