Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
- URL: http://arxiv.org/abs/2404.00974v1
- Date: Mon, 1 Apr 2024 07:45:42 GMT
- Title: Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
- Authors: Hyeongjun Kwon, Jinhyun Jang, Jin Kim, Kwonyoung Kim, Kwanghoon Sohn,
- Abstract summary: We propose a Visual Hierarchy Mapper (Hi-Mapper) for enhancing the structured understanding of the pre-trained Deep Neural Networks (DNNs)
Hi-Mapper investigates the hierarchical organization of the visual scene by 1) pre-defining a hierarchy tree through the encapsulation of probability densities; and 2) learning the hierarchical relations in hyperbolic space with a novel hierarchical contrastive loss.
- Score: 33.405667735101595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual scenes are naturally organized in a hierarchy, where a coarse semantic is recursively comprised of several fine details. Exploring such a visual hierarchy is crucial to recognize the complex relations of visual elements, leading to a comprehensive scene understanding. In this paper, we propose a Visual Hierarchy Mapper (Hi-Mapper), a novel approach for enhancing the structured understanding of the pre-trained Deep Neural Networks (DNNs). Hi-Mapper investigates the hierarchical organization of the visual scene by 1) pre-defining a hierarchy tree through the encapsulation of probability densities; and 2) learning the hierarchical relations in hyperbolic space with a novel hierarchical contrastive loss. The pre-defined hierarchy tree recursively interacts with the visual features of the pre-trained DNNs through hierarchy decomposition and encoding procedures, thereby effectively identifying the visual hierarchy and enhancing the recognition of an entire scene. Extensive experiments demonstrate that Hi-Mapper significantly enhances the representation capability of DNNs, leading to an improved performance on various tasks, including image classification and dense prediction tasks.
Related papers
- Learning Visual Hierarchies with Hyperbolic Embeddings [28.35250955426006]
We introduce a learning paradigm that can encode user-defined multi-level visual hierarchies in hyperbolic space without requiring explicit hierarchical labels.
We show significant improvements in hierarchical retrieval tasks, demonstrating the capability of our model in capturing visual hierarchies.
arXiv Detail & Related papers (2024-11-26T14:58:06Z) - Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection.
The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.
Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-27T12:53:25Z) - Learning Object-Centric Representation via Reverse Hierarchy Guidance [73.05170419085796]
Object-Centric Learning (OCL) seeks to enable Neural Networks to identify individual objects in visual scenes.
RHGNet introduces a top-down pathway that works in different ways in the training and inference processes.
Our model achieves SOTA performance on several commonly used datasets.
arXiv Detail & Related papers (2024-05-17T07:48:27Z) - Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS) for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z) - HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding [18.95003393925676]
When classifying categories at different hierarchy levels, traditional uni-modal approaches focus primarily on image features, revealing limitations in complex scenarios.
Recent studies integrating Vision-Language Models (VLMs) with class hierarchies have shown promise, yet they fall short of fully exploiting the hierarchical relationships.
We propose a novel framework that effectively combines CLIP with a deeper exploitation of the Hierarchical class structure via Graph representation learning.
arXiv Detail & Related papers (2023-11-23T15:42:42Z) - Are "Hierarchical" Visual Representations Hierarchical? [42.50633217896189]
"hierarchical" visual representations aim at modeling the underlying hierarchy of the visual world.
HierNet is a suite of 12 datasets spanning 3 kinds of hierarchy from the BREEDs subset of ImageNet.
arXiv Detail & Related papers (2023-11-09T23:25:29Z) - Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal
Structured Representations [70.41385310930846]
We present an end-to-end framework Structure-CLIP to enhance multi-modal structured representations.
We use scene graphs to guide the construction of semantic negative examples, which results in an increased emphasis on learning structured representations.
A Knowledge-Enhance (KEE) is proposed to leverage SGK as input to further enhance structured representations.
arXiv Detail & Related papers (2023-05-06T03:57:05Z) - ExpNet: A unified network for Expert-Level Classification [40.109357254623085]
We propose Expert Network (ExpNet) to address the unique challenges of expert-level classification through a unified network.
In ExpNet, we hierarchically decouple the part and context features and individually process them using a novel attentive mechanism, called Gaze-Shift.
We conduct the experiments over three representative expert-level classification tasks: FGVC, disease classification, and artwork attributes classification.
arXiv Detail & Related papers (2022-11-29T12:20:25Z) - Deep Hierarchical Semantic Segmentation [76.40565872257709]
hierarchical semantic segmentation (HSS) aims at structured, pixel-wise description of visual observation in terms of a class hierarchy.
HSSN casts HSS as a pixel-wise multi-label classification task, only bringing minimal architecture change to current segmentation models.
With hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space, so as to generate well-structured pixel representations.
arXiv Detail & Related papers (2022-03-27T15:47:44Z) - Learning Task Decomposition with Ordered Memory Policy Network [73.3813423684999]
We propose Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration.
OMPN can be applied to partially observable environments and still achieve higher task decomposition performance.
Our visualization confirms that the subtask hierarchy can emerge in our model.
arXiv Detail & Related papers (2021-03-19T18:13:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.