Are "Hierarchical" Visual Representations Hierarchical?
- URL: http://arxiv.org/abs/2311.05784v2
- Date: Thu, 23 Nov 2023 20:45:53 GMT
- Title: Are "Hierarchical" Visual Representations Hierarchical?
- Authors: Ethan Shen, Ali Farhadi, Aditya Kusupati
- Abstract summary: "Hierarchical" visual representations aim to model the underlying hierarchy of the visual world.
HierNet is a suite of 12 datasets spanning 3 kinds of hierarchy, built from the BREEDs subset of ImageNet.
- Score: 42.50633217896189
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learned visual representations often capture large amounts of semantic
information for accurate downstream applications. Human understanding of the
world is fundamentally grounded in hierarchy. To mimic this and further improve
representation capabilities, the community has explored "hierarchical" visual
representations that aim at modeling the underlying hierarchy of the visual
world. In this work, we set out to investigate if hierarchical visual
representations truly capture the human perceived hierarchy better than
standard learned representations. To this end, we create HierNet, a suite of 12
datasets spanning 3 kinds of hierarchy from the BREEDs subset of ImageNet.
After extensive evaluation of Hyperbolic and Matryoshka Representations across
training setups, we conclude that they do not capture hierarchy any better than
the standard representations but can assist in other aspects like search
efficiency and interpretability. Our benchmark and the datasets are
open-sourced at https://github.com/ethanlshen/HierNet.
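The abstract contrasts two representation families: hyperbolic embeddings, where hierarchy is encoded geometrically in the Poincaré ball, and Matryoshka Representations, where every prefix of the embedding vector is itself usable, enabling the search-efficiency gains the authors note. A minimal sketch of both ideas follows; the function names and toy vectors are illustrative and not taken from the paper's released code.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball.

    Hyperbolic space gives exponentially growing room near the boundary,
    which is why it is a popular target geometry for embedding trees.
    """
    uu = np.dot(u, u)
    vv = np.dot(v, v)
    duv = np.dot(u - v, u - v)
    arg = 1.0 + 2.0 * duv / max((1.0 - uu) * (1.0 - vv), eps)
    return np.arccosh(arg)

def matryoshka_similarity(x, y, dim):
    """Cosine similarity using only the first `dim` coordinates.

    Matryoshka Representations are trained so that each prefix of the
    embedding is a valid lower-dimensional representation, allowing
    cheap coarse retrieval followed by full-dimension re-ranking.
    """
    xs, ys = x[:dim], y[:dim]
    return float(np.dot(xs, ys) /
                 (np.linalg.norm(xs) * np.linalg.norm(ys) + 1e-12))

# Toy check: hyperbolic distance from the origin grows rapidly as a
# point approaches the boundary of the ball.
origin = np.zeros(2)
near = np.array([0.1, 0.0])
far = np.array([0.9, 0.0])
print(poincare_distance(origin, near) < poincare_distance(origin, far))  # True
```

Note that neither construction by itself guarantees that human-perceived hierarchy is captured, which is precisely the question the HierNet benchmark is designed to test.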
Related papers
- Learning Structured Representations with Hyperbolic Embeddings [22.95613852886361]
We propose HypStructure: a Hyperbolic Structured regularization approach to accurately embed the label hierarchy into the learned representations.
Experiments on several large-scale vision benchmarks demonstrate the efficacy of HypStructure in reducing distortion.
For a better understanding of structured representation, we perform eigenvalue analysis that links the representation geometry to improved Out-of-Distribution (OOD) detection performance.
arXiv Detail & Related papers (2024-12-02T00:56:44Z)
- Learning Visual Hierarchies with Hyperbolic Embeddings [28.35250955426006]
We introduce a learning paradigm that can encode user-defined multi-level visual hierarchies in hyperbolic space without requiring explicit hierarchical labels.
We show significant improvements in hierarchical retrieval tasks, demonstrating the capability of our model in capturing visual hierarchies.
arXiv Detail & Related papers (2024-11-26T14:58:06Z)
- Learning Object-Centric Representation via Reverse Hierarchy Guidance [73.05170419085796]
Object-Centric Learning (OCL) seeks to enable Neural Networks to identify individual objects in visual scenes.
RHGNet introduces a top-down pathway that works in different ways in the training and inference processes.
Our model achieves SOTA performance on several commonly used datasets.
arXiv Detail & Related papers (2024-05-17T07:48:27Z)
- Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping [33.405667735101595]
We propose a Visual Hierarchy Mapper (Hi-Mapper) for enhancing the structured understanding of pre-trained Deep Neural Networks (DNNs).
Hi-Mapper investigates the hierarchical organization of the visual scene by 1) pre-defining a hierarchy tree through the encapsulation of probability densities; and 2) learning the hierarchical relations in hyperbolic space with a novel hierarchical contrastive loss.
arXiv Detail & Related papers (2024-04-01T07:45:42Z)
- HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding [18.95003393925676]
When classifying categories at different hierarchy levels, traditional uni-modal approaches focus primarily on image features, revealing limitations in complex scenarios.
Recent studies integrating Vision-Language Models (VLMs) with class hierarchies have shown promise, yet they fall short of fully exploiting the hierarchical relationships.
We propose a novel framework that effectively combines CLIP with a deeper exploitation of the Hierarchical class structure via Graph representation learning.
arXiv Detail & Related papers (2023-11-23T15:42:42Z)
- OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions [94.31804364707575]
We propose Omni-suPErvised Representation leArning with hierarchical supervisions (OPERA) as a solution.
We extract a set of hierarchical proxy representations for each image and impose self and full supervisions on the corresponding proxy representations.
Experiments on both convolutional neural networks and vision transformers demonstrate the superiority of OPERA in image classification, segmentation, and object detection.
arXiv Detail & Related papers (2022-10-11T15:51:31Z)
- Visual Superordinate Abstraction for Robust Concept Learning [80.15940996821541]
Concept learning constructs visual representations that are connected to linguistic semantics.
We ascribe the bottleneck to a failure of exploring the intrinsic semantic hierarchy of visual concepts.
We propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces.
arXiv Detail & Related papers (2022-05-28T14:27:38Z)
- Modeling Heterogeneous Hierarchies with Relation-specific Hyperbolic Cones [64.75766944882389]
We present ConE (Cone Embedding), a KG embedding model that is able to simultaneously model multiple hierarchical as well as non-hierarchical relations in a knowledge graph.
In particular, ConE uses cone containment constraints in different subspaces of the hyperbolic embedding space to capture multiple heterogeneous hierarchies.
Our approach yields a new state-of-the-art Hits@1 of 45.3% on WN18RR and 16.1% on DDB14 (0.231 MRR).
arXiv Detail & Related papers (2021-10-28T07:16:08Z)
- SHERLock: Self-Supervised Hierarchical Event Representation Learning [22.19386609894017]
We propose a model that learns temporal representations from long-horizon visual demonstration data.
Our method produces a hierarchy of representations that align more closely with ground-truth human-annotated events.
arXiv Detail & Related papers (2020-10-06T09:04:01Z)
- Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds [109.0016923028653]
We learn point cloud representation by bidirectional reasoning between the local structures and the global shape without human supervision.
We show that our unsupervised model surpasses the state-of-the-art supervised methods on both synthetic and real-world 3D object classification datasets.
arXiv Detail & Related papers (2020-03-29T08:26:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.