Learning Visual Hierarchies with Hyperbolic Embeddings
- URL: http://arxiv.org/abs/2411.17490v1
- Date: Tue, 26 Nov 2024 14:58:06 GMT
- Title: Learning Visual Hierarchies with Hyperbolic Embeddings
- Authors: Ziwei Wang, Sameera Ramasinghe, Chenchen Xu, Julien Monteil, Loris Bazzani, Thalaiyasingam Ajanthan,
- Abstract summary: We introduce a learning paradigm that can encode user-defined multi-level visual hierarchies in hyperbolic space without requiring explicit hierarchical labels.
We show significant improvements in hierarchical retrieval tasks, demonstrating the capability of our model in capturing visual hierarchies.
- Score: 28.35250955426006
- License:
- Abstract: Structuring latent representations in a hierarchical manner enables models to learn patterns at multiple levels of abstraction. However, most prevalent image understanding models focus on visual similarity, and learning visual hierarchies is relatively unexplored. In this work, for the first time, we introduce a learning paradigm that can encode user-defined multi-level visual hierarchies in hyperbolic space without requiring explicit hierarchical labels. As a concrete example, first, we define a part-based image hierarchy using object-level annotations within and across images. Then, we introduce an approach to enforce the hierarchy using contrastive loss with pairwise entailment metrics. Finally, we discuss new evaluation metrics to effectively measure hierarchical image retrieval. Encoding these complex relationships ensures that the learned representations capture semantic and structural information that transcends mere visual similarity. Experiments in part-based image retrieval show significant improvements in hierarchical retrieval tasks, demonstrating the capability of our model in capturing visual hierarchies.
Related papers
- Emergent Visual-Semantic Hierarchies in Image-Text Representations [13.300199242824934]
We study the knowledge of existing foundation models, finding that they exhibit emergent understanding of visual-semantic hierarchies.
We propose the Radial Embedding (RE) framework for probing and optimizing hierarchical understanding.
arXiv Detail & Related papers (2024-07-11T14:09:42Z) - Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping [33.405667735101595]
We propose a Visual Hierarchy Mapper (Hi-Mapper) for enhancing the structured understanding of the pre-trained Deep Neural Networks (DNNs)
Hi-Mapper investigates the hierarchical organization of the visual scene by 1) pre-defining a hierarchy tree through the encapsulation of probability densities; and 2) learning the hierarchical relations in hyperbolic space with a novel hierarchical contrastive loss.
arXiv Detail & Related papers (2024-04-01T07:45:42Z) - HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding [18.95003393925676]
When classifying categories at different hierarchy levels, traditional uni-modal approaches focus primarily on image features, revealing limitations in complex scenarios.
Recent studies integrating Vision-Language Models (VLMs) with class hierarchies have shown promise, yet they fall short of fully exploiting the hierarchical relationships.
We propose a novel framework that effectively combines CLIP with a deeper exploitation of the Hierarchical class structure via Graph representation learning.
arXiv Detail & Related papers (2023-11-23T15:42:42Z) - Integrating Visual and Semantic Similarity Using Hierarchies for Image
Retrieval [0.46040036610482665]
We propose a method for CBIR that captures both visual and semantic similarity using a visual hierarchy.
The hierarchy is constructed by merging classes with overlapping features in the latent space of a deep neural network trained for classification.
Our method achieves superior performance compared to the existing methods on image retrieval.
arXiv Detail & Related papers (2023-08-16T15:23:14Z) - Use All The Labels: A Hierarchical Multi-Label Contrastive Learning
Framework [75.79736930414715]
We present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes.
We introduce novel hierarchy preserving losses, which jointly apply a hierarchical penalty to the contrastive loss, and enforce the hierarchy constraint.
arXiv Detail & Related papers (2022-04-27T21:41:44Z) - Building a visual semantics aware object hierarchy [0.0]
We propose a novel unsupervised method to build visual semantics aware object hierarchy.
Our intuition in this paper comes from real-world knowledge representation where concepts are hierarchically organized.
The evaluation consists of two parts, firstly we apply the constructed hierarchy on the object recognition task and then we compare our visual hierarchy and existing lexical hierarchies to show the validity of our method.
arXiv Detail & Related papers (2022-02-26T00:10:21Z) - Multi-level Second-order Few-shot Learning [111.0648869396828]
We propose a Multi-level Second-order (MlSo) few-shot learning network for supervised or unsupervised few-shot image classification and few-shot action recognition.
We leverage so-called power-normalized second-order base learner streams combined with features that express multiple levels of visual abstraction.
We demonstrate respectable results on standard datasets such as Omniglot, mini-ImageNet, tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and mini-MIT.
arXiv Detail & Related papers (2022-01-15T19:49:00Z) - Matching Visual Features to Hierarchical Semantic Topics for Image
Paragraph Captioning [50.08729005865331]
This paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework.
To capture the correlations between the image and text at multiple levels of abstraction, we design a variational inference network.
To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model.
arXiv Detail & Related papers (2021-05-10T06:55:39Z) - Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z) - Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation.
Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning.
During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z) - Hierarchical Image Classification using Entailment Cone Embeddings [68.82490011036263]
We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier.
We empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
arXiv Detail & Related papers (2020-04-02T10:22:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.