Adaptive Semantic-Visual Tree for Hierarchical Embeddings
- URL: http://arxiv.org/abs/2003.03707v1
- Date: Sun, 8 Mar 2020 03:36:42 GMT
- Title: Adaptive Semantic-Visual Tree for Hierarchical Embeddings
- Authors: Shuo Yang, Wei Yu, Ying Zheng, Hongxun Yao, Tao Mei
- Abstract summary: We propose a hierarchical adaptive semantic-visual tree to depict the architecture of merchandise categories.
The tree evaluates semantic similarities between different semantic levels and visual similarities within the same semantic class simultaneously.
At each level, we set different margins based on the semantic hierarchy and incorporate them as prior information to learn a fine-grained feature embedding.
- Score: 67.01307058209709
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Merchandise categories inherently form a semantic hierarchy with different
levels of concept abstraction, especially for fine-grained categories. This
hierarchy encodes rich correlations among various categories across different
levels, which can effectively regularize the semantic space and thus make
predictions less ambiguous. However, previous studies of fine-grained image
retrieval focus primarily on either semantic or visual similarity. In a real
application, visual similarity alone may not satisfy consumers who search for
merchandise with real-life photos: given a red coat as a query image, a system
based only on visual similarity might return a red suit, since the two are
visually alike. Yet the user actually wants a coat rather than a suit, even if
the returned coat differs in color or texture. We introduce this new problem,
motivated by photo-based shopping in real practice, which is why semantic
information is integrated to regularize the margins so that semantic
similarity takes priority over visual similarity. To solve this new problem, we
propose a hierarchical adaptive semantic-visual tree (ASVT) to depict the
architecture of merchandise categories, which evaluates semantic similarities
between different semantic levels and visual similarities within the same
semantic class simultaneously. The semantic information satisfies consumers'
demand for merchandise similar to the query, while the visual information
optimizes correlations within each semantic class. At each level, we set
different margins based on the semantic hierarchy and incorporate them as prior
information to learn a fine-grained feature embedding. To evaluate our
framework, we propose a new dataset named JDProduct, with hierarchical labels
collected from actual image queries and official merchandise images on an
online shopping application. Extensive experimental results on the public
CARS196 and CUB-
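The level-dependent margins described above can be sketched as a hierarchy-aware triplet loss. This is only a minimal illustration of the idea: the margin values, the three-level depth, the toy hierarchy paths, and the function names are assumptions, not details taken from the paper.

```python
import numpy as np

# Illustrative margins: coarser semantic levels get larger margins, so
# confusions across high-level categories are penalized more strongly.
# The values and the three-level depth are assumptions, not from the paper.
LEVEL_MARGINS = {0: 0.6, 1: 0.4, 2: 0.2}

def lowest_common_level(path_a, path_b):
    """Depth of the first label where two root-to-leaf paths diverge,
    e.g. ("clothing", "coat") vs ("clothing", "suit") diverge at level 1."""
    for level, (a, b) in enumerate(zip(path_a, path_b)):
        if a != b:
            return level
    return len(path_a)  # identical paths

def triplet_loss(anchor, positive, negative, anchor_path, negative_path):
    """Squared-distance triplet loss whose margin depends on where the
    negative's hierarchy path diverges from the anchor's."""
    margin = LEVEL_MARGINS.get(lowest_common_level(anchor_path, negative_path), 0.2)
    d_pos = float(np.sum((anchor - positive) ** 2))
    d_neg = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_pos - d_neg + margin)
```

Under this sketch, a negative from a different top-level category (large margin) must be pushed farther from the anchor than a negative that shares all but the finest label.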
Related papers
- Integrating Visual and Semantic Similarity Using Hierarchies for Image
Retrieval [0.46040036610482665]
We propose a method for content-based image retrieval (CBIR) that captures both visual and semantic similarity using a visual hierarchy.
The hierarchy is constructed by merging classes with overlapping features in the latent space of a deep neural network trained for classification.
Our method achieves superior performance compared to the existing methods on image retrieval.
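The class-merging step this summary describes can be approximated by greedy agglomerative clustering of class-mean embeddings. The sketch below is an assumption about that procedure: the function name, toy features, and centroid-merge rule are hypothetical, and a real system would cluster penultimate-layer features from a trained classifier.

```python
import numpy as np

def build_visual_hierarchy(class_means):
    """Greedy agglomerative merge of class-mean embeddings.

    class_means: dict of class name -> 1-D feature vector. Returns the
    sequence of merges, closest pair first, as ((name_a, name_b), distance)."""
    clusters = {name: (np.asarray(v, float), [name]) for name, v in class_means.items()}
    merges = []
    while len(clusters) > 1:
        names = list(clusters)
        # find the closest pair of cluster centroids
        a, b = min(
            ((x, y) for i, x in enumerate(names) for y in names[i + 1:]),
            key=lambda p: np.linalg.norm(clusters[p[0]][0] - clusters[p[1]][0]),
        )
        dist = float(np.linalg.norm(clusters[a][0] - clusters[b][0]))
        members = clusters[a][1] + clusters[b][1]
        # size-weighted centroid of the merged cluster
        centroid = (clusters[a][0] * len(clusters[a][1])
                    + clusters[b][0] * len(clusters[b][1])) / len(members)
        del clusters[a], clusters[b]
        clusters[a + "+" + b] = (centroid, members)
        merges.append(((a, b), dist))
    return merges
```

Classes whose features overlap in the latent space merge early, yielding the inner nodes of the visual hierarchy.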
arXiv Detail & Related papers (2023-08-16T15:23:14Z)
- Addressing Discrepancies in Semantic and Visual Alignment in Neural Networks [0.0]
We consider the problem that arises when semantically similar classes are visually dissimilar, and when visual similarity is present among semantically unrelated classes.
We propose a data augmentation technique with the goal of better aligning semantically similar classes with arbitrary (non-visual) semantic relationships.
Results demonstrate that there is an increase in alignment of semantically similar classes when using our proposed data augmentation method.
arXiv Detail & Related papers (2023-06-01T21:03:06Z)
- Vocabulary-free Image Classification [75.38039557783414]
We formalize a novel task, termed Vocabulary-free Image Classification (VIC).
VIC aims to assign to an input image a class that resides in an unconstrained language-induced semantic space, without the prerequisite of a known vocabulary.
CaSED is a method that exploits a pre-trained vision-language model and an external vision-language database to address VIC in a training-free manner.
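A training-free scoring step in the spirit of this summary can be sketched as cosine similarity between an image embedding and text embeddings of candidate category names retrieved from an external database. The toy embeddings and function name below are assumptions; a real system would use a pre-trained vision-language model such as CLIP.

```python
import numpy as np

def vocabulary_free_classify(image_emb, candidate_names, text_embs):
    """Pick the candidate name whose text embedding is most similar to the
    image embedding. Embeddings here are toy vectors; in practice both sides
    would come from a pre-trained vision-language encoder."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = text_embs @ image_emb  # cosine similarity per candidate
    return candidate_names[int(np.argmax(scores))]
```

No training is involved: classification reduces to retrieval of candidates plus a nearest-neighbor match in the shared embedding space.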
arXiv Detail & Related papers (2023-06-01T17:19:43Z)
- Comprehending and Ordering Semantics for Image Captioning [124.48670699658649]
We propose a new Transformer-style architecture, namely Comprehending and Ordering Semantics Networks (COS-Net).
COS-Net unifies an enriched semantic comprehending and a learnable semantic ordering processes into a single architecture.
arXiv Detail & Related papers (2022-06-14T15:51:14Z)
- HIRL: A General Framework for Hierarchical Image Representation Learning [54.12773508883117]
We propose a general framework for Hierarchical Image Representation Learning (HIRL).
This framework aims to learn multiple semantic representations for each image, and these representations are structured to encode image semantics from fine-grained to coarse-grained.
Based on a probabilistic factorization, HIRL learns the most fine-grained semantics by an off-the-shelf image SSL approach and learns multiple coarse-grained semantics by a novel semantic path discrimination scheme.
arXiv Detail & Related papers (2022-05-26T05:13:26Z)
- Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50.
arXiv Detail & Related papers (2020-09-12T17:36:53Z)
- Hierarchical Image Classification using Entailment Cone Embeddings [68.82490011036263]
We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier.
We empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
arXiv Detail & Related papers (2020-04-02T10:22:02Z)
- Learning Representations For Images With Hierarchical Labels [1.3579420996461438]
We present a set of methods to leverage information about the semantic hierarchy induced by class labels.
We show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
Both the CNN classifiers injected with hierarchical information and the embedding-based models outperform a hierarchy-agnostic model on the newly presented, real-world ETH Entomological Collection image dataset.
arXiv Detail & Related papers (2020-04-02T09:56:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.