FashionSearchNet-v2: Learning Attribute Representations with
Localization for Image Retrieval with Attribute Manipulation
- URL: http://arxiv.org/abs/2111.14145v1
- Date: Sun, 28 Nov 2021 13:50:20 GMT
- Title: FashionSearchNet-v2: Learning Attribute Representations with
Localization for Image Retrieval with Attribute Manipulation
- Authors: Kenan E. Ak, Joo Hwee Lim, Ying Sun, Jo Yew Tham, Ashraf A. Kassim
- Abstract summary: The proposed FashionSearchNet-v2 architecture learns attribute-specific representations by leveraging its weakly supervised localization module.
The network is jointly trained with a combination of attribute classification and triplet ranking losses to estimate local representations.
Experiments on several attribute-rich datasets show that FashionSearchNet-v2 outperforms other state-of-the-art attribute manipulation techniques.
- Score: 22.691709684780292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The focus of this paper is on the problem of image retrieval with attribute
manipulation. Our proposed method manipulates the desired attributes of the
query image while maintaining its other attributes. For example, the collar
attribute of the query image can be changed from round to v-neck to retrieve
similar images from a large dataset. A key challenge in e-commerce is that
images have multiple attributes that users would like to manipulate, and it is
important to estimate discriminative feature representations for each of these
attributes. The proposed FashionSearchNet-v2 architecture learns
attribute-specific representations by leveraging its weakly supervised
localization module, which ignores the unrelated features of attributes in the
feature space, thus improving the similarity learning. The network is jointly
trained with a combination of attribute classification and triplet ranking
losses to estimate local representations. These local representations are then
merged into a single global representation based on the instructed attribute
manipulation, so that the desired images can be retrieved with a distance
metric. The proposed method also provides explainability for its retrieval
process, offering additional insight into the attention of the network.
Experiments on several attribute-rich datasets show that FashionSearchNet-v2
outperforms other state-of-the-art attribute manipulation techniques. Unlike
our earlier work (FashionSearchNet), we propose several improvements in the
learning procedure and show that the proposed FashionSearchNet-v2 generalizes
to domains other than fashion.
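The pipeline described in the abstract can be sketched in a few lines. This is an illustrative toy, not the authors' code: the function names (`manipulate`, `retrieve`, `triplet_loss`) and the use of concatenation for the merge and Euclidean distance for retrieval are assumptions chosen for clarity; the paper's actual network learns the local representations with CNN features and weakly supervised localization.

```python
# Toy sketch of attribute-manipulation retrieval: replace one attribute's
# local representation with the target value's representation, merge the
# locals into a global representation, and rank a gallery by distance.
# All names and design choices here are illustrative assumptions.
import numpy as np

def manipulate(query_reps, attr_index, target_rep):
    """Swap the local representation of one attribute (e.g. collar) for the
    representation of the desired target value (e.g. v-neck), then merge
    all local representations into a single global representation."""
    reps = [r.copy() for r in query_reps]
    reps[attr_index] = target_rep
    return np.concatenate(reps)

def retrieve(global_rep, gallery):
    """Rank gallery images by Euclidean distance to the manipulated query."""
    dists = np.linalg.norm(gallery - global_rep, axis=1)
    return np.argsort(dists)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet ranking loss: keep positives closer than negatives by a margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy example: 3 attributes, each a 4-dim local representation.
rng = np.random.default_rng(0)
query_reps = [rng.standard_normal(4) for _ in range(3)]
target_rep = rng.standard_normal(4)              # e.g. a "v-neck" representation
g = manipulate(query_reps, attr_index=1, target_rep=target_rep)

gallery = rng.standard_normal((5, 12))           # 5 gallery images, 3*4 dims
gallery[2] = g + 0.01 * rng.standard_normal(12)  # plant a near-perfect match
print(retrieve(g, gallery)[0])                   # nearest image is index 2
```

In the actual system, the triplet loss is applied per attribute during training (together with attribute classification), so each local representation becomes discriminative for its own attribute before the merge step above.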
Related papers
- ARMADA: Attribute-Based Multimodal Data Augmentation [93.05614922383822]
Attribute-based Multimodal Data Augmentation (ARMADA) is a novel multimodal data augmentation method based on knowledge-guided manipulation of visual attributes.
The framework extracts knowledge-grounded attributes from symbolic KBs to generate semantically consistent yet distinctive image-text pairs.
This also highlights the need to leverage external knowledge proxies for enhanced interpretability and real-world grounding.
arXiv Detail & Related papers (2024-08-19T15:27:25Z) - Learning Concise and Descriptive Attributes for Visual Recognition [25.142065847381758]
We show that querying thousands of attributes can achieve performance competitive with image features.
We propose a novel learning-to-search method to discover those concise sets of attributes.
arXiv Detail & Related papers (2023-08-07T16:00:22Z) - Attribute-Guided Multi-Level Attention Network for Fine-Grained Fashion Retrieval [27.751399400911932]
We introduce an attribute-guided multi-level attention network (AG-MAN) for fine-grained fashion retrieval.
Specifically, we first enhance the pre-trained feature extractor to capture multi-level image embedding.
Then, we propose a classification scheme where images with the same attribute, albeit with different values, are categorized into the same class.
arXiv Detail & Related papers (2022-12-27T05:28:38Z) - Supervised Attribute Information Removal and Reconstruction for Image
Manipulation [15.559224431459551]
We propose an Attribute Information Removal and Reconstruction (AIRR) network that prevents such information hiding.
We evaluate our approach on four diverse datasets with a variety of attributes including DeepFashion Synthesis, DeepFashion Fine-grained Attribute, CelebA and CelebA-HQ.
arXiv Detail & Related papers (2022-07-13T23:30:44Z) - Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot (i.e., zero-shot and few-shot) image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z) - Scalable Visual Attribute Extraction through Hidden Layers of a Residual
ConvNet [7.6702700993064115]
We propose an approach for extracting visual attributes from images, leveraging the learned capability of the hidden layers of a general convolutional network.
We run experiments with a ResNet-50 trained on ImageNet, on which we evaluate the output of its different blocks to discriminate between colors and textures.
arXiv Detail & Related papers (2021-03-31T23:39:20Z) - SMILE: Semantically-guided Multi-attribute Image and Layout Editing [154.69452301122175]
Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs).
We present a multimodal representation that handles all attributes, be it guided by random noise or images, while only using the underlying domain information of the target domain.
Our method is capable of adding, removing or changing either fine-grained or coarse attributes by using an image as a reference or by exploring the style distribution space.
arXiv Detail & Related papers (2020-10-05T20:15:21Z) - Attribute Prototype Network for Zero-Shot Learning [113.50220968583353]
We propose a novel zero-shot representation learning framework that jointly learns discriminative global and local features.
Our model points to the visual evidence of the attributes in an image, confirming the improved attribute localization ability of our image representation.
arXiv Detail & Related papers (2020-08-19T06:46:35Z) - Joint Item Recommendation and Attribute Inference: An Adaptive Graph
Convolutional Network Approach [61.2786065744784]
In recommender systems, users and items are associated with attributes, and users show preferences for items.
As annotating user (item) attributes is a labor-intensive task, attribute values are often incomplete, with many missing.
We propose an Adaptive Graph Convolutional Network (AGCN) approach for joint item recommendation and attribute inference.
arXiv Detail & Related papers (2020-05-25T10:50:01Z) - Fashionpedia: Ontology, Segmentation, and an Attribute Localization
Dataset [62.77342894987297]
We propose a novel Attribute-Mask RCNN model to jointly perform instance segmentation and localized attribute recognition.
We also demonstrate instance segmentation models pre-trained on Fashionpedia achieve better transfer learning performance on other fashion datasets than ImageNet pre-training.
arXiv Detail & Related papers (2020-04-26T02:38:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all listed content) and is not responsible for any consequences of its use.