From Region to Patch: Attribute-Aware Foreground-Background Contrastive
Learning for Fine-Grained Fashion Retrieval
- URL: http://arxiv.org/abs/2305.10260v1
- Date: Wed, 17 May 2023 14:49:20 GMT
- Title: From Region to Patch: Attribute-Aware Foreground-Background Contrastive
Learning for Fine-Grained Fashion Retrieval
- Authors: Jianfeng Dong, Xiaoman Peng, Zhe Ma, Daizong Liu, Xiaoye Qu, Xun Yang,
Jixiang Zhu, Baolong Liu
- Abstract summary: Attribute-specific fashion retrieval (ASFR) is a challenging information retrieval task.
We propose a Region-to-Patch Framework (RPF) to extract fine-grained attribute-related visual features.
Our framework strikes a proper balance between region localization and feature extraction.
- Score: 27.931767073714635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attribute-specific fashion retrieval (ASFR) is a challenging information
retrieval task, which has attracted increasing attention in recent years.
Different from traditional fashion retrieval which mainly focuses on optimizing
holistic similarity, the ASFR task concentrates on attribute-specific
similarity, resulting in more fine-grained and interpretable retrieval results.
As the attribute-specific similarity typically corresponds to the specific
subtle regions of images, we propose a Region-to-Patch Framework (RPF) that
consists of a region-aware branch and a patch-aware branch to extract
fine-grained attribute-related visual features for precise retrieval in a
coarse-to-fine manner. In particular, the region-aware branch is first utilized
to locate the potential regions related to the semantics of the given
attribute. Then, considering that the located region is coarse and still
contains background visual content, the patch-aware branch is proposed to
capture patch-wise attribute-related details from the previously amplified
region. Such a hybrid architecture strikes a proper balance between region
localization and feature extraction. Besides, different from previous works
that solely focus on discriminating the attribute-relevant foreground visual
features, we argue that the attribute-irrelevant background features are also
crucial for distinguishing the detailed visual contexts in a contrastive
manner. Therefore, a novel E-InfoNCE loss based on the foreground and
background representations is further proposed to improve the discrimination of
attribute-specific representation. Extensive experiments on three datasets
demonstrate the effectiveness of our proposed framework, and also show a decent
generalization of our RPF on out-of-domain fashion images. Our source code is
available at https://github.com/HuiGuanLab/RPF.
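The abstract does not spell out the E-InfoNCE formula. As a rough illustration of the stated idea, the sketch below implements a standard InfoNCE-style contrastive objective in which attribute-irrelevant background features are added as extra negatives in the softmax denominator; the function name, signature, and exact formulation are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def e_infonce_sketch(anchor, positive, negatives, background, tau=0.07):
    """InfoNCE-style loss in which background (attribute-irrelevant)
    features act as additional negatives. Illustrative sketch only,
    not the paper's exact E-InfoNCE formulation."""
    def sim(a, b):
        # cosine similarity between two feature vectors
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    pos = np.exp(sim(anchor, positive) / tau)
    neg = sum(np.exp(sim(anchor, n) / tau) for n in negatives)
    bg = sum(np.exp(sim(anchor, b) / tau) for b in background)
    # the positive pair competes against both ordinary negatives and
    # background representations, pushing attribute-specific foreground
    # features away from background context
    return float(-np.log(pos / (pos + neg + bg)))
```

Minimizing this loss pulls the anchor toward its attribute-matched positive while pushing it away from other samples and from its own background context, which is the contrastive role the abstract attributes to the background features.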
Related papers
- Selective Domain-Invariant Feature for Generalizable Deepfake Detection [21.671221284842847]
We propose a novel framework which reduces the sensitivity to face forgery by fusing content features and styles.
Both qualitative and quantitative results on existing benchmarks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-03-19T13:09:19Z) - Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe$^2$) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe$^2$ can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z) - Attribute-Guided Multi-Level Attention Network for Fine-Grained Fashion Retrieval [27.751399400911932]
We introduce an attribute-guided multi-level attention network (AG-MAN) for fine-grained fashion retrieval.
Specifically, we first enhance the pre-trained feature extractor to capture multi-level image embedding.
Then, we propose a classification scheme where images with the same attribute, albeit with different values, are categorized into the same class.
arXiv Detail & Related papers (2022-12-27T05:28:38Z) - TransFA: Transformer-based Representation for Face Attribute Evaluation [87.09529826340304]
We propose a novel transformer-based representation for attribute evaluation method (TransFA).
The proposed TransFA achieves superior performance compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-07-12T10:58:06Z) - Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot, i.e. zero-shot and few-shot, image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z) - Pedestrian Attribute Recognition in Video Surveillance Scenarios Based
on View-attribute Attention Localization [8.807717261983539]
We propose a novel view-attribute localization method based on attention (VALA).
A specific view-attribute is composed of the extracted attribute feature and four view scores, which are predicted by the view predictor as the confidences for the attribute from different views.
Experiments on four widely-used datasets (RAP, RAPv2, PETA, and PA-100K) demonstrate the effectiveness of our approach compared with state-of-the-art methods.
arXiv Detail & Related papers (2021-06-11T16:09:31Z) - Fine-Grained Fashion Similarity Prediction by Attribute-Specific
Embedding Learning [71.74073012364326]
We propose an Attribute-Specific Embedding Network (ASEN) to jointly learn multiple attribute-specific embeddings.
The proposed ASEN is comprised of a global branch and a local branch.
Experiments on three fashion-related datasets, i.e., FashionAI, DARN, and DeepFashion, show the effectiveness of ASEN for fine-grained fashion similarity prediction.
arXiv Detail & Related papers (2021-04-06T11:26:38Z) - DoFE: Domain-oriented Feature Embedding for Generalizable Fundus Image
Segmentation on Unseen Datasets [96.92018649136217]
We present a novel Domain-oriented Feature Embedding (DoFE) framework to improve the generalization ability of CNNs on unseen target domains.
Our DoFE framework dynamically enriches the image features with additional domain prior knowledge learned from multi-source domains.
Our framework generates satisfying segmentation results on unseen datasets and surpasses other domain generalization and network regularization methods.
arXiv Detail & Related papers (2020-10-13T07:28:39Z) - Attribute Prototype Network for Zero-Shot Learning [113.50220968583353]
We propose a novel zero-shot representation learning framework that jointly learns discriminative global and local features.
Our model points to the visual evidence of the attributes in an image, confirming the improved attribute localization ability of our image representation.
arXiv Detail & Related papers (2020-08-19T06:46:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.