Fashion Image Retrieval with Multi-Granular Alignment
- URL: http://arxiv.org/abs/2302.08902v1
- Date: Thu, 16 Feb 2023 10:43:31 GMT
- Title: Fashion Image Retrieval with Multi-Granular Alignment
- Authors: Jinkuan Zhu, Hao Huang, Qiao Deng
- Abstract summary: The fashion image retrieval task aims to find clothing items in a gallery that are relevant to a query image.
Previous approaches focus on designing distance-based loss functions that pull relevant pairs together and push irrelevant images apart.
We propose a novel fashion image retrieval method leveraging both global and fine-grained features, dubbed Multi-Granular Alignment (MGA).
- Score: 4.109124423081812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The fashion image retrieval task aims to find clothing items in a
gallery that are relevant to a query image. Previous approaches focus on
designing distance-based loss functions that pull relevant pairs together and
push irrelevant images apart. However, these methods ignore fine-grained
features (e.g., neckband, cuff) of clothing images. In this paper, we propose a
novel fashion image retrieval method leveraging both global and fine-grained
features, dubbed Multi-Granular Alignment (MGA). Specifically, we design a
Fine-Granular Aggregator (FGA) to capture and aggregate detailed patterns. Then
we propose Attention-based Token Alignment (ATA) to align image features at the
multi-granular level in a coarse-to-fine manner. To demonstrate the
effectiveness of our proposed method, we conduct experiments on two sub-tasks
(In-Shop & Consumer2Shop) of the public fashion dataset DeepFashion. The
experimental results show that our MGA outperforms the state-of-the-art methods
by 3.1% and 0.6% in the two sub-tasks on the R@1 metric, respectively.
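The R@1 metric reported above is commonly computed as the fraction of queries whose top-ranked gallery item shares the query's item ID. The following is a minimal sketch of that idea, not the authors' evaluation code: the function names, the use of cosine similarity for ranking, and the toy embeddings and IDs are all assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(query_feats, query_ids, gallery_feats, gallery_ids, k=1):
    """Fraction of queries with at least one same-ID item among the top-k results."""
    if not query_feats:
        return 0.0
    hits = 0
    for q_feat, q_id in zip(query_feats, query_ids):
        # Rank gallery indices by descending similarity to the query.
        ranked = sorted(
            range(len(gallery_feats)),
            key=lambda i: cosine(q_feat, gallery_feats[i]),
            reverse=True,
        )
        if any(gallery_ids[i] == q_id for i in ranked[:k]):
            hits += 1
    return hits / len(query_feats)


# Toy, made-up embeddings and IDs (hypothetical, for illustration only).
gallery_feats = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
gallery_ids = ["shirt", "dress", "coat"]
query_feats = [[0.9, 0.1], [0.6, 0.8]]
query_ids = ["shirt", "dress"]

r1 = recall_at_k(query_feats, query_ids, gallery_feats, gallery_ids, k=1)
r2 = recall_at_k(query_feats, query_ids, gallery_feats, gallery_ids, k=2)
print(r1, r2)  # 0.5 1.0
```

In this toy example the second query ranks "coat" first and "dress" second, so it counts as a miss at k=1 and a hit at k=2. In practice the embeddings would come from the retrieval model, and ranking would usually be done with a vectorized similarity matrix rather than a Python loop.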
Related papers
- MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation [70.83668869857665]
MMTryon is a multi-modal multi-reference VIrtual Try-ON framework.
It can generate high-quality compositional try-on results by taking a text instruction and multiple garment images as inputs.
arXiv Detail & Related papers (2024-05-01T11:04:22Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- TriPINet: Tripartite Progressive Integration Network for Image Manipulation Localization [3.7359400978194675]
We propose a tripartite progressive integration network (TriPINet) for end-to-end image manipulation localization.
We develop a guided cross-modality dual-attention (gCMDA) module to fuse different types of forged clues.
Extensive experiments are conducted to compare our method with state-of-the-art image forensics approaches.
arXiv Detail & Related papers (2022-12-25T02:27:58Z)
- Single Stage Virtual Try-on via Deformable Attention Flows [51.70606454288168]
Virtual try-on aims to generate a photo-realistic fitting result given an in-shop garment and a reference person image.
We develop a novel Deformable Attention Flow (DAFlow) which applies the deformable attention scheme to multi-flow estimation.
Our proposed method achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-07-19T10:01:31Z)
- Exploring the Interactive Guidance for Unified and Effective Image Matting [16.933897631478146]
We propose a Unified Interactive image Matting method, named UIM, which solves the limitations and achieves satisfying matting results.
Specifically, UIM leverages multiple types of user interaction to avoid the ambiguity of multiple matting targets.
We show that UIM achieves state-of-the-art performance on the Composition-1K test set and a synthetic unified dataset.
arXiv Detail & Related papers (2022-05-17T13:20:30Z)
- Mask guided attention for fine-grained patchy image classification [22.91753200323264]
A mask guided attention (MGA) method for fine-grained patchy image classification is presented.
We verify the effectiveness of our method on three publicly available patchy image datasets.
Our ablation study shows that MGA improves the accuracy by 2.25% and 2% on the SoyCultivarVein and BtfPIS datasets.
arXiv Detail & Related papers (2021-02-04T17:54:50Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
- Where to Look and How to Describe: Fashion Image Retrieval with an Attentional Heterogeneous Bilinear Network [50.19558726384559]
We propose a biologically inspired framework for image-based fashion product retrieval.
Our proposed framework achieves satisfactory performance on three image-based fashion product retrieval benchmarks.
arXiv Detail & Related papers (2020-10-26T06:01:09Z)
- Devil's in the Details: Aligning Visual Clues for Conditional Embedding in Person Re-Identification [94.77172127405846]
We propose two key recognition patterns to better utilize the detail information of pedestrian images.
CACE-Net achieves state-of-the-art performance on three public datasets.
arXiv Detail & Related papers (2020-09-11T06:28:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.