UIGR: Unified Interactive Garment Retrieval
- URL: http://arxiv.org/abs/2204.03111v1
- Date: Wed, 6 Apr 2022 21:54:14 GMT
- Title: UIGR: Unified Interactive Garment Retrieval
- Authors: Xiao Han, Sen He, Li Zhang, Yi-Zhe Song, Tao Xiang
- Abstract summary: Interactive garment retrieval (IGR) aims to retrieve a target garment image based on a reference garment image.
Two IGR tasks have been studied extensively: text-guided garment retrieval (TGR) and visually compatible garment retrieval (VCR).
We propose a Unified Interactive Garment Retrieval (UIGR) framework to unify TGR and VCR.
- Score: 105.56179829647142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interactive garment retrieval (IGR) aims to retrieve a target garment image
based on a reference garment image along with user feedback on what to change
on the reference garment. Two IGR tasks have been studied extensively:
text-guided garment retrieval (TGR) and visually compatible garment retrieval
(VCR). The user feedback for the former indicates what semantic attributes to
change with the garment category preserved, while the category is the only
thing to be changed explicitly for the latter, with an implicit requirement on
style preservation. Despite the similarity between these two tasks and the
practical need for an efficient system tackling both, they have never been
unified and modeled jointly. In this paper, we propose a Unified Interactive
Garment Retrieval (UIGR) framework to unify TGR and VCR. To this end, we first
contribute a large-scale benchmark suited for both problems. We further propose
a strong baseline architecture to integrate TGR and VCR in one model. Extensive
experiments suggest that unifying the two tasks in one framework is not only
more efficient, requiring only a single model, but also leads to better
performance. Code and datasets are available at
https://github.com/BrandonHanx/CompFashion.
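The abstract frames both TGR and VCR as retrieval conditioned on a reference garment image plus a feedback signal (a free-form text edit for TGR, a target category for VCR). As a rough, hypothetical sketch only, and not the UIGR architecture proposed in the paper, the snippet below composes a reference-image embedding with a feedback embedding and ranks gallery garments by cosine similarity; the stand-in linear encoders, dummy features, and simple additive fusion are all assumptions made purely for illustration.

```python
# Minimal, hypothetical sketch of interactive garment retrieval with a
# composed query. This is NOT the UIGR architecture from the paper: the
# linear "encoders", dummy features, and additive fusion are stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ComposedRetriever(nn.Module):
    def __init__(self, img_dim=512, fb_dim=128, embed_dim=256):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, embed_dim)  # stand-in image encoder
        self.fb_enc = nn.Linear(fb_dim, embed_dim)    # stand-in feedback encoder
                                                      # (text edit for TGR, target category for VCR)

    def compose(self, ref_img_feat, feedback_feat):
        # Fuse the reference-image and feedback embeddings (simple sum, assumed).
        query = self.img_enc(ref_img_feat) + self.fb_enc(feedback_feat)
        return F.normalize(query, dim=-1)

    def rank(self, ref_img_feat, feedback_feat, gallery_img_feat):
        query = self.compose(ref_img_feat, feedback_feat)              # (B, D)
        gallery = F.normalize(self.img_enc(gallery_img_feat), dim=-1)  # (N, D)
        scores = query @ gallery.T                                     # cosine similarities
        return scores.argsort(dim=-1, descending=True)                 # ranked gallery indices


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ComposedRetriever()
    ref = torch.randn(2, 512)        # 2 reference garments (dummy image features)
    feedback = torch.randn(2, 128)   # encoded user feedback (dummy features)
    gallery = torch.randn(100, 512)  # 100 candidate garments
    print(model.rank(ref, feedback, gallery)[:, :5])  # top-5 gallery indices per query
```

In practice the feedback encoder would differ per task (a text encoder for TGR, a category embedding for VCR), while the composition and retrieval steps could be shared across tasks, which is the kind of parameter sharing a unified model makes possible.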
Related papers
- VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents [66.42579289213941]
Retrieval-augmented generation (RAG) is an effective technique that enables large language models to utilize external knowledge sources for generation.
In this paper, we introduce VisRAG, which tackles this issue by establishing a vision-language model (VLM)-based RAG pipeline.
In this pipeline, instead of first parsing the document to obtain text, the document is directly embedded using a VLM as an image and then retrieved to enhance the generation of a VLM.
arXiv Detail & Related papers (2024-10-14T15:04:18Z)
- Don't Forget to Connect! Improving RAG with Graph-based Reranking [26.433218248189867]
We introduce G-RAG, a reranker based on graph neural networks (GNNs) between the retriever and reader in RAG.
Our method combines both connections between documents and semantic information (via Abstract Meaning Representation graphs) to provide a context-informed ranker for RAG.
G-RAG outperforms state-of-the-art approaches while having a smaller computational footprint.
arXiv Detail & Related papers (2024-05-28T17:56:46Z)
- Pair then Relation: Pair-Net for Panoptic Scene Graph Generation [54.92476119356985]
Panoptic Scene Graph (PSG) aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.
Current PSG methods have limited performance, which hinders downstream tasks or applications.
We present a novel framework: Pair then Relation (Pair-Net), which uses a Pair Proposal Network (PPN) to learn and filter sparse pair-wise relationships between subjects and objects.
arXiv Detail & Related papers (2023-07-17T17:58:37Z)
- Learning Granularity-Unified Representations for Text-to-Image Person Re-identification [29.04254233799353]
Text-to-image person re-identification (ReID) aims to search for pedestrian images of a target identity via textual descriptions.
Existing works usually ignore the difference in feature granularity between the two modalities.
We propose an end-to-end framework based on transformers to learn granularity-unified representations for both modalities, denoted as LGUR.
arXiv Detail & Related papers (2022-07-16T01:26:10Z)
- Fashionformer: A simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition [80.74495836502919]
In this work, we focus on joint human fashion segmentation and attribute recognition.
We introduce the object query for segmentation and the attribute query for attribute prediction.
For the attribute stream, we design a novel Multi-Layer Rendering module to explore more fine-grained features.
arXiv Detail & Related papers (2022-04-10T11:11:10Z)
- Target Adaptive Context Aggregation for Video Scene Graph Generation [36.669700084337045]
This paper deals with the challenging task of video scene graph generation (VidSGG).
We present a new detect-to-track paradigm for this task by decoupling the context modeling for relation prediction from the complicated low-level entity tracking.
arXiv Detail & Related papers (2021-08-18T12:46:28Z)
- Cloth-Changing Person Re-identification from A Single Image with Gait Prediction and Regularization [65.50321170655225]
We introduce Gait recognition as an auxiliary task to drive the Image ReID model to learn cloth-agnostic representations.
Experiments on image-based Cloth-Changing ReID benchmarks, e.g., LTCC, PRCC, Real28, and VC-Clothes, demonstrate that GI-ReID performs favorably against the state of the art.
arXiv Detail & Related papers (2021-03-29T12:10:50Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)