FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion
Vision-Language Pre-training
- URL: http://arxiv.org/abs/2304.05051v1
- Date: Tue, 11 Apr 2023 08:20:17 GMT
- Title: FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion
Vision-Language Pre-training
- Authors: Yunpeng Han, Lisai Zhang, Qingcai Chen, Zhijian Chen, Zhonghua Li,
Jianxin Yang, Zhao Cao
- Abstract summary: We propose a method for fine-grained fashion vision-language pre-training based on fashion Symbols and Attributes Prompt (FashionSAP).
Firstly, we propose the fashion symbols, a novel abstract fashion concept layer, to represent different fashion items.
Secondly, the attributes prompt method is proposed to make the model learn specific attributes of fashion items explicitly.
- Score: 12.652002299515864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fashion vision-language pre-training models have shown efficacy for a
wide range of downstream tasks. However, general vision-language pre-training
models pay less attention to fine-grained domain features, even though these
features are important for distinguishing domain-specific tasks from general
ones. We propose a method for fine-grained fashion vision-language pre-training
based on fashion Symbols and Attributes Prompt (FashionSAP) to model
fine-grained multi-modal fashion attributes and characteristics. Firstly, we
propose fashion symbols, a novel abstract fashion concept layer, to represent
different fashion items and to generalize various kinds of fine-grained fashion
features, making the modelling of fine-grained attributes more effective.
Secondly, the attributes prompt method is proposed to make the model learn
specific attributes of fashion items explicitly. We design prompt templates
according to the format of the fashion data. Comprehensive experiments are
conducted on two public fashion benchmarks, i.e., FashionGen and FashionIQ, and
FashionSAP achieves state-of-the-art (SOTA) performance on four popular fashion
tasks. The ablation study also shows that the proposed abstract fashion symbols
and the attribute prompt method enable the model to acquire fine-grained
semantics in the fashion domain effectively. The clear performance gains from
FashionSAP provide a new baseline for future fashion task research.
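To make the attributes prompt idea concrete, the sketch below turns an item's abstract fashion symbol and its attribute-value pairs into explicit natural-language sentences. The paper's actual templates and symbol vocabulary are not given in this summary, so the FASHION_SYMBOLS mapping, the template wording, and the build_attribute_prompts helper are hypothetical illustrations of the general technique, not the released FashionSAP implementation.

```python
# Minimal sketch of an attributes-prompt builder in the spirit of FashionSAP.
# The template wording, the symbol vocabulary, and every name below are
# illustrative assumptions, not the paper's released implementation.

# Hypothetical mapping from concrete product categories to abstract fashion symbols.
FASHION_SYMBOLS = {
    "jeans": "pants",
    "hoodie": "top",
    "sneakers": "shoes",
}


def build_attribute_prompts(category: str, attributes: dict) -> list[str]:
    """Turn a category and its attribute-value pairs into prompt sentences."""
    symbol = FASHION_SYMBOLS.get(category, "item")
    prompts = [f"this item's fashion symbol is {symbol}."]
    for name, value in attributes.items():
        # One sentence per attribute, so each attribute is stated explicitly.
        prompts.append(f"the {name} of the item is {value}.")
    return prompts


if __name__ == "__main__":
    for sentence in build_attribute_prompts("jeans", {"color": "light blue", "fit": "slim"}):
        print(sentence)
    # this item's fashion symbol is pants.
    # the color of the item is light blue.
    # the fit of the item is slim.
```

In a pre-training setup of this kind, sentences like these would presumably accompany the item's image so that each attribute is stated explicitly, which is the stated goal of the attributes prompt method.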
Related papers
- Automatic Generation of Fashion Images using Prompting in Generative Machine Learning Models [1.8817715864806608]
This work investigates methodologies for generating tailored fashion descriptions using two distinct Large Language Models and a Stable Diffusion model for fashion image creation.
Emphasizing adaptability in AI-driven fashion creativity, we focus on prompting techniques, such as zero-shot and few-shot learning.
Evaluation combines quantitative metrics such as CLIPscore with qualitative human judgment, highlighting strengths in creativity, coherence, and aesthetic appeal across diverse styles.
arXiv Detail & Related papers (2024-07-20T17:37:51Z)
- FashionReGen: LLM-Empowered Fashion Report Generation [61.84580616045145]
We propose an intelligent Fashion Analyzing and Reporting system based on advanced Large Language Models (LLMs).
Specifically, it delivers FashionReGen based on effective catwalk analysis, which comprises several key procedures.
It also inspires the exploration of more high-level tasks with industrial significance in other domains.
arXiv Detail & Related papers (2024-03-11T12:29:35Z)
- Social Media Fashion Knowledge Extraction as Captioning [61.41631195195498]
We study the task of social media fashion knowledge extraction.
We transform the fashion knowledge into a natural language caption with a sentence transformation method.
Our framework then aims to generate the sentence-based fashion knowledge directly from the social media post.
arXiv Detail & Related papers (2023-09-28T09:07:48Z)
- FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning [66.38951790650887]
Multimodal tasks in the fashion domain have significant potential for e-commerce.
We propose a novel fashion-specific pre-training framework based on weakly-supervised triplets constructed from fashion image-text pairs.
We show the triplet-based tasks are an effective addition to standard multimodal pre-training tasks.
arXiv Detail & Related papers (2022-10-26T21:01:19Z)
- Personalized Fashion Recommendation from Personal Social Media Data: An Item-to-Set Metric Learning Approach [71.63618051547144]
We study the problem of personalized fashion recommendation from social media data.
We present an item-to-set metric learning framework that learns to compute the similarity between a set of historical fashion items of a user to a new fashion item.
To validate the effectiveness of our approach, we collect a real-world social media dataset.
arXiv Detail & Related papers (2020-05-25T23:24:24Z)
- FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval [31.822218310945036]
FashionBERT learns high level representations of texts and images.
FashionBERT achieves significant performance improvements over the baseline and state-of-the-art approaches.
arXiv Detail & Related papers (2020-05-20T00:41:00Z)
- Knowledge Enhanced Neural Fashion Trend Forecasting [81.2083786318119]
This work focuses on investigating fine-grained fashion element trends for specific user groups.
We first contribute a large-scale fashion trend dataset (FIT) collected from Instagram with extracted time series fashion element records and user information.
We propose a Knowledge Enhanced Recurrent Network model (KERN) which takes advantage of the capability of deep recurrent neural networks in modeling time-series data.
arXiv Detail & Related papers (2020-05-07T07:42:17Z)
- Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset [62.77342894987297]
We propose a novel Attribute-Mask RCNN model to jointly perform instance segmentation and localized attribute recognition.
We also demonstrate instance segmentation models pre-trained on Fashionpedia achieve better transfer learning performance on other fashion datasets than ImageNet pre-training.
arXiv Detail & Related papers (2020-04-26T02:38:26Z)
- Fashion Meets Computer Vision: A Survey [41.41993143419999]
This paper provides a comprehensive survey of more than 200 major fashion-related works covering four main aspects for enabling intelligent fashion.
For each task, the benchmark datasets and the evaluation protocols are summarized.
arXiv Detail & Related papers (2020-03-31T07:08:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.