Unposed: Unsupervised Pose Estimation based Product Image
Recommendations
- URL: http://arxiv.org/abs/2301.07879v1
- Date: Thu, 19 Jan 2023 05:02:55 GMT
- Title: Unposed: Unsupervised Pose Estimation based Product Image
Recommendations
- Authors: Saurabh Sharma, Faizan Ahemad
- Abstract summary: We propose a Human Pose Detection based unsupervised method to scan a product's image set for missing poses.
Because the approach is unsupervised, its suggestions to sellers depend only on the product and category, irrespective of any biases.
We manually surveyed 200 products, a large fraction of which had at least one repeated image or missing variant, and sampled 3K products (20K images), a significant proportion of which had scope for adding many image variants.
- Score: 4.467248776406006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Product images are the most impressive medium of customer interaction on the
product detail pages of e-commerce websites. Millions of products are onboarded
on to webstore catalogues daily and maintaining a high quality bar for a
product's set of images is a problem at scale. Among product categories,
clothing is a very high-volume and high-velocity category and thus deserves its
own attention. Given the scale, it is challenging to monitor whether the image
set is complete enough to adequately detail the product for consumers; gaps
often lead to a poor customer experience and thus customer drop-off.
To supervise the quality and completeness of the images in the product pages
for these product types and suggest improvements, we propose a Human Pose
Detection based unsupervised method to scan a product's image set for missing
poses. Because the approach is unsupervised, its suggestions to sellers depend
only on the product and category, irrespective of any biases. We first create a
reference image set from popular products with complete image sets. Then we
cluster these images to label the most desirable poses, which form the classes
for the reference set. Further, for all test products we
scan the images for all desired pose classes w.r.t. reference set poses,
determine the missing ones and sort them in the order of potential impact.
Sellers can then use these missing poses to enrich their product listing
images. We gathered data from a popular online webstore and manually surveyed
~200 products, a large fraction of which had at least one repeated image or
missing variant. We also sampled 3K products (~20K images), a significant
proportion of which had scope for adding many image variants compared to highly
rated products, which had more than double the image variants, indicating that
our model can potentially be used at large scale.
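The pipeline described in the abstract (cluster reference poses into classes, scan a test product's images against those classes, then rank the missing ones) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each image is assumed to be already reduced to a small numeric pose descriptor by some pose estimator, the clustering is a plain k-means with a deterministic farthest-point start, and "potential impact" is approximated by how often a pose class appears in the reference set. The names `kmeans`, `missing_poses`, and `max_dist` are hypothetical.

```python
import math
from collections import Counter

def dist(a, b):
    """Euclidean distance between two pose descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of descriptors."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def nearest(p, centroids):
    """Index of the centroid closest to descriptor p."""
    return min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from every centroid chosen so far.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    counts = Counter(nearest(p, centroids) for p in points)
    return centroids, [counts[i] for i in range(k)]

def missing_poses(test_poses, centroids, sizes, max_dist):
    """Pose classes absent from a product, most frequent reference class first.

    Assumption: class frequency in the reference set stands in for impact.
    """
    covered = set()
    for p in test_poses:
        i = nearest(p, centroids)
        if dist(p, centroids[i]) <= max_dist:  # ignore poses far from every class
            covered.add(i)
    missing = [i for i in range(len(centroids)) if i not in covered]
    return sorted(missing, key=lambda i: sizes[i], reverse=True)

# Toy 2-D descriptors for reference images (made-up data, three pose groups).
reference = [
    (0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.1, -0.2),  # e.g. front view
    (10.0, 0.0), (10.2, 0.1), (9.9, -0.1),             # e.g. back view
    (0.0, 10.0), (0.1, 10.1),                          # e.g. side view
]
centroids, sizes = kmeans(reference, k=3)

# A test product with a single front-view image is missing the other two classes.
print(missing_poses([(0.1, 0.0)], centroids, sizes, max_dist=2.0))
```

In practice the descriptors would be normalized keypoint vectors and the number of classes and the distance threshold would need tuning per category; both are free parameters in this sketch.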
Related papers
- CTR-Driven Advertising Image Generation with Multimodal Large Language Models [53.40005544344148]
We explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images by optimizing for Click-Through Rate (CTR) as the primary objective.
To further improve the CTR of generated images, we propose a novel reward model to fine-tune pre-trained MLLMs through Reinforcement Learning (RL).
Our method achieves state-of-the-art performance in both online and offline metrics.
arXiv Detail & Related papers (2025-02-05T09:06:02Z)
- An Evaluation Framework for Product Images Background Inpainting based on Human Feedback and Product Consistency [4.177224329586615]
In product advertising applications, the automated inpainting of backgrounds utilizing AI techniques in product images has emerged as a significant task.
Human Feedback and Product Consistency (HFPC) can automatically assess the generated product images based on two modules.
HFPC achieves state-of-the-art performance (96.4% precision) in comparison to other open-source visual-quality-assessment models.
arXiv Detail & Related papers (2024-12-23T12:03:35Z)
- Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models [50.370043676415875]
In smart retail applications, the large number of products and their frequent turnover necessitate reliable zero-shot object classification methods.
We introduce the MIMEX dataset, comprising 28 distinct product categories.
We benchmark the zero-shot object classification performance of state-of-the-art vision-language models (VLMs) on the proposed MIMEX dataset.
arXiv Detail & Related papers (2024-09-23T12:28:40Z)
- Transformer-empowered Multi-modal Item Embedding for Enhanced Image Search in E-Commerce [20.921870288665627]
Multi-modal Item Embedding Model (MIEM) is capable of utilizing both textual information and multiple images about a product to construct meaningful product features.
MIEM has become an integral part of the Shopee image search platform.
arXiv Detail & Related papers (2023-11-29T08:09:50Z)
- Behavior Optimized Image Generation [69.9906601767728]
We propose BoigLLM, which understands both image content and user behavior.
We show that BoigLLM outperforms 13x larger models such as GPT-3.5 and GPT-4 in this task.
We release BoigBench, a benchmark dataset containing 168 million enterprise tweets with their media, brand names, time of post, and total likes.
arXiv Detail & Related papers (2023-11-18T07:07:38Z)
- Product Review Image Ranking for Fashion E-commerce [0.0]
We train our network to rank bad-quality images lower than high-quality ones.
Our proposed method outperforms the baseline models on two metrics, namely correlation coefficient and accuracy, by substantial margins.
arXiv Detail & Related papers (2023-08-10T07:09:13Z)
- Automatic Generation of Product-Image Sequence in E-commerce [46.06263129000091]
Multi-modality Unified Imagesequence (MUIsC) is able to simultaneously detect all categories through learning rule violations.
By Dec 2021, our AGPIS framework had generated high-standard images for about 1.5 million products and achieved a 13.6% reject rate.
arXiv Detail & Related papers (2022-06-26T23:38:42Z)
- An Automatic Image Content Retrieval Method for better Mobile Device Display User Experiences [91.3755431537592]
A new mobile application for image content retrieval and classification for mobile device display is proposed.
The application was run on thousands of pictures and showed encouraging results towards a better user visual experience with mobile displays.
arXiv Detail & Related papers (2021-08-26T23:44:34Z)
- Vision-based Price Suggestion for Online Second-hand Items [40.42940050851797]
We present a vision-based price suggestion system for the online second-hand item shopping platform.
The goal of vision-based price suggestion is to help sellers set effective prices for their second-hand listings with the images uploaded to the online platforms.
arXiv Detail & Related papers (2020-12-10T22:56:29Z)
- Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances.
The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.