Related papers: Unposed: Unsupervised Pose Estimation based Product Image Recommendations

URL: http://arxiv.org/abs/2301.07879v1
Date: Thu, 19 Jan 2023 05:02:55 GMT
Title: Unposed: Unsupervised Pose Estimation based Product Image Recommendations
Authors: Saurabh Sharma, Faizan Ahemad
Abstract summary: We propose a Human Pose Detection based unsupervised method to scan a product's image set for missing images. The unsupervised approach offers sellers fair suggestions based on product and category, irrespective of any biases. We manually surveyed 200 products, a large fraction of which had at least one repeated image or missing variant, and sampled 3K products (20K images), a significant proportion of which had scope for adding many image variants.
Score: 4.467248776406006
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Product images are the most impactful medium of customer interaction on the product detail pages of e-commerce websites. Millions of products are onboarded onto webstore catalogues daily, and maintaining a high quality bar for each product's set of images is a problem at scale. When products are grouped by category, clothing is a very high-volume, high-velocity category and thus deserves its own attention. Given the scale, it is challenging to monitor whether an image set is complete enough to adequately detail the product for consumers; incomplete image sets often lead to a poor customer experience and thus customer drop-off. To supervise the quality and completeness of the images on the product pages for these product types and to suggest improvements, we propose a Human Pose Detection based unsupervised method to scan a product's image set for missing images. The unsupervised approach offers sellers fair suggestions based on product and category, irrespective of any biases. We first create a reference image set from popular products with complete image sets. We then cluster the images of these ideal products to label the most desirable poses, which form the classes of the reference set. Next, for every test product, we scan its images for all desired pose classes with respect to the reference set poses, determine the missing ones, and sort them in order of potential impact. Sellers can then use these missing poses to enrich their product listing images. We gathered data from a popular online webstore and manually surveyed ~200 products, a large fraction of which had at least one repeated image or missing variant. We also sampled 3K products (~20K images), a significant proportion of which had scope for adding many image variants when compared with highly rated products, which had more than double the image variants, indicating that our model can potentially be used on a large scale.
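
The missing-pose step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the pose classes have already been obtained by clustering the reference images and are given as centroids, that each image is represented by a small hypothetical pose vector (the 2-D toy values below are invented for illustration), and that "potential impact" is approximated by how many reference products show a pose class. The names `nearest_class`, `class_impact`, `missing_poses` and the `max_distance` threshold are all hypothetical.

```python
from collections import Counter
from math import dist


def nearest_class(pose, centroids):
    """Return the label of the centroid closest to a pose vector."""
    return min(centroids, key=lambda label: dist(pose, centroids[label]))


def class_impact(reference_products, centroids):
    """Fraction of reference products showing each pose class at least once.

    Assumption: this fraction stands in for the abstract's "potential impact".
    """
    counts = Counter()
    for poses in reference_products:
        # A set ensures each class is counted once per product.
        counts.update({nearest_class(p, centroids) for p in poses})
    total = len(reference_products)
    return {label: counts[label] / total for label in centroids}


def missing_poses(product_poses, centroids, impact, max_distance):
    """Pose classes absent from a product, highest impact first."""
    covered = set()
    for p in product_poses:
        label = nearest_class(p, centroids)
        # Only count a pose as covered if it is close enough to the class.
        if dist(p, centroids[label]) <= max_distance:
            covered.add(label)
    missing = [label for label in centroids if label not in covered]
    return sorted(missing, key=lambda label: impact[label], reverse=True)


# Toy data (hypothetical): three pose classes in a 2-D embedding space.
centroids = {"front": (0.0, 0.0), "back": (1.0, 0.0), "side": (0.0, 1.0)}
reference_products = [
    [(0.05, 0.0), (0.95, 0.05), (0.0, 0.9)],  # front, back, side
    [(0.0, 0.1), (1.0, 0.1)],                 # front, back
    [(0.1, 0.0), (0.1, 0.05)],                # front twice
]
impact = class_impact(reference_products, centroids)

test_product = [(0.02, 0.01), (0.03, 0.0)]    # two near-identical front shots
result = missing_poses(test_product, centroids, impact, max_distance=0.3)
print(result)  # ['back', 'side']
```

In this toy run the test product contains only front-facing shots, so the sketch reports the back and side poses as missing, with the back pose first because more reference products include it.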

Related papers

CTR-Driven Advertising Image Generation with Multimodal Large Language Models [53.40005544344148]
We explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images by optimizing for Click-Through Rate (CTR) as the primary objective. To further improve the CTR of generated images, we propose a novel reward model to fine-tune pre-trained MLLMs through Reinforcement Learning (RL). Our method achieves state-of-the-art performance in both online and offline metrics.
arXiv Detail & Related papers (2025-02-05T09:06:02Z)
An Evaluation Framework for Product Images Background Inpainting based on Human Feedback and Product Consistency [4.177224329586615]
In product advertising applications, the automated inpainting of backgrounds in product images using AI techniques has emerged as a significant task. Human Feedback and Product Consistency (HFPC) can automatically assess the generated product images based on two modules. HFPC achieves state-of-the-art performance (96.4% precision) in comparison to other open-source visual-quality-assessment models.
arXiv Detail & Related papers (2024-12-23T12:03:35Z)
Low-Biased General Annotated Dataset Generation [62.04202037186855]
We present a low-biased general annotated dataset generation framework (lbGen). Instead of expensive manual collection, we aim at directly generating low-biased images with category annotations. Experimental results confirm that, compared with the manually labeled dataset or other synthetic datasets, the utilization of our generated low-biased dataset leads to stable generalization capacity enhancement.
arXiv Detail & Related papers (2024-12-14T13:28:40Z)
Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models [50.370043676415875]
In smart retail applications, the large number of products and their frequent turnover necessitate reliable zero-shot object classification methods. We introduce the MIMEX dataset, comprising 28 distinct product categories. We benchmark the zero-shot object classification performance of state-of-the-art vision-language models (VLMs) on the proposed MIMEX dataset.
arXiv Detail & Related papers (2024-09-23T12:28:40Z)
Transformer-empowered Multi-modal Item Embedding for Enhanced Image Search in E-Commerce [20.921870288665627]
Multi-modal Item Embedding Model (MIEM) is capable of utilizing both textual information and multiple images about a product to construct meaningful product features. MIEM has become an integral part of the Shopee image search platform.
arXiv Detail & Related papers (2023-11-29T08:09:50Z)
Behavior Optimized Image Generation [69.9906601767728]
We propose BoigLLM, which understands both image content and user behavior. We show that BoigLLM outperforms 13x larger models such as GPT-3.5 and GPT-4 in this task. We release BoigBench, a benchmark dataset containing 168 million enterprise tweets with their media, brand names, time of post, and total likes.
arXiv Detail & Related papers (2023-11-18T07:07:38Z)
Product Review Image Ranking for Fashion E-commerce [0.0]
We train our network to rank bad-quality images lower than high-quality ones. Our proposed method outperforms the baseline models on two metrics, correlation coefficient and accuracy, by substantial margins.
arXiv Detail & Related papers (2023-08-10T07:09:13Z)
Automatic Generation of Product-Image Sequence in E-commerce [46.06263129000091]
Multi-modality Unified Imagesequence (MUIsC) is able to simultaneously detect all categories through learning rule violations. By Dec 2021, our AGPIS framework had generated high-standard images for about 1.5 million products and achieved a 13.6% reject rate.
arXiv Detail & Related papers (2022-06-26T23:38:42Z)
Weakly Supervised High-Fidelity Clothing Model Generation [67.32235668920192]
We propose a cheap yet scalable weakly-supervised method called Deep Generative Projection (DGP) to address this specific scenario. We show that projecting the rough alignment of clothing and body onto the StyleGAN space can yield photo-realistic wearing results.
arXiv Detail & Related papers (2021-12-14T07:15:15Z)
An Automatic Image Content Retrieval Method for better Mobile Device Display User Experiences [91.3755431537592]
A new mobile application for image content retrieval and classification for mobile device display is proposed. The application was run on thousands of pictures and showed encouraging results towards a better user visual experience with mobile displays.
arXiv Detail & Related papers (2021-08-26T23:44:34Z)
eProduct: A Million-Scale Visual Search Benchmark to Address Product Recognition Challenges [8.204924070199866]
eProduct is a benchmark dataset for training and evaluation on various visual search solutions in a real-world setting. We present eProduct as a training set and an evaluation set, where the training set contains 1.3M+ listing images with titles and hierarchical category labels, for model development. We will present eProduct's construction steps, provide analysis about its diversity and cover the performance of baseline models trained on it.
arXiv Detail & Related papers (2021-07-13T05:28:34Z)
Vision-based Price Suggestion for Online Second-hand Items [40.42940050851797]
We present a vision-based price suggestion system for the online second-hand item shopping platform. The goal of vision-based price suggestion is to help sellers set effective prices for their second-hand listings with the images uploaded to the online platforms.
arXiv Detail & Related papers (2020-12-10T22:56:29Z)
Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances. The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.