Automatic Generation of Product-Image Sequence in E-commerce
- URL: http://arxiv.org/abs/2206.12994v1
- Date: Sun, 26 Jun 2022 23:38:42 GMT
- Title: Automatic Generation of Product-Image Sequence in E-commerce
- Authors: Xiaochuan Fan, Chi Zhang, Yong Yang, Yue Shang, Xueying Zhang, Zhen
He, Yun Xiao, Bo Long, Lingfei Wu
- Abstract summary: Multi-modality Unified Image-sequence Classifier (MUIsC) is able to simultaneously detect all categories of rule violations through learning.
By Dec 2021, our AGPIS framework has generated high-standard images for about 1.5 million products and achieves a 13.6% reject rate.
- Score: 46.06263129000091
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Product images are essential for providing desirable user experience in an
e-commerce platform. For a platform with billions of products, it is extremely
time-costly and labor-expensive to manually pick and organize qualified images.
Furthermore, there are numerous and complicated image rules with which a product
image must comply in order to be generated/selected. To address these
challenges, in this paper, we present a new learning framework in order to
achieve Automatic Generation of Product-Image Sequence (AGPIS) in e-commerce.
To this end, we propose a Multi-modality Unified Image-sequence Classifier
(MUIsC), which is able to simultaneously detect all categories of rule
violations through learning. MUIsC leverages textual review feedback as the
additional training target and utilizes product textual description to provide
extra semantic information. Based on offline evaluations, we show that the
proposed MUIsC significantly outperforms various baselines. Besides MUIsC, we
also integrate some other important modules in the proposed framework, such as
primary image selection, noncompliant content detection, and image
deduplication. With all these modules, our framework works effectively and
efficiently on the JD.com recommendation platform. By Dec 2021, our AGPIS
framework has generated high-standard images for about 1.5 million products and
achieves a 13.6% reject rate.
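The abstract lists image deduplication as one of the framework's supporting modules but does not describe how it is done. A common baseline for near-duplicate detection is a perceptual difference hash compared by Hamming distance; the sketch below is a minimal, hypothetical illustration over raw grayscale pixel grids, not the authors' implementation.

```python
from typing import List

def dhash(pixels: List[List[int]]) -> int:
    """Difference hash: each bit records whether a pixel is brighter
    than its right-hand neighbor. `pixels` is a small grayscale grid
    (in practice the image would first be resized, e.g. to 9x8)."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def deduplicate(images: List[List[List[int]]], threshold: int = 2):
    """Keep only images whose hash differs from every already-kept
    image by more than `threshold` bits."""
    kept, hashes = [], []
    for img in images:
        h = dhash(img)
        if all(hamming(h, k) > threshold for k in hashes):
            kept.append(img)
            hashes.append(h)
    return kept
```

For a near-duplicate pair, the two hashes differ in only a few bits, so the second image falls under the threshold and is dropped, while a visually distinct image survives.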
Related papers
- Multi-modal Generation via Cross-Modal In-Context Learning [50.45304937804883]
We propose a Multi-modal Generation via Cross-Modal In-Context Learning (MGCC) method that generates novel images from complex multimodal prompt sequences.
Our MGCC demonstrates a diverse range of multimodal capabilities, like novel image generation, the facilitation of multimodal dialogue, and generation of texts.
arXiv Detail & Related papers (2024-05-28T15:58:31Z)
- Many-to-many Image Generation with Auto-regressive Diffusion Models [59.5041405824704]
This paper introduces a domain-general framework for many-to-many image generation, capable of producing interrelated image series from a given set of images.
We present MIS, a novel large-scale multi-image dataset, containing 12M synthetic multi-image samples, each with 25 interconnected images.
We learn M2M, an autoregressive model for many-to-many generation, where each image is modeled within a diffusion framework.
arXiv Detail & Related papers (2024-04-03T23:20:40Z)
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training [103.72844619581811]
We build performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices.
We demonstrate the importance of large-scale multimodal pre-training using a careful mix of image-caption, interleaved image-text, and text-only data.
arXiv Detail & Related papers (2024-03-14T17:51:32Z)
- Transformer-empowered Multi-modal Item Embedding for Enhanced Image Search in E-Commerce [20.921870288665627]
Multi-modal Item Embedding Model (MIEM) is capable of utilizing both textual information and multiple images about a product to construct meaningful product features.
MIEM has become an integral part of the Shopee image search platform.
arXiv Detail & Related papers (2023-11-29T08:09:50Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Mutual Query Network for Multi-Modal Product Image Segmentation [13.192334066413837]
We propose a mutual query network to segment products based on both visual and linguistic modalities.
To promote the research in this field, we also construct a Multi-Modal Product dataset (MMPS).
The proposed method significantly outperforms the state-of-the-art methods on MMPS.
arXiv Detail & Related papers (2023-06-26T03:18:38Z)
- Unified Vision-Language Representation Modeling for E-Commerce Same-Style Products Retrieval [12.588713044749177]
Same-style products retrieval plays an important role in e-commerce platforms.
We propose a unified vision-language modeling method for e-commerce same-style products retrieval.
It is capable of cross-modal product-to-product retrieval, as well as style transfer and user-interactive search.
arXiv Detail & Related papers (2023-02-10T07:24:23Z)
- VISTA: Vision Transformer enhanced by U-Net and Image Colorfulness Frame Filtration for Automatic Retail Checkout [0.7250756081498245]
We propose to segment and classify individual frames from a video sequence.
The segmentation method consists of a unified single product item- and hand-segmentation followed by entropy masking.
Our best system achieves 3rd place in the AI City Challenge 2022 Track 4 with an F1 score of 0.4545.
arXiv Detail & Related papers (2022-04-23T08:54:28Z)
- Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework where a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z)
- eProduct: A Million-Scale Visual Search Benchmark to Address Product Recognition Challenges [8.204924070199866]
eProduct is a benchmark dataset for training and evaluation on various visual search solutions in a real-world setting.
eProduct comprises a training set and an evaluation set; the training set contains 1.3M+ listing images with titles and hierarchical category labels for model development.
We present eProduct's construction steps, analyze its diversity, and report the performance of baseline models trained on it.
arXiv Detail & Related papers (2021-07-13T05:28:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.