Automatic Generation of Product-Image Sequence in E-commerce
- URL: http://arxiv.org/abs/2206.12994v1
- Date: Sun, 26 Jun 2022 23:38:42 GMT
- Title: Automatic Generation of Product-Image Sequence in E-commerce
- Authors: Xiaochuan Fan, Chi Zhang, Yong Yang, Yue Shang, Xueying Zhang, Zhen
He, Yun Xiao, Bo Long, Lingfei Wu
- Abstract summary: The Multi-modality Unified Image-sequence Classifier (MUIsC) is able to simultaneously detect all categories of rule violations through learning.
By Dec 2021, our AGPIS framework has generated high-standard images for about 1.5 million products and achieves a reject rate of 13.6%.
- Score: 46.06263129000091
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Product images are essential for providing desirable user experience in an
e-commerce platform. For a platform with billions of products, it is extremely
time-consuming and labor-intensive to manually pick and organize qualified images.
Furthermore, there are numerous and complicated image rules that a product
image must comply with in order to be generated/selected. To address these
challenges, in this paper, we present a new learning framework in order to
achieve Automatic Generation of Product-Image Sequence (AGPIS) in e-commerce.
To this end, we propose a Multi-modality Unified Image-sequence Classifier
(MUIsC), which is able to simultaneously detect all categories of rule
violations through learning. MUIsC leverages textual review feedback as the
additional training target and utilizes product textual description to provide
extra semantic information. Based on offline evaluations, we show that the
proposed MUIsC significantly outperforms various baselines. Besides MUIsC, we
also integrate some other important modules in the proposed framework, such as
primary image selection, noncompliant content detection, and image
deduplication. With all these modules, our framework works effectively and
efficiently on the JD.com recommendation platform. By Dec 2021, our AGPIS framework
has generated high-standard images for about 1.5 million products and achieves
a reject rate of 13.6%.
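The paper does not include code; as a minimal illustrative sketch (not the authors' implementation), the snippet below shows one way a multi-modal image-sequence classifier in the spirit of MUIsC could fuse per-image features with a product-text embedding and emit one probability per rule-violation category. All module names, dimensions, and encoder choices here are assumptions.

```python
# Hypothetical sketch of a MUIsC-style classifier: a sequence of per-image
# feature vectors plus a product-text embedding is fused by a small
# transformer, and a multi-label head emits one logit per rule category.
import torch
import torch.nn as nn

class ImageSequenceRuleClassifier(nn.Module):
    def __init__(self, img_dim=512, txt_dim=256, hidden=256, num_rules=20):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)   # project per-image features
        self.txt_proj = nn.Linear(txt_dim, hidden)   # project product-text features
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.fuser = nn.TransformerEncoder(layer, num_layers=2)
        self.rule_head = nn.Linear(hidden, num_rules)  # one logit per rule category
        # Per the abstract, textual review feedback could serve as an extra
        # training target, e.g. via an auxiliary head added alongside rule_head.

    def forward(self, img_feats, txt_feat):
        # img_feats: (batch, seq_len, img_dim); txt_feat: (batch, txt_dim)
        tokens = torch.cat(
            [self.txt_proj(txt_feat).unsqueeze(1), self.img_proj(img_feats)], dim=1)
        fused = self.fuser(tokens)
        return self.rule_head(fused[:, 0])           # logits from the text token

model = ImageSequenceRuleClassifier()
logits = model(torch.randn(2, 8, 512), torch.randn(2, 256))
probs = torch.sigmoid(logits)  # multi-label: each rule is checked independently
```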
Related papers
- CTR-Driven Advertising Image Generation with Multimodal Large Language Models [53.40005544344148]
We explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images by optimizing for Click-Through Rate (CTR) as the primary objective.
To further improve the CTR of generated images, we propose a novel reward model to fine-tune pre-trained MLLMs through Reinforcement Learning (RL).
Our method achieves state-of-the-art performance in both online and offline metrics.
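As a purely illustrative sketch of the CTR objective described above (the paper's actual reward model and RL procedure are not reproduced here), a hypothetical reward model could score candidate ad-image features with a logistic head and feed that score back as the RL reward:

```python
# Hypothetical CTR reward model: scores an image embedding with a logistic
# head; the score could be used as the reward when fine-tuning a generator.
import torch
import torch.nn as nn

class CtrRewardModel(nn.Module):
    def __init__(self, feat_dim=768):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                    nn.Linear(256, 1))

    def forward(self, image_feats):
        # Returns a predicted click-through probability per candidate image.
        return torch.sigmoid(self.scorer(image_feats)).squeeze(-1)

reward_model = CtrRewardModel()
rewards = reward_model(torch.randn(4, 768))  # one reward per candidate image
```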
arXiv Detail & Related papers (2025-02-05T09:06:02Z)
- Ranking-aware adapter for text-driven image ordering with CLIP [76.80965830448781]
We propose an effective yet efficient approach that reframes the CLIP model into a learning-to-rank task.
Our approach incorporates learnable prompts to adapt to new instructions for ranking purposes.
Our ranking-aware adapter consistently outperforms fine-tuned CLIPs on various tasks.
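For intuition, a minimal sketch of the learning-to-rank reframing could look like the pairwise margin loss below over CLIP-style image-text similarities; the paper's actual adapter, prompt design, and loss are not shown here, and these details are assumptions:

```python
# Hypothetical pairwise ranking loss: images that better satisfy the text
# instruction should receive higher similarity scores than worse matches.
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(text_emb, img_emb_pos, img_emb_neg, margin=0.2):
    # Cosine similarity between the (possibly prompt-adapted) text embedding
    # and each image embedding serves as the ranking score.
    s_pos = F.cosine_similarity(text_emb, img_emb_pos, dim=-1)
    s_neg = F.cosine_similarity(text_emb, img_emb_neg, dim=-1)
    return F.relu(margin - (s_pos - s_neg)).mean()

loss = pairwise_ranking_loss(torch.randn(8, 512),
                             torch.randn(8, 512),
                             torch.randn(8, 512))
```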
arXiv Detail & Related papers (2024-12-09T18:51:05Z)
- Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment [53.45813302866466]
We present ISG, a comprehensive evaluation framework for interleaved text-and-image generation.
ISG evaluates responses on four levels of granularity: holistic, structural, block-level, and image-specific.
In conjunction with ISG, we introduce a benchmark, ISG-Bench, encompassing 1,150 samples across 8 categories and 21 subcategories.
arXiv Detail & Related papers (2024-11-26T07:55:57Z)
- Many-to-many Image Generation with Auto-regressive Diffusion Models [59.5041405824704]
This paper introduces a domain-general framework for many-to-many image generation, capable of producing interrelated image series from a given set of images.
We present MIS, a novel large-scale multi-image dataset, containing 12M synthetic multi-image samples, each with 25 interconnected images.
We learn M2M, an autoregressive model for many-to-many generation, where each image is modeled within a diffusion framework.
arXiv Detail & Related papers (2024-04-03T23:20:40Z)
- MVAM: Multi-View Attention Method for Fine-grained Image-Text Matching [65.87255122130188]
We propose a Multi-view Attention Method (MVAM) for image-text matching.
We also incorporate an objective to explicitly encourage attention heads to focus on distinct aspects of the input data.
Our method allows models to encode images and text from different perspectives and focus on more critical details, leading to better matching performance.
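A hypothetical sketch of such a diversity objective (the exact formulation in MVAM is not reproduced here) could penalize pairwise similarity between per-view embeddings so that each view captures a distinct aspect of the input:

```python
# Assumed diversity loss: push the cross-view (or cross-head) cosine
# similarities toward zero so different views encode different aspects.
import torch
import torch.nn.functional as F

def view_diversity_loss(view_embs):
    # view_embs: (batch, num_views, dim)
    v = F.normalize(view_embs, dim=-1)
    sim = torch.matmul(v, v.transpose(1, 2))                     # (batch, views, views)
    off_diag = sim - torch.diag_embed(torch.diagonal(sim, dim1=1, dim2=2))
    return off_diag.abs().mean()                                 # penalize overlap

loss = view_diversity_loss(torch.randn(4, 6, 256))
```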
arXiv Detail & Related papers (2024-02-27T06:11:54Z)
- Transformer-empowered Multi-modal Item Embedding for Enhanced Image Search in E-Commerce [20.921870288665627]
Multi-modal Item Embedding Model (MIEM) is capable of utilizing both textual information and multiple images about a product to construct meaningful product features.
MIEM has become an integral part of the Shopee image search platform.
arXiv Detail & Related papers (2023-11-29T08:09:50Z)
- Mutual Query Network for Multi-Modal Product Image Segmentation [13.192334066413837]
We propose a mutual query network to segment products based on both visual and linguistic modalities.
To promote the research in this field, we also construct a Multi-Modal Product dataset (MMPS).
The proposed method significantly outperforms the state-of-the-art methods on MMPS.
arXiv Detail & Related papers (2023-06-26T03:18:38Z)
- VISTA: Vision Transformer enhanced by U-Net and Image Colorfulness Frame Filtration for Automatic Retail Checkout [0.7250756081498245]
We propose to segment and classify individual frames from a video sequence.
The segmentation method consists of a unified single product item- and hand-segmentation followed by entropy masking.
Our best system achieves 3rd place in the AI City Challenge 2022 Track 4 with an F1 score of 0.4545.
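As a rough, assumed illustration of the entropy-masking idea mentioned above (not the authors' code), local histogram entropy can be used to keep only informative image patches and discard flat background:

```python
# Hypothetical entropy mask: keep patches whose intensity histogram has
# high entropy, i.e. regions that carry visual information.
import numpy as np

def entropy_mask(gray, patch=16, thresh=3.0):
    # gray: 2-D uint8 array; returns a boolean mask at patch resolution.
    h, w = gray.shape
    mask = np.zeros((h // patch, w // patch), dtype=bool)
    for i in range(h // patch):
        for j in range(w // patch):
            block = gray[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            hist, _ = np.histogram(block, bins=32, range=(0, 256))
            p = hist[hist > 0] / block.size          # empirical bin probabilities
            entropy = -(p * np.log2(p)).sum()        # entropy in bits
            mask[i, j] = entropy > thresh            # keep informative patches
    return mask

mask = entropy_mask(np.random.randint(0, 256, (128, 128), dtype=np.uint8))
```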
arXiv Detail & Related papers (2022-04-23T08:54:28Z)
- eProduct: A Million-Scale Visual Search Benchmark to Address Product Recognition Challenges [8.204924070199866]
eProduct is a benchmark dataset for training and evaluation on various visual search solutions in a real-world setting.
We present eProduct as a training set and an evaluation set, where the training set contains 1.3M+ listing images with titles and hierarchical category labels, for model development.
We will present eProduct's construction steps, provide analysis about its diversity and cover the performance of baseline models trained on it.
arXiv Detail & Related papers (2021-07-13T05:28:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.