Staging E-Commerce Products for Online Advertising using Retrieval
Assisted Image Generation
- URL: http://arxiv.org/abs/2307.15326v1
- Date: Fri, 28 Jul 2023 06:04:46 GMT
- Title: Staging E-Commerce Products for Online Advertising using Retrieval
Assisted Image Generation
- Authors: Yueh-Ning Ku, Mikhail Kuznetsov, Shaunak Mishra and Paloma de Juan
- Abstract summary: We propose a generative adversarial network (GAN) based approach to generate staged backgrounds for un-staged product images.
We show how our staging approach can enable animations of moving products leading to a video ad from a product image.
- Score: 11.03803158931361
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Online ads showing e-commerce products typically rely on the product images
in a catalog sent to the advertising platform by an e-commerce platform. In the
broader ads industry such ads are called dynamic product ads (DPA). It is
common for DPA catalogs to be in the scale of millions (corresponding to the
scale of products which can be bought from the e-commerce platform). However,
not all product images in the catalog may be appealing when directly
re-purposed as an ad image, and this may lead to lower click-through rates
(CTRs). In particular, products just placed against a solid background may not
be as enticing and realistic as a product staged in a natural environment. To
address such shortcomings of DPA images at scale, we propose a generative
adversarial network (GAN) based approach to generate staged backgrounds for
un-staged product images. Generating the entire staged background is a
challenging task susceptible to hallucinations. To get around this, we
introduce a simpler approach called copy-paste staging using retrieval assisted
GANs. In copy-paste staging, we first retrieve (from the catalog) staged
products similar to the un-staged input product, and then copy-paste the
background of the retrieved product into the input image. A GAN-based
in-painting model is used to fill the holes left after this copy-paste
operation. We show the efficacy of our copy-paste staging method via offline
metrics and human
evaluation. In addition, we show how our staging approach can enable animations
of moving products leading to a video ad from a product image.
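The retrieve-then-composite idea above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the embedding model, the product masks, and all function names are stand-ins, and the hole-filling step in the paper is performed by a retrieval-assisted GAN in-painting model rather than left as a mask.

```python
# Hypothetical sketch of copy-paste staging: retrieve the most similar
# staged catalog product, paste its background behind the input product,
# and flag the "holes" that a GAN in-painting model would then fill.
import numpy as np

def retrieve_most_similar(query_emb, catalog_embs):
    """Return the index of the staged catalog product whose embedding is
    closest (by cosine similarity) to the un-staged query product."""
    sims = catalog_embs @ query_emb / (
        np.linalg.norm(catalog_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
    return int(np.argmax(sims))

def copy_paste_stage(product_img, product_mask, staged_img, staged_mask):
    """Composite the input product over the retrieved staged background.

    product_mask / staged_mask: boolean arrays, True where the product is.
    Pixels that held the retrieved product but are background in the input
    become holes, i.e. the regions the in-painting model must fill.
    """
    out = np.where(product_mask[..., None], product_img, staged_img)
    holes = staged_mask & ~product_mask  # left for GAN-based in-painting
    return out, holes
```

The retrieval step is what keeps the task simple: because the pasted background comes from a genuinely staged product, the generator only has to fill small holes instead of hallucinating an entire scene.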
Related papers
- CTR-Driven Advertising Image Generation with Multimodal Large Language Models [53.40005544344148]
We explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images by optimizing for Click-Through Rate (CTR) as the primary objective.
To further improve the CTR of generated images, we propose a novel reward model to fine-tune pre-trained MLLMs through Reinforcement Learning (RL).
Our method achieves state-of-the-art performance in both online and offline metrics.
arXiv Detail & Related papers (2025-02-05T09:06:02Z)
- PAID: A Framework of Product-Centric Advertising Image Design [31.08944590096747]
We propose a novel framework called Product-Centric Advertising Image Design (PAID)
It consists of four sequential stages to highlight product foregrounds and taglines while achieving overall image aesthetics.
To support the PAID framework, we create corresponding datasets with over 50,000 labeled images.
arXiv Detail & Related papers (2025-01-24T08:21:35Z)
- Automated Virtual Product Placement and Assessment in Images using Diffusion Models [1.63075356372232]
This paper introduces a novel three-stage fully automated VPP system.
In the first stage, a language-guided image segmentation model identifies optimal regions within images for product inpainting.
In the second stage, Stable Diffusion (SD), fine-tuned with a few example product images, is used to inpaint the product into the previously identified candidate regions.
The final stage introduces an "Alignment Module", which is designed to effectively sieve out low-quality images.
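The three stages above compose naturally into a filter-at-the-end pipeline. The skeleton below is an illustrative sketch only: the stage implementations (segmentation, diffusion in-painting, alignment scoring) are injected as placeholder callables, and all names and the threshold are assumptions rather than details from the paper.

```python
# Hypothetical skeleton of a three-stage VPP pipeline: (1) locate a
# candidate region, (2) in-paint the product there, (3) reject low-quality
# results via an alignment score. Stage implementations are caller-supplied.
from typing import Callable, Optional, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height), illustrative

def run_vpp(scene,
            find_region: Callable[[object], Region],
            inpaint_product: Callable[[object, Region], object],
            quality_score: Callable[[object], float],
            threshold: float = 0.5) -> Optional[object]:
    """Run the three stages in order; return None if the Alignment-Module
    style score falls below the acceptance threshold."""
    region = find_region(scene)                 # stage 1: segmentation
    candidate = inpaint_product(scene, region)  # stage 2: diffusion in-painting
    score = quality_score(candidate)            # stage 3: alignment filtering
    return candidate if score >= threshold else None
```

Structuring the final stage as a rejection filter means the system can trade throughput for quality simply by raising the threshold.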
arXiv Detail & Related papers (2024-05-02T09:44:13Z)
- Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images [45.302905684461905]
This paper studies the possibility of employing multi-modal models with enhanced visual understanding to mimic the outputs of platforms like DALL-E 3 and Midjourney.
We create prompts that generate images similar to those available in marketplaces and from premium stock image providers, yet at a markedly lower expense.
Our findings, supported by both automated metrics and human assessment, reveal that comparable visual content can be produced for a fraction of the prevailing market prices.
arXiv Detail & Related papers (2024-04-21T21:30:17Z)
- Cross-Domain Product Representation Learning for Rich-Content E-Commerce [16.418118040661646]
This paper introduces a large-scale cRoss-dOmain Product rEcognition dataset, called ROPE.
ROPE covers a wide range of product categories and contains over 180,000 products, corresponding to millions of short videos and live streams.
It is the first dataset to cover product pages, short videos, and live streams simultaneously, providing the basis for establishing a unified product representation across different media domains.
arXiv Detail & Related papers (2023-08-10T13:06:05Z)
- DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder [55.58582254514431]
We propose DAE-Talker to synthesize full video frames and produce natural head movements that align with the content of speech.
We also introduce pose modelling in speech2latent for pose controllability.
Our experiments show that DAE-Talker outperforms existing popular methods in lip-sync, video fidelity, and pose naturalness.
arXiv Detail & Related papers (2023-03-30T17:18:31Z)
- Unified Vision-Language Representation Modeling for E-Commerce Same-Style Products Retrieval [12.588713044749177]
Same-style products retrieval plays an important role in e-commerce platforms.
We propose a unified vision-language modeling method for e-commerce same-style products retrieval.
It is capable of cross-modal product-to-product retrieval, as well as style transfer and user-interactive search.
arXiv Detail & Related papers (2023-02-10T07:24:23Z)
- Unposed: Unsupervised Pose Estimation based Product Image Recommendations [4.467248776406006]
We propose a Human Pose Detection based unsupervised method to scan a product's image set for missing pose variants.
The unsupervised approach treats sellers fairly across products and categories, irrespective of any biases.
We manually surveyed 200 products, a large fraction of which had at least one repeated image or missing variant, and sampled 3K products (20K images), a significant proportion of which had scope for adding many image variants.
arXiv Detail & Related papers (2023-01-19T05:02:55Z)
- Boost CTR Prediction for New Advertisements via Modeling Visual Content [55.11267821243347]
We exploit the visual content in ads to boost the performance of CTR prediction models.
We learn the embedding for each visual ID based on the historical user-ad interactions accumulated in the past.
After incorporating the visual ID embedding in the CTR prediction model of Baidu online advertising, the average CTR of ads improves by 1.46%, and the total charge increases by 1.10%.
arXiv Detail & Related papers (2022-09-23T17:08:54Z)
- Automatic Generation of Product-Image Sequence in E-commerce [46.06263129000091]
The Multi-modality Unified Image-sequence Classifier (MUIsC) is able to simultaneously detect all categories of rule violations through learning.
By Dec 2021, our AGPIS framework has generated high-standard images for about 1.5 million products and achieves a 13.6% reject rate.
arXiv Detail & Related papers (2022-06-26T23:38:42Z)
- Poet: Product-oriented Video Captioner for E-commerce [124.9936946822493]
In e-commerce, a growing number of user-generated videos are used for product promotion. How to generate video descriptions that narrate the user-preferred product characteristics depicted in the video is vital for successful promotion.
We propose a product-oriented video captioner framework, abbreviated as Poet.
We show that Poet achieves consistent performance improvement over previous methods concerning generation quality, product aspects capturing, and lexical diversity.
arXiv Detail & Related papers (2020-08-16T10:53:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.