Related papers: Creative4U: MLLMs-based Advertising Creative Image Selector with Comparative Reasoning

URL: http://arxiv.org/abs/2508.12628v1
Date: Mon, 18 Aug 2025 05:11:30 GMT
Title: Creative4U: MLLMs-based Advertising Creative Image Selector with Comparative Reasoning
Authors: Yukang Lin, Xiang Zhang, Shichang Jia, Bowen Wan, Chenghan Fu, Xudong Ren, Yueran Liu, Wanxian Guan, Pengji Wang, Jian Xu, Bo Zheng, Baolin Liu
Abstract summary: We propose the first paradigm for explainable creative assessment and selection. Powered by multimodal large language models (MLLMs), our approach integrates the assessment and selection of creative images into a natural language task. Our code and dataset will be made public to advance research and industrial applications.
Score: 17.0088513334658
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Creative images in advertising are the heart and soul of e-commerce platforms. An eye-catching creative image can enhance the shopping experience for users, boosting income for advertisers and advertising revenue for platforms. With the advent of AIGC technology, advertisers can produce large quantities of creative images at minimal cost. However, they struggle to assess creative quality when selecting among them. Existing methods primarily focus on creative ranking, which fails to address the need for explainable creative selection. In this work, we propose the first paradigm for explainable creative assessment and selection. Powered by multimodal large language models (MLLMs), our approach integrates the assessment and selection of creative images into a natural language generation task. To facilitate this research, we construct CreativePair, the first comparative reasoning-induced creative dataset featuring 8k annotated image pairs, with each sample including a label indicating which image is superior. Additionally, we introduce Creative4U (pronounced Creative for You), an MLLMs-based creative selector that takes into account users' interests. Through Reason-to-Select RFT, which includes supervised fine-tuning with Chain-of-Thought (CoT-SFT) and Group Relative Policy Optimization (GRPO) based reinforcement learning, Creative4U is able to evaluate and select creative images accurately. Both offline and online experiments demonstrate the effectiveness of our approach. Our code and dataset will be made public to advance research and industrial applications.
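
The abstract names GRPO-based reinforcement learning over labelled image pairs but gives no detail. As a rough illustration only, the sketch below shows the group-relative advantage step commonly associated with GRPO: several answers are sampled for one image pair, each is scored, and each score is normalized against the group's mean and standard deviation. The reward function `selection_reward`, the "A"/"B" answer format, and the use of population standard deviation are assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev


def selection_reward(predicted, label):
    """Hypothetical reward: 1.0 if the chosen image matches the pair label."""
    return 1.0 if predicted == label else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Assumption: population standard deviation with a small epsilon to
    avoid division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers for one image pair whose labelled winner is "A".
predictions = ["A", "B", "A", "A"]
rewards = [selection_reward(p, "A") for p in predictions]
advantages = group_relative_advantages(rewards)
```

Under these assumptions, answers that pick the labelled image receive a positive advantage and the one that does not receives a negative one; if every sampled answer earns the same reward, all advantages are zero and the group contributes no learning signal.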

Related papers

CreativityPrism: A Holistic Benchmark for Large Language Model Creativity [64.18257552903151]
Creativity is often seen as a hallmark of human intelligence. There is still no holistic framework to evaluate the creativity of large language models across diverse scenarios. We propose CreativityPrism, an evaluation analysis framework that decomposes creativity into three dimensions: quality, novelty, and diversity.
arXiv Detail & Related papers (2025-10-23T00:22:10Z)
Cooking Up Creativity: A Cognitively-Inspired Approach for Enhancing LLM Creativity through Structured Representations [53.950760059792614]
Large Language Models (LLMs) excel at countless tasks, yet struggle with creativity. We introduce a novel approach that couples LLMs with structured representations and cognitively inspired manipulations to generate more creative and diverse ideas. We demonstrate our approach in the culinary domain with DishCOVER, a model that generates creative recipes.
arXiv Detail & Related papers (2025-04-29T11:13:06Z)
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM [58.42678619252968]
Creation-MMBench is a benchmark designed to evaluate the creative capabilities of Multimodal Large Language Models. The benchmark comprises 765 test cases spanning 51 fine-grained tasks. Experimental results reveal that open-source MLLMs significantly underperform compared to proprietary models in creative tasks.
arXiv Detail & Related papers (2025-03-18T17:51:34Z)
CTR-Driven Advertising Image Generation with Multimodal Large Language Models [53.40005544344148]
We explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images by optimizing for Click-Through Rate (CTR) as the primary objective. To further improve the CTR of generated images, we propose a novel reward model to fine-tune pre-trained MLLMs through Reinforcement Learning (RL). Our method achieves state-of-the-art performance in both online and offline metrics.
arXiv Detail & Related papers (2025-02-05T09:06:02Z)
Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations [109.65267337037842]
We introduce the task of Image Editing Recommendation (IER). IER aims to automatically generate diverse creative editing instructions from an input image and a simple prompt representing the users' under-specified editing purpose. We introduce Creativity-Vision Language Assistant (Creativity-VLA), a multimodal framework designed specifically for edit-instruction generation.
arXiv Detail & Related papers (2024-05-31T18:22:29Z)
A New Creative Generation Pipeline for Click-Through Rate with Stable Diffusion Model [8.945197427679924]
Traditional AI-based approaches face the same problem of not considering user information while having limited aesthetic knowledge from designers. To optimize the results, the generated creatives in traditional methods are then ranked by a separate module, the creative ranking model. This paper proposes a new automated Creative Generation pipeline for Click-Through Rate (CG4CTR) with the goal of improving CTR during the creative generation stage.
arXiv Detail & Related papers (2024-01-17T03:27:39Z)
Parallel Ranking of Ads and Creatives in Real-Time Advertising Systems [20.78133992969317]
We propose for the first time a novel architecture for online parallel estimation of ads and creatives ranking. The online architecture enables sophisticated personalized creative modeling while reducing overall latency. The offline joint model for CTR estimation allows mutual awareness and collaborative optimization between ads and creatives.
arXiv Detail & Related papers (2023-12-20T04:05:21Z)
Cross-Element Combinatorial Selection for Multi-Element Creative in Display Advertising [16.527943807941856]
This paper proposes a Cross-Element Combinatorial Selection framework for multiple creative elements. In the encoder process, a cross-element interaction is adopted to dynamically adjust the expression of a single creative element. Experiments on real-world datasets show that CECS achieved the SOTA score on offline metrics.
arXiv Detail & Related papers (2023-07-04T09:32:39Z)
Towards Creativity Characterization of Generative Models via Group-based Subset Scanning [64.6217849133164]
We propose group-based subset scanning to identify, quantify, and characterize creative processes. We find that creative samples generate larger subsets of anomalies than normal or non-creative samples across datasets.
arXiv Detail & Related papers (2022-03-01T15:07:14Z)
Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure [24.13017090236483]
Ad creatives with enjoyable visual appearance may increase the click-through rate (CTR) of products. We propose an Adaptive and Efficient ad creative Selection framework based on a tree structure. Based on the tree structure, Thompson sampling is adapted with dynamic programming, leading to efficient exploration for potential ad creatives with the largest CTR.
arXiv Detail & Related papers (2021-03-02T03:39:41Z)
Learning to Create Better Ads: Generation and Ranking Approaches for Ad Creative Refinement [26.70647666598025]
We study approaches to refine the given ad text and image by: (i) generating new ad text, (ii) recommending keyphrases for new ad text, and (iii) recommending image tags (objects in image). Based on A/B tests conducted by multiple advertisers, we form pairwise examples of inferior and superior ad creatives. We also share broadly applicable insights from our experiments using data from the Yahoo Gemini ad platform.
arXiv Detail & Related papers (2020-08-17T16:46:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.