Recommending Themes for Ad Creative Design via Visual-Linguistic
Representations
- URL: http://arxiv.org/abs/2001.07194v2
- Date: Thu, 27 Feb 2020 23:05:46 GMT
- Title: Recommending Themes for Ad Creative Design via Visual-Linguistic
Representations
- Authors: Yichao Zhou, Shaunak Mishra, Manisha Verma, Narayan Bhamidipati, and
Wei Wang
- Abstract summary: We propose a theme (keyphrase) recommender system for ad creative strategists.
The theme recommender is based on aggregating results from a visual question answering (VQA) task.
We show that cross-modal representations lead to significantly better classification accuracy and ranking precision-recall metrics.
- Score: 27.13752835161338
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a perennial need in the online advertising industry to refresh ad
creatives, i.e., images and text used for enticing online users towards a
brand. Such refreshes are required to reduce the likelihood of ad fatigue among
online users, and to incorporate insights from other successful campaigns in
related product categories. Given a brand, coming up with themes for a new ad
is a painstaking and time-consuming process for creative strategists.
Strategists typically draw inspiration from the images and text used for past
ad campaigns, as well as world knowledge on the brands. To automatically infer
ad themes via such multimodal sources of information in past ad campaigns, we
propose a theme (keyphrase) recommender system for ad creative strategists. The
theme recommender is based on aggregating results from a visual question
answering (VQA) task, which ingests the following: (i) ad images, (ii) text
associated with the ads as well as Wikipedia pages on the brands in the ads,
and (iii) questions around the ad. We leverage transformer-based cross-modality
encoders to train visual-linguistic representations for our VQA task. We study
two formulations for the VQA task along the lines of classification and
ranking; via experiments on a public dataset, we show that cross-modal
representations lead to significantly better classification accuracy and
ranking precision-recall metrics. Cross-modal representations show better
performance than separately learned image and text representations. In
addition, using multimodal information yields a significant lift over using
textual or visual information alone.
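As a rough, non-authoritative sketch of the two VQA formulations (classification over a fixed theme vocabulary vs. ranking of candidate keyphrases), the PyTorch snippet below uses a simple projection as a stand-in for the paper's transformer-based cross-modality encoder; all dimensions, the vocabulary size, and the feature inputs are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two VQA formulations; the linear "encoder" below is
# only a stand-in for a pretrained transformer cross-modality encoder
# (e.g., an LXMERT-style model), and every dimension is illustrative.
import torch
import torch.nn as nn


class ThemeRecommender(nn.Module):
    def __init__(self, joint_dim=768, num_themes=1000, phrase_dim=300):
        super().__init__()
        # Stand-in for the cross-modality encoder that fuses ad-image
        # features with ad text and brand Wikipedia text.
        self.cross_modal_encoder = nn.Linear(2048 + 768, joint_dim)
        # Formulation 1: classification over a fixed theme vocabulary.
        self.cls_head = nn.Linear(joint_dim, num_themes)
        # Formulation 2: score an (ad, candidate keyphrase) pair for ranking.
        self.rank_head = nn.Bilinear(joint_dim, phrase_dim, 1)

    def encode(self, img_feats, txt_feats):
        # Real encoders use co-attention; concat + projection stands in here.
        joint = torch.cat([img_feats, txt_feats], dim=-1)
        return torch.relu(self.cross_modal_encoder(joint))

    def classify(self, img_feats, txt_feats):
        return self.cls_head(self.encode(img_feats, txt_feats))  # theme logits

    def rank(self, img_feats, txt_feats, phrase_emb):
        return self.rank_head(self.encode(img_feats, txt_feats), phrase_emb)


model = ThemeRecommender()
img = torch.randn(4, 2048)     # pooled ad-image features
txt = torch.randn(4, 768)      # pooled text features (ad text + Wikipedia)
phrase = torch.randn(4, 300)   # candidate theme keyphrase embeddings
print(model.classify(img, txt).shape)      # (4, 1000)
print(model.rank(img, txt, phrase).shape)  # (4, 1)
```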
Related papers
- ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising [2.330164376631038]
Contextual advertising serves ads aligned with the content the user is viewing.
Current text-to-video retrieval models based on joint multimodal training demand large datasets and computational resources.
We introduce ContextIQ, a multimodal expert-based video retrieval system designed specifically for contextual advertising.
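A minimal sketch of the expert-based retrieval idea, under the assumption that each modality expert independently embeds a video and per-expert similarities to the query are combined by a weighted sum; the weights, dimensions, and modalities are hypothetical:

```python
# Hypothetical illustration of expert-based retrieval: each modality
# "expert" scores the query independently and the scores are aggregated.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def retrieve(query_emb, video_experts, weights):
    # video_experts: {video_id: {modality: embedding}}; weights: per modality
    scores = {}
    for vid, experts in video_experts.items():
        scores[vid] = sum(w * cosine(query_emb, experts[m])
                          for m, w in weights.items() if m in experts)
    return sorted(scores, key=scores.get, reverse=True)

rng = np.random.default_rng(0)
videos = {f"v{i}": {"visual": rng.normal(size=64),
                    "audio": rng.normal(size=64)} for i in range(3)}
print(retrieve(rng.normal(size=64), videos, {"visual": 0.7, "audio": 0.3}))
```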
arXiv Detail & Related papers (2024-10-29T17:01:05Z)
- Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation [90.71613903956451]
Text-to-image retrieval is a fundamental task in multimedia processing.
We propose an autoregressive voken generation method, named AVG.
We show that AVG achieves superior results in both effectiveness and efficiency.
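One plausible reading of retrieval-as-generation, sketched below with toy dimensions and a GRU decoder standing in for the actual model: the decoder autoregressively emits discrete image-token IDs ("vokens") that can then be matched against candidate images.

```python
# Loose sketch of retrieval-as-generation: a decoder emits discrete image
# tokens ("vokens") autoregressively; everything here is a toy stand-in.
import torch
import torch.nn as nn

vocab, dim, steps = 512, 64, 4
embed = nn.Embedding(vocab, dim)
decoder = nn.GRUCell(dim, dim)
to_logits = nn.Linear(dim, vocab)

def generate_vokens(text_state):
    tok = torch.zeros(1, dtype=torch.long)  # BOS placeholder
    h, out = text_state, []
    for _ in range(steps):
        h = decoder(embed(tok), h)
        tok = to_logits(h).argmax(-1)       # greedy next voken
        out.append(tok.item())
    return out  # matched against precomputed voken ids of candidate images

print(generate_vokens(torch.randn(1, dim)))
```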
arXiv Detail & Related papers (2024-07-24T13:39:51Z)
- Contextual AD Narration with Interleaved Multimodal Sequence [50.240534605090396]
The task aims to generate audio descriptions (ADs) of visual elements that help visually impaired individuals access long-form video content, such as movies.
With video features, text, a character bank, and context information as inputs, the generated ADs can refer to characters by name.
We propose to leverage pre-trained foundation models through a simple and unified framework to generate ADs.
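A hypothetical illustration of assembling an interleaved multimodal input for a frozen foundation model; the bracketed field markers and the helper function are invented for the example:

```python
# Hypothetical assembly of an interleaved multimodal prompt: visual tokens,
# the character bank, and context ADs are woven into one input sequence.
def build_prompt(clip_tokens, characters, context_ads):
    char_list = "; ".join(f"{n} ({d})" for n, d in characters.items())
    ctx = " ".join(context_ads)
    return (f"[CHARACTERS] {char_list} [CONTEXT] {ctx} "
            f"[VIDEO] {' '.join(clip_tokens)} [AD]")

print(build_prompt(["<v0>", "<v1>"],
                   {"Ada": "protagonist"}, ["Ada enters the hall."]))
```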
arXiv Detail & Related papers (2024-03-19T17:27:55Z)
- AdSEE: Investigating the Impact of Image Style Editing on Advertisement Attractiveness [25.531489722164178]
We propose Advertisement Style Editing and Attractiveness Enhancement (AdSEE), which explores whether semantic editing of ad images can affect the popularity of online advertisements.
We introduce StyleGAN-based facial semantic editing and inversion to ad images and train a click-rate predictor that maps GAN-based face latent representations to click rates.
Online A/B tests over a period of five days verified increased click-through rates for AdSEE-edited samples compared to a control group of original ads.
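A simplified sketch of the edit-and-score loop: nudge an inverted face latent along a semantic direction and keep the edit only if a click-rate predictor scores it higher. The predictor, direction, and latent size here are all placeholders:

```python
# Simplified sketch of the AdSEE idea: edit a face latent along a semantic
# direction and keep the edit if a (stand-in) CTR predictor prefers it.
import numpy as np

rng = np.random.default_rng(1)
w_ctr = rng.normal(size=512)                  # stand-in CTR predictor weights

def predict_ctr(latent):
    return 1.0 / (1.0 + np.exp(-w_ctr @ latent / 512))

def edit_if_better(latent, direction, step=0.5):
    edited = latent + step * direction        # semantic edit in latent space
    return edited if predict_ctr(edited) > predict_ctr(latent) else latent

z = rng.normal(size=512)                      # inverted ad-face latent
smile_dir = rng.normal(size=512)              # e.g., a "smile" direction
print(predict_ctr(z), predict_ctr(edit_if_better(z, smile_dir)))
```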
arXiv Detail & Related papers (2023-09-15T04:52:49Z)
- KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models [40.54372699488922]
We perform the first empirical study of image ad understanding through the lens of pre-trained vision-language models (VLMs).
We propose a simple feature adaptation strategy to effectively fuse multimodal information for image ads and further empower it with knowledge of real-world entities.
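A rough sketch of the adaptation idea under stated assumptions: frozen VLM image and text features are fused with an embedding of linked real-world entities through a small trainable adapter (all names and sizes are illustrative, not the paper's architecture):

```python
# Rough sketch of knowledge-augmented feature adaptation: frozen VLM image
# and text features are fused with an entity-knowledge embedding via a
# small trainable adapter; dimensions are illustrative.
import torch
import torch.nn as nn

class KnowledgeAdapter(nn.Module):
    def __init__(self, vlm_dim=512, ent_dim=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(2 * vlm_dim + ent_dim, vlm_dim), nn.ReLU(),
            nn.Linear(vlm_dim, vlm_dim))

    def forward(self, img_feat, txt_feat, entity_feat):
        return self.fuse(torch.cat([img_feat, txt_feat, entity_feat], -1))

adapter = KnowledgeAdapter()
out = adapter(torch.randn(2, 512), torch.randn(2, 512), torch.randn(2, 128))
print(out.shape)  # (2, 512) adapted ad representation
```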
arXiv Detail & Related papers (2023-05-28T04:49:01Z)
- Boost CTR Prediction for New Advertisements via Modeling Visual Content [55.11267821243347]
We exploit the visual content in ads to boost the performance of CTR prediction models.
We learn the embedding for each visual ID from historical user-ad interactions.
After incorporating the visual ID embedding in the CTR prediction model of Baidu online advertising, the average CTR of ads improves by 1.46%, and the total charge increases by 1.10%.
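A toy version of the visual ID embedding, assuming each ad creative is assigned a learned ID whose embedding is concatenated with the model's other features; sizes and features are made up for the example:

```python
# Toy version of a visual ID embedding: each ad's visual content maps to a
# learned ID whose embedding joins the other features of a CTR model.
import torch
import torch.nn as nn

num_visual_ids, emb_dim, other_dim = 10000, 32, 64
visual_emb = nn.Embedding(num_visual_ids, emb_dim)
ctr_head = nn.Sequential(nn.Linear(emb_dim + other_dim, 64),
                         nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

visual_ids = torch.tensor([3, 17])            # IDs assigned to ad creatives
other_feats = torch.randn(2, other_dim)       # user/context features
ctr = ctr_head(torch.cat([visual_emb(visual_ids), other_feats], -1))
print(ctr.squeeze(-1))                        # predicted click probabilities
```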
arXiv Detail & Related papers (2022-09-23T17:08:54Z)
- Persuasion Strategies in Advertisements [68.70313043201882]
We introduce an extensive vocabulary of persuasion strategies and build the first ad image corpus annotated with persuasion strategies.
We then formulate the task of persuasion strategy prediction with multi-modal learning.
We conduct a real-world case study on 1600 advertising campaigns of 30 Fortune-500 companies.
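One plausible framing of the prediction task is multi-label classification over fused image and text features; the strategy labels below are illustrative examples, not the paper's vocabulary:

```python
# Plausible framing of persuasion-strategy prediction: multi-label
# classification over fused image and text features of an ad.
import torch
import torch.nn as nn

strategies = ["scarcity", "social proof", "authority", "emotion"]
head = nn.Linear(512 + 512, len(strategies))

img_feat, txt_feat = torch.randn(1, 512), torch.randn(1, 512)
probs = torch.sigmoid(head(torch.cat([img_feat, txt_feat], -1)))[0]
print({s: round(p.item(), 2) for s, p in zip(strategies, probs)})
```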
arXiv Detail & Related papers (2022-08-20T07:33:13Z)
- A Multimodal Framework for Video Ads Understanding [64.70769354696019]
We develop a multimodal system to improve structured analysis of advertising video content.
Our solution achieved a score of 0.2470, which jointly reflects localization and prediction accuracy, ranking fourth on the 2021 TAAC final leaderboard.
arXiv Detail & Related papers (2021-08-29T16:06:00Z)
- Multi-Channel Sequential Behavior Networks for User Modeling in Online Advertising [4.964012641964141]
This paper presents Multi-Channel Sequential Behavior Network (MC-SBN), a deep learning approach for embedding users and ads in a semantic space.
Our proposed user encoder architecture summarizes user activities from multiple input channels, such as previous search queries, visited pages, or clicked ads, into a user vector.
The results demonstrate that MC-SBN can improve the ranking of relevant ads and boost the performance of both click prediction and conversion prediction.
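A minimal sketch of the multi-channel idea: encode each channel's event sequence separately, then merge the per-channel summaries into one user vector. The GRU encoders and all dimensions are stand-ins, not the paper's architecture:

```python
# Sketch of a multi-channel user encoder: per-channel sequence encoders
# whose summaries are merged into a single user vector.
import torch
import torch.nn as nn

class MultiChannelUserEncoder(nn.Module):
    def __init__(self, dim=64, channels=("queries", "pages", "clicked_ads")):
        super().__init__()
        self.channel_rnns = nn.ModuleDict(
            {c: nn.GRU(dim, dim, batch_first=True) for c in channels})
        self.merge = nn.Linear(dim * len(channels), dim)

    def forward(self, channel_seqs):  # {channel: (batch, seq_len, dim)}
        summaries = [self.channel_rnns[c](x)[1][-1]       # last hidden state
                     for c, x in channel_seqs.items()]
        return self.merge(torch.cat(summaries, dim=-1))   # user vector

enc = MultiChannelUserEncoder()
user_vec = enc({"queries": torch.randn(2, 5, 64),
                "pages": torch.randn(2, 7, 64),
                "clicked_ads": torch.randn(2, 3, 64)})
print(user_vec.shape)  # (2, 64)
```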
arXiv Detail & Related papers (2020-12-27T06:13:29Z)
- Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings [63.79979145520512]
We explore the joint effects of texts and images in predicting the keyphrases for a multimedia post.
We propose a novel Multi-Modality Multi-Head Attention (M3H-Att) to capture the intricate cross-media interactions.
Our model significantly outperforms the previous state of the art based on traditional attention networks.
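As a generic stand-in for the proposed attention (not the paper's exact M3H-Att), text tokens can attend over image-region features with off-the-shelf multi-head attention:

```python
# Generic stand-in for cross-media multi-head attention: text tokens attend
# over image-region features via nn.MultiheadAttention.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
text_tokens = torch.randn(2, 12, 256)     # post text features
img_regions = torch.randn(2, 36, 256)     # detected image-region features
fused, weights = attn(query=text_tokens, key=img_regions, value=img_regions)
print(fused.shape, weights.shape)         # (2, 12, 256), (2, 12, 36)
```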
arXiv Detail & Related papers (2020-11-03T08:44:18Z)
- Learning to Create Better Ads: Generation and Ranking Approaches for Ad Creative Refinement [26.70647666598025]
We study approaches to refine a given ad text and image by: (i) generating new ad text, (ii) recommending keyphrases for new ad text, and (iii) recommending image tags (objects in the image).
Based on A/B tests conducted by multiple advertisers, we form pairwise examples of inferior and superior ad creatives.
We also share broadly applicable insights from our experiments using data from the Yahoo Gemini ad platform.
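A minimal sketch of learning from such A/B-derived pairs with a margin ranking loss that pushes the superior creative's score above the inferior one's; the scorer and feature sizes are placeholders:

```python
# Sketch of pairwise learning from A/B-test outcomes: a margin ranking loss
# separates scores of superior and inferior ad creatives.
import torch
import torch.nn as nn

scorer = nn.Linear(128, 1)                       # creative-feature scorer
loss_fn = nn.MarginRankingLoss(margin=1.0)

superior = torch.randn(8, 128)                   # winning creatives' features
inferior = torch.randn(8, 128)                   # losing creatives' features
target = torch.ones(8)                           # "first input ranks higher"
loss = loss_fn(scorer(superior).squeeze(-1),
               scorer(inferior).squeeze(-1), target)
loss.backward()
print(loss.item())
```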
arXiv Detail & Related papers (2020-08-17T16:46:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.