Recommending Themes for Ad Creative Design via Visual-Linguistic
Representations
- URL: http://arxiv.org/abs/2001.07194v2
- Date: Thu, 27 Feb 2020 23:05:46 GMT
- Title: Recommending Themes for Ad Creative Design via Visual-Linguistic
Representations
- Authors: Yichao Zhou, Shaunak Mishra, Manisha Verma, Narayan Bhamidipati, and
Wei Wang
- Abstract summary: We propose a theme (keyphrase) recommender system for ad creative strategists.
The theme recommender is based on aggregating results from a visual question answering (VQA) task.
We show that cross-modal representations lead to significantly better classification accuracy and ranking precision-recall metrics.
- Score: 27.13752835161338
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a perennial need in the online advertising industry to refresh ad
creatives, i.e., images and text used for enticing online users towards a
brand. Such refreshes are required to reduce the likelihood of ad fatigue among
online users, and to incorporate insights from other successful campaigns in
related product categories. Given a brand, coming up with themes for a new ad
is a painstaking and time-consuming process for creative strategists.
Strategists typically draw inspiration from the images and text used for past
ad campaigns, as well as world knowledge on the brands. To automatically infer
ad themes via such multimodal sources of information in past ad campaigns, we
propose a theme (keyphrase) recommender system for ad creative strategists. The
theme recommender is based on aggregating results from a visual question
answering (VQA) task, which ingests the following: (i) ad images, (ii) text
associated with the ads as well as Wikipedia pages on the brands in the ads,
and (iii) questions around the ad. We leverage transformer-based cross-modality
encoders to train visual-linguistic representations for our VQA task. We study
two formulations for the VQA task along the lines of classification and
ranking; via experiments on a public dataset, we show that cross-modal
representations lead to significantly better classification accuracy and
ranking precision-recall metrics. Cross-modal representations show better
performance than separately learned image and text representations. In
addition, using multimodal information yields a significant lift over using
textual or visual information alone.
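As a rough, non-authoritative sketch of the two VQA formulations (classification over a fixed theme vocabulary vs. ranking of candidate keyphrases), the PyTorch snippet below uses a simple projection as a stand-in for the paper's transformer-based cross-modality encoder; all dimensions, the vocabulary size, and the feature inputs are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two VQA formulations; the linear "encoder" below is
# only a stand-in for a pretrained transformer cross-modality encoder
# (e.g., an LXMERT-style model), and every dimension is illustrative.
import torch
import torch.nn as nn


class ThemeRecommender(nn.Module):
    def __init__(self, joint_dim=768, num_themes=1000, phrase_dim=300):
        super().__init__()
        # Stand-in for the cross-modality encoder that fuses ad-image
        # features with ad text and brand Wikipedia text.
        self.cross_modal_encoder = nn.Linear(2048 + 768, joint_dim)
        # Formulation 1: classification over a fixed theme vocabulary.
        self.cls_head = nn.Linear(joint_dim, num_themes)
        # Formulation 2: score an (ad, candidate keyphrase) pair for ranking.
        self.rank_head = nn.Bilinear(joint_dim, phrase_dim, 1)

    def encode(self, img_feats, txt_feats):
        # Real encoders use co-attention; concat + projection stands in here.
        joint = torch.cat([img_feats, txt_feats], dim=-1)
        return torch.relu(self.cross_modal_encoder(joint))

    def classify(self, img_feats, txt_feats):
        return self.cls_head(self.encode(img_feats, txt_feats))  # theme logits

    def rank(self, img_feats, txt_feats, phrase_emb):
        return self.rank_head(self.encode(img_feats, txt_feats), phrase_emb)


model = ThemeRecommender()
img = torch.randn(4, 2048)     # pooled ad-image features
txt = torch.randn(4, 768)      # pooled text features (ad text + Wikipedia)
phrase = torch.randn(4, 300)   # candidate theme keyphrase embeddings
print(model.classify(img, txt).shape)      # (4, 1000)
print(model.rank(img, txt, phrase).shape)  # (4, 1)
```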
Related papers
- ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising [2.330164376631038]
Contextual advertising serves ads aligned with the content the user is viewing.
Current text-to-video retrieval models based on joint multimodal training demand large datasets and computational resources.
We introduce ContextIQ, a multimodal expert-based video retrieval system designed specifically for contextual advertising.
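A minimal sketch of the expert-based retrieval idea, under the assumption that each modality expert independently embeds a video and per-expert similarities to the query are combined by a weighted sum; the weights, dimensions, and modalities are hypothetical:

```python
# Hypothetical illustration of expert-based retrieval: each modality
# "expert" scores the query independently and the scores are aggregated.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def retrieve(query_emb, video_experts, weights):
    # video_experts: {video_id: {modality: embedding}}; weights: per modality
    scores = {}
    for vid, experts in video_experts.items():
        scores[vid] = sum(w * cosine(query_emb, experts[m])
                          for m, w in weights.items() if m in experts)
    return sorted(scores, key=scores.get, reverse=True)

rng = np.random.default_rng(0)
videos = {f"v{i}": {"visual": rng.normal(size=64),
                    "audio": rng.normal(size=64)} for i in range(3)}
print(retrieve(rng.normal(size=64), videos, {"visual": 0.7, "audio": 0.3}))
```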
arXiv Detail & Related papers (2024-10-29T17:01:05Z)
- Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation [90.71613903956451]
Text-to-image retrieval is a fundamental task in multimedia processing.
We propose an autoregressive voken generation method, named AVG.
We show that AVG achieves superior results in both effectiveness and efficiency.
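One plausible reading of retrieval-as-generation, sketched below with toy dimensions and a GRU decoder standing in for the actual model: the decoder autoregressively emits discrete image-token IDs ("vokens") that can then be matched against candidate images.

```python
# Loose sketch of retrieval-as-generation: a decoder emits discrete image
# tokens ("vokens") autoregressively; everything here is a toy stand-in.
import torch
import torch.nn as nn

vocab, dim, steps = 512, 64, 4
embed = nn.Embedding(vocab, dim)
decoder = nn.GRUCell(dim, dim)
to_logits = nn.Linear(dim, vocab)

def generate_vokens(text_state):
    tok = torch.zeros(1, dtype=torch.long)  # BOS placeholder
    h, out = text_state, []
    for _ in range(steps):
        h = decoder(embed(tok), h)
        tok = to_logits(h).argmax(-1)       # greedy next voken
        out.append(tok.item())
    return out  # matched against precomputed voken ids of candidate images

print(generate_vokens(torch.randn(1, dim)))
```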
arXiv Detail & Related papers (2024-07-24T13:39:51Z)
- Contextual AD Narration with Interleaved Multimodal Sequence [50.240534605090396]
The task aims to generate audio descriptions (ADs) of visual elements that help visually impaired individuals access long-form video content, such as movies.
With video features, text, a character bank, and context information as inputs, the generated ADs can refer to characters by name.
We propose to leverage pre-trained foundation models through a simple and unified framework to generate ADs.
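A hypothetical illustration of assembling an interleaved multimodal input for a frozen foundation model; the bracketed field markers and the helper function are invented for the example:

```python
# Hypothetical assembly of an interleaved multimodal prompt: visual tokens,
# the character bank, and context ADs are woven into one input sequence.
def build_prompt(clip_tokens, characters, context_ads):
    char_list = "; ".join(f"{n} ({d})" for n, d in characters.items())
    ctx = " ".join(context_ads)
    return (f"[CHARACTERS] {char_list} [CONTEXT] {ctx} "
            f"[VIDEO] {' '.join(clip_tokens)} [AD]")

print(build_prompt(["<v0>", "<v1>"],
                   {"Ada": "protagonist"}, ["Ada enters the hall."]))
```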
arXiv Detail & Related papers (2024-03-19T17:27:55Z)
- AdSEE: Investigating the Impact of Image Style Editing on Advertisement Attractiveness [25.531489722164178]
We propose Advertisement Style Editing and Attractiveness Enhancement (AdSEE), which explores whether semantic editing of ad images can affect the popularity of online advertisements.
We introduce StyleGAN-based facial semantic editing and inversion to ad images and train a click-rate predictor that maps GAN-based face latent representations to click rates.
Online A/B tests over a period of five days verified increased click-through rates for AdSEE-edited samples compared to a control group of original ads.
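A simplified sketch of the edit-and-score loop: nudge an inverted face latent along a semantic direction and keep the edit only if a click-rate predictor scores it higher. The predictor, direction, and latent size here are all placeholders:

```python
# Simplified sketch of the AdSEE idea: edit a face latent along a semantic
# direction and keep the edit if a (stand-in) CTR predictor prefers it.
import numpy as np

rng = np.random.default_rng(1)
w_ctr = rng.normal(size=512)                  # stand-in CTR predictor weights

def predict_ctr(latent):
    return 1.0 / (1.0 + np.exp(-w_ctr @ latent / 512))

def edit_if_better(latent, direction, step=0.5):
    edited = latent + step * direction        # semantic edit in latent space
    return edited if predict_ctr(edited) > predict_ctr(latent) else latent

z = rng.normal(size=512)                      # inverted ad-face latent
smile_dir = rng.normal(size=512)              # e.g., a "smile" direction
print(predict_ctr(z), predict_ctr(edit_if_better(z, smile_dir)))
```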
arXiv Detail & Related papers (2023-09-15T04:52:49Z)
- KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models [40.54372699488922]
We perform the first empirical study of image ad understanding through the lens of pre-trained vision-language models (VLMs).
We propose a simple feature adaptation strategy to effectively fuse multimodal information for image ads and further empower it with knowledge of real-world entities.
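A rough sketch of the adaptation idea under stated assumptions: frozen VLM image and text features are fused with an embedding of linked real-world entities through a small trainable adapter (all names and sizes are illustrative, not the paper's architecture):

```python
# Rough sketch of knowledge-augmented feature adaptation: frozen VLM image
# and text features are fused with an entity-knowledge embedding via a
# small trainable adapter; dimensions are illustrative.
import torch
import torch.nn as nn

class KnowledgeAdapter(nn.Module):
    def __init__(self, vlm_dim=512, ent_dim=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(2 * vlm_dim + ent_dim, vlm_dim), nn.ReLU(),
            nn.Linear(vlm_dim, vlm_dim))

    def forward(self, img_feat, txt_feat, entity_feat):
        return self.fuse(torch.cat([img_feat, txt_feat, entity_feat], -1))

adapter = KnowledgeAdapter()
out = adapter(torch.randn(2, 512), torch.randn(2, 512), torch.randn(2, 128))
print(out.shape)  # (2, 512) adapted ad representation
```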
arXiv Detail & Related papers (2023-05-28T04:49:01Z)
- Boost CTR Prediction for New Advertisements via Modeling Visual Content [55.11267821243347]
We exploit the visual content in ads to boost the performance of CTR prediction models.
We learn the embedding for each visual ID from historical user-ad interactions.
After incorporating the visual ID embedding in the CTR prediction model of Baidu online advertising, the average CTR of ads improves by 1.46%, and the total charge increases by 1.10%.
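A toy version of the visual ID embedding, assuming each ad creative is assigned a learned ID whose embedding is concatenated with the model's other features; sizes and features are made up for the example:

```python
# Toy version of a visual ID embedding: each ad's visual content maps to a
# learned ID whose embedding joins the other features of a CTR model.
import torch
import torch.nn as nn

num_visual_ids, emb_dim, other_dim = 10000, 32, 64
visual_emb = nn.Embedding(num_visual_ids, emb_dim)
ctr_head = nn.Sequential(nn.Linear(emb_dim + other_dim, 64),
                         nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

visual_ids = torch.tensor([3, 17])            # IDs assigned to ad creatives
other_feats = torch.randn(2, other_dim)       # user/context features
ctr = ctr_head(torch.cat([visual_emb(visual_ids), other_feats], -1))
print(ctr.squeeze(-1))                        # predicted click probabilities
```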
arXiv Detail & Related papers (2022-09-23T17:08:54Z)
- Persuasion Strategies in Advertisements [68.70313043201882]
We introduce an extensive vocabulary of persuasion strategies and build the first ad image corpus annotated with persuasion strategies.
We then formulate the task of persuasion strategy prediction with multi-modal learning.
We conduct a real-world case study on 1600 advertising campaigns of 30 Fortune-500 companies.
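One plausible framing of the prediction task is multi-label classification over fused image and text features; the strategy labels below are illustrative examples, not the paper's vocabulary:

```python
# Plausible framing of persuasion-strategy prediction: multi-label
# classification over fused image and text features of an ad.
import torch
import torch.nn as nn

strategies = ["scarcity", "social proof", "authority", "emotion"]
head = nn.Linear(512 + 512, len(strategies))

img_feat, txt_feat = torch.randn(1, 512), torch.randn(1, 512)
probs = torch.sigmoid(head(torch.cat([img_feat, txt_feat], -1)))[0]
print({s: round(p.item(), 2) for s, p in zip(strategies, probs)})
```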
arXiv Detail & Related papers (2022-08-20T07:33:13Z)
- A Multimodal Framework for Video Ads Understanding [64.70769354696019]
We develop a multimodal system to improve structured analysis of advertising video content.
Our solution achieved a score of 0.2470, which jointly reflects localization and prediction accuracy, ranking fourth on the 2021 TAAC final leaderboard.
arXiv Detail & Related papers (2021-08-29T16:06:00Z)
- Multi-Channel Sequential Behavior Networks for User Modeling in Online Advertising [4.964012641964141]
This paper presents Multi-Channel Sequential Behavior Network (MC-SBN), a deep learning approach for embedding users and ads in a semantic space.
Our proposed user encoder architecture summarizes user activities from multiple input channels, such as previous search queries, visited pages, or clicked ads, into a user vector.
The results demonstrate that MC-SBN can improve the ranking of relevant ads and boost the performance of both click prediction and conversion prediction.
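A minimal sketch of the multi-channel idea: encode each channel's event sequence separately, then merge the per-channel summaries into one user vector. The GRU encoders and all dimensions are stand-ins, not the paper's architecture:

```python
# Sketch of a multi-channel user encoder: per-channel sequence encoders
# whose summaries are merged into a single user vector.
import torch
import torch.nn as nn

class MultiChannelUserEncoder(nn.Module):
    def __init__(self, dim=64, channels=("queries", "pages", "clicked_ads")):
        super().__init__()
        self.channel_rnns = nn.ModuleDict(
            {c: nn.GRU(dim, dim, batch_first=True) for c in channels})
        self.merge = nn.Linear(dim * len(channels), dim)

    def forward(self, channel_seqs):  # {channel: (batch, seq_len, dim)}
        summaries = [self.channel_rnns[c](x)[1][-1]       # last hidden state
                     for c, x in channel_seqs.items()]
        return self.merge(torch.cat(summaries, dim=-1))   # user vector

enc = MultiChannelUserEncoder()
user_vec = enc({"queries": torch.randn(2, 5, 64),
                "pages": torch.randn(2, 7, 64),
                "clicked_ads": torch.randn(2, 3, 64)})
print(user_vec.shape)  # (2, 64)
```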
arXiv Detail & Related papers (2020-12-27T06:13:29Z)
- Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings [63.79979145520512]
We explore the joint effects of texts and images in predicting the keyphrases for a multimedia post.
We propose a novel Multi-Modality Multi-Head Attention (M3H-Att) to capture the intricate cross-media interactions.
Our model significantly outperforms the previous state of the art based on traditional attention networks.
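As a generic stand-in for the proposed attention (not the paper's exact M3H-Att), text tokens can attend over image-region features with off-the-shelf multi-head attention:

```python
# Generic stand-in for cross-media multi-head attention: text tokens attend
# over image-region features via nn.MultiheadAttention.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
text_tokens = torch.randn(2, 12, 256)     # post text features
img_regions = torch.randn(2, 36, 256)     # detected image-region features
fused, weights = attn(query=text_tokens, key=img_regions, value=img_regions)
print(fused.shape, weights.shape)         # (2, 12, 256), (2, 12, 36)
```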
arXiv Detail & Related papers (2020-11-03T08:44:18Z)
- Learning to Create Better Ads: Generation and Ranking Approaches for Ad Creative Refinement [26.70647666598025]
We study approaches to refine a given ad text and image by: (i) generating new ad text, (ii) recommending keyphrases for new ad text, and (iii) recommending image tags (objects in the image).
Based on A/B tests conducted by multiple advertisers, we form pairwise examples of inferior and superior ad creatives.
We also share broadly applicable insights from our experiments using data from the Yahoo Gemini ad platform.
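A minimal sketch of learning from such A/B-derived pairs with a margin ranking loss that pushes the superior creative's score above the inferior one's; the scorer and feature sizes are placeholders:

```python
# Sketch of pairwise learning from A/B-test outcomes: a margin ranking loss
# separates scores of superior and inferior ad creatives.
import torch
import torch.nn as nn

scorer = nn.Linear(128, 1)                       # creative-feature scorer
loss_fn = nn.MarginRankingLoss(margin=1.0)

superior = torch.randn(8, 128)                   # winning creatives' features
inferior = torch.randn(8, 128)                   # losing creatives' features
target = torch.ones(8)                           # "first input ranks higher"
loss = loss_fn(scorer(superior).squeeze(-1),
               scorer(inferior).squeeze(-1), target)
loss.backward()
print(loss.item())
```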
arXiv Detail & Related papers (2020-08-17T16:46:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.