Behavior Optimized Image Generation
- URL: http://arxiv.org/abs/2311.10995v1
- Date: Sat, 18 Nov 2023 07:07:38 GMT
- Title: Behavior Optimized Image Generation
- Authors: Varun Khurana, Yaman K Singla, Jayakumar Subramanian, Rajiv Ratn Shah,
Changyou Chen, Zhiqiang Xu, Balaji Krishnamurthy
- Abstract summary: We propose BoigLLM, which understands both image content and user behavior.
We show that BoigLLM outperforms 13x larger models such as GPT-3.5 and GPT-4 in this task.
We release BoigBench, a benchmark dataset containing 168 million enterprise tweets with their media, brand names, time of post, and total likes.
- Score: 69.9906601767728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The last few years have witnessed great success in image generation, which
has crossed aesthetic acceptance thresholds, making it directly applicable to
personal and commercial applications. However, images, especially in marketing
and advertising applications, are often created as a means to an end rather
than for aesthetics alone. The goal can be increasing sales, getting more
clicks, likes, or image sales (in the case of stock businesses).
Therefore, the generated images need to perform well on these key performance
indicators (KPIs), in addition to being aesthetically good. In this paper, we
make the first endeavor to answer the question: "How can one infuse knowledge
of the end goal into the image generation process itself to create not just
better-looking but also better-performing images?" We
propose BoigLLM, an LLM that understands both image content and user behavior.
BoigLLM knows how an image should look to get a certain required KPI. We show
that BoigLLM outperforms 13x larger models such as GPT-3.5 and GPT-4 in this
task, demonstrating that while these state-of-the-art models can understand
images, they lack information on how these images perform in the real world. To
generate actual pixels of behavior-conditioned images, we train a
diffusion-based model (BoigSD) to align with a proposed BoigLLM-defined reward.
We show the performance of the overall pipeline on two datasets covering two
different behaviors: a stock dataset with the number of forward actions as the
KPI and a dataset containing tweets with the total likes as the KPI, denoted as
BoigBench. To advance research in the direction of utility-driven image
generation and understanding, we release BoigBench, a benchmark dataset
containing 168 million enterprise tweets with their media, brand account names,
time of post, and total likes.
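The abstract does not spell out how BoigSD is aligned with the BoigLLM-defined reward. One common recipe for aligning a diffusion model with a scalar reward is reward-weighted denoising, sketched below; the stand-in U-Net, the placeholder rewards, and the softmax weighting are illustrative assumptions, not the paper's released code.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from diffusers import DDPMScheduler

class TinyUNet(nn.Module):
    """Stand-in denoiser; BoigSD fine-tunes a real Stable Diffusion U-Net."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.SiLU(),
                                 nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, x, t):  # the toy model ignores the timestep t
        return self.net(x)

def reward_weighted_step(unet, scheduler, optimizer, images, rewards, beta=2.0):
    """One fine-tuning step: weight each sample's denoising loss by its
    (softmaxed) reward, so high-KPI images pull the model harder."""
    noise = torch.randn_like(images)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (images.size(0),), device=images.device)
    noisy = scheduler.add_noise(images, noise, t)                  # forward diffusion
    per_sample = F.mse_loss(unet(noisy, t), noise,
                            reduction="none").mean(dim=(1, 2, 3))  # per-image loss
    weights = torch.softmax(beta * rewards, dim=0)                 # up-weight high reward
    loss = (weights * per_sample).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

unet = TinyUNet()
scheduler = DDPMScheduler(num_train_timesteps=1000)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)
images = torch.randn(4, 3, 32, 32)            # placeholder image batch
rewards = torch.tensor([0.1, 0.9, 0.4, 0.7])  # placeholder KPI scores per image
print(reward_weighted_step(unet, scheduler, optimizer, images, rewards))
```
In the paper's setting, the rewards would come from BoigLLM's KPI predictions (e.g., predicted likes or forwards) rather than the placeholder tensor above.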
Related papers
- Fine-Tuning Stable Diffusion XL for Stylistic Icon Generation: A Comparison of Caption Size [0.0]
We show different fine-tuning methods for Stable Diffusion XL.
We also show how important it is to properly define what "high-quality" really is.
arXiv Detail & Related papers (2024-07-11T13:55:20Z) - Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation [52.509092010267665]
We introduce LlamaGen, a new family of image generation models that apply the original "next-token prediction" paradigm of large language models to the visual generation domain.
It answers in the affirmative the question of whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals, can achieve state-of-the-art image generation performance if scaled properly.
arXiv Detail & Related papers (2024-06-10T17:59:52Z) - xT: Nested Tokenization for Larger Context in Large Images [79.37673340393475]
- xT: Nested Tokenization for Larger Context in Large Images [79.37673340393475]
xT is a framework for vision transformers that aggregates global context with local details.
We are able to increase accuracy by up to 8.6% on challenging classification tasks.
arXiv Detail & Related papers (2024-03-04T10:29:58Z) - Towards Pragmatic Semantic Image Synthesis for Urban Scenes [4.36080478413575]
- Towards Pragmatic Semantic Image Synthesis for Urban Scenes [4.36080478413575]
We present a new task: given a dataset with synthetic images and labels and a dataset with unlabeled real images, our goal is to learn a model that can generate images with the content of the input mask and the appearance of real images.
We leverage the synthetic image as a guide to the content of the generated image by penalizing the difference between their high-level features on a patch level.
In contrast to previous works which employ one discriminator that overfits the target domain semantic distribution, we employ a discriminator for the whole image and multiscale discriminators on the image patches.
arXiv Detail & Related papers (2023-05-16T18:01:12Z) - Search By Image: Deeply Exploring Beneficial Features for Beauty Product
- Search By Image: Deeply Exploring Beneficial Features for Beauty Product Retrieval [21.78262478923889]
This paper studies a practically meaningful problem of beauty product retrieval (BPR) by neural networks.
We broadly extract different types of image features and raise the intriguing question of whether these features are beneficial for suppressing the data variations of real-world captured images.
We present a novel variable-attention neural network, termed VM-Net, that learns how to combine the multiple features of beauty product images.
arXiv Detail & Related papers (2023-03-24T15:38:58Z) - BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers [117.79456335844439]
We propose to use a semantic-rich visual tokenizer as the reconstruction target for masked prediction.
We then pretrain vision Transformers by predicting the original visual tokens for the masked image patches.
Experiments on image classification and semantic segmentation show that our approach outperforms all compared MIM methods.
arXiv Detail & Related papers (2022-08-12T16:48:10Z) - BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations [89.42397034542189]
- BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations [89.42397034542189]
We synthesize a large labeled dataset via a generative adversarial network (GAN).
We take image samples from the class-conditional generative model BigGAN trained on ImageNet, and manually annotate 5 images per class, for all 1k classes.
We create a new ImageNet benchmark by labeling an additional set of 8k real images and evaluate segmentation performance in a variety of settings.
arXiv Detail & Related papers (2022-01-12T20:28:34Z) - From ImageNet to Image Classification: Contextualizing Progress on
- From ImageNet to Image Classification: Contextualizing Progress on Benchmarks [99.19183528305598]
We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset.
Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for.
arXiv Detail & Related papers (2020-05-22T17:39:16Z)