MPDS: A Movie Posters Dataset for Image Generation with Diffusion Model
- URL: http://arxiv.org/abs/2410.16840v1
- Date: Tue, 22 Oct 2024 09:20:03 GMT
- Title: MPDS: A Movie Posters Dataset for Image Generation with Diffusion Model
- Authors: Meng Xu, Tong Zhang, Fuyun Wang, Yi Lei, Xin Liu, Zhen Cui
- Abstract summary: Movie posters are vital for captivating audiences, conveying themes, and driving market competition in the film industry.
Despite exciting progress in image generation, current models often fall short in producing satisfactory poster results.
We propose a Movie Posters DataSet (MPDS), tailored for text-to-image generation models to revolutionize poster production.
- Score: 26.361736240401594
- License:
- Abstract: Movie posters are vital for captivating audiences, conveying themes, and driving market competition in the film industry. While traditional design is laborious, intelligent generation technology offers efficiency gains and design enhancements. Despite exciting progress in image generation, current models often fall short in producing satisfactory poster results. The primary issue lies in the absence of specialized poster datasets for targeted model training. In this work, we propose a Movie Posters DataSet (MPDS), tailored for text-to-image generation models to revolutionize poster production. Dedicated to posters, MPDS is, to our knowledge, the first image-text pair dataset of its kind, comprising 373k+ image-text pairs and 8k+ actor images (covering 4k+ actors). Detailed poster descriptions, such as movie titles, genres, casts, and synopses, are meticulously organized and standardized based on public movie synopses; we call this the movie-synopsis prompt. Further, to enrich poster descriptions and reduce their divergence from the movie synopses, we leverage a large-scale vision-language model to automatically produce vision-perceptive prompts for each poster, then perform manual rectification and integrate them with the movie-synopsis prompt. In addition, we introduce a poster-caption prompt to capture text elements in posters, such as actor names and movie titles. For movie poster generation, we develop a multi-condition diffusion framework that takes the poster prompt, poster caption, and an actor image (for personalization) as inputs, yielding excellent results through the learning of a diffusion model. Experiments demonstrate the valuable role of our proposed MPDS dataset in advancing personalized movie poster generation. MPDS is available at https://anonymous.4open.science/r/MPDS-373k-BD3B.
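The abstract describes a multi-condition diffusion framework that fuses three inputs: a poster prompt, a poster caption, and an actor image. The sketch below illustrates the general shape of such a setup with toy encoders and a placeholder denoiser; all function names, dimensions, and the concatenation-based fusion are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Hypothetical sketch of a multi-condition diffusion setup: three conditions
# (poster prompt, poster caption, actor image) are encoded separately and
# fused into one conditioning vector for the denoiser. Encoders, dimensions,
# and the fusion scheme here are stand-ins, not the paper's actual model.

EMBED_DIM = 8

def encode_text(text: str) -> np.ndarray:
    """Toy text encoder: hash UTF-8 bytes into a fixed-size embedding."""
    vec = np.zeros(EMBED_DIM)
    for i, byte in enumerate(text.encode("utf-8")):
        vec[i % EMBED_DIM] += byte / 255.0
    return vec / max(len(text), 1)

def encode_image(image: np.ndarray) -> np.ndarray:
    """Toy image encoder: crude fixed-size projection of the pixels."""
    return np.resize(image.reshape(-1), EMBED_DIM) / 255.0

def fuse_conditions(prompt: str, caption: str, actor_image: np.ndarray) -> np.ndarray:
    """Concatenate the three condition embeddings (one common fusion choice)."""
    return np.concatenate([
        encode_text(prompt),
        encode_text(caption),
        encode_image(actor_image),
    ])

def denoise_step(noisy_latent: np.ndarray, cond: np.ndarray, t: int) -> np.ndarray:
    """Placeholder denoiser: a real model would predict the noise from
    (latent, timestep, condition); here we simply shrink toward zero."""
    predicted_noise = 0.1 * noisy_latent  # stand-in for a learned prediction
    return noisy_latent - predicted_noise

# One pass of conditional sampling with the toy components above.
latent = np.random.default_rng(0).normal(size=(4, 4))
cond = fuse_conditions(
    prompt="sci-fi thriller set on a desert planet",
    caption="DUNE",
    actor_image=np.zeros((16, 16)),
)
for t in reversed(range(10)):
    latent = denoise_step(latent, cond, t)
print(cond.shape)  # → (24,)
```

In a real system each toy encoder would be replaced by a learned model (e.g. a text encoder and an image encoder), and the denoiser would be a trained network conditioned on the fused vector at every timestep.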
Related papers
- Unraveling Movie Genres through Cross-Attention Fusion of Bi-Modal Synergy of Poster [13.28948224096886]
Movie genre classification plays a pivotal role in film marketing, audience engagement, and recommendation systems.
Previous explorations into movie genre classification have mostly examined plot summaries, subtitles, trailers, and movie scenes.
We present a framework that exploits movie posters from both a visual and a textual perspective to address the multi-label movie genre classification problem.
arXiv Detail & Related papers (2024-10-12T16:14:18Z) - MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence [62.72540590546812]
MovieDreamer is a novel hierarchical framework that integrates the strengths of autoregressive models with diffusion-based rendering.
We present experiments across various movie genres, demonstrating that our approach achieves superior visual and narrative quality.
arXiv Detail & Related papers (2024-07-23T17:17:05Z) - GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models [7.5791485306093245]
We propose an automatic poster generation framework with text rendering capabilities leveraging LLMs.
This framework aims to create precise poster text within a detailed contextual background.
We introduce a high-resolution font dataset and a poster dataset with resolutions exceeding 1024 pixels.
arXiv Detail & Related papers (2024-07-02T13:17:49Z) - DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models [53.17454737232668]
We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts.
These prompts offer text-guided editing capabilities and additional flexibility in controlling variation and mixing between multiple distributions.
We also show the adaptability of the learned prompt distribution to other tasks, such as text-to-3D.
arXiv Detail & Related papers (2023-12-21T12:11:00Z) - Planning and Rendering: Towards Product Poster Generation with Diffusion Models [21.45855580640437]
We propose a novel product poster generation framework based on diffusion models named P&R.
At the planning stage, we propose a PlanNet to generate the layout of the product and other visual components.
At the rendering stage, we propose a RenderNet to generate the background for the product while considering the generated layout.
Our method outperforms the state-of-the-art product poster generation methods on PPG30k.
arXiv Detail & Related papers (2023-12-14T11:11:50Z) - Demystifying Visual Features of Movie Posters for Multi-Label Genre Identification [0.35998666903987897]
We present a deep transformer network with a probabilistic module to identify the movie genres exclusively from the poster.
For experiments, we procured 13,882 posters spanning 13 genres from the Internet Movie Database (IMDb), on which our model's performance was encouraging.
arXiv Detail & Related papers (2023-09-21T12:39:36Z) - AutoPoster: A Highly Automatic and Content-aware Design System for Advertising Poster Generation [14.20790443380675]
This paper introduces AutoPoster, a highly automatic and content-aware system for generating advertising posters.
With only product images and titles as inputs, AutoPoster can automatically produce posters of varying sizes through four key stages.
We propose the first poster generation dataset that includes visual attribute annotations for over 76k posters.
arXiv Detail & Related papers (2023-08-02T11:58:43Z) - MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images [92.13079696503803]
We present MovieFactory, a framework to generate cinematic-picture (3072×1280), film-style (multi-scene), and multi-modality (sounding) movies.
Our approach empowers users to create captivating movies with smooth transitions using simple text inputs.
arXiv Detail & Related papers (2023-06-12T17:31:23Z) - Film Trailer Generation via Task Decomposition [65.16768855902268]
We model movies as graphs, where nodes are shots and edges denote semantic relations between them.
We learn these relations using joint contrastive training which leverages privileged textual information from screenplays.
An unsupervised algorithm then traverses the graph and generates trailers that human judges prefer to ones generated by competitive supervised approaches.
arXiv Detail & Related papers (2021-11-16T20:50:52Z) - Political Posters Identification with Appearance-Text Fusion [49.55696202606098]
We propose a method that efficiently utilizes appearance features and text vectors to accurately classify political posters.
This work focuses primarily on political posters designed to promote a particular political event.
arXiv Detail & Related papers (2020-12-19T16:14:51Z) - Movie Summarization via Sparse Graph Construction [65.16768855902268]
We propose a model that identifies TP scenes by building a sparse movie graph that represents relations between scenes and is constructed using multimodal information.
According to human judges, the summaries created by our approach are more informative and complete, and receive higher ratings, than the outputs of sequence-based models and general-purpose summarization algorithms.
arXiv Detail & Related papers (2020-12-14T13:54:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.