CookGAN: Meal Image Synthesis from Ingredients
- URL: http://arxiv.org/abs/2002.11493v1
- Date: Tue, 25 Feb 2020 00:54:10 GMT
- Title: CookGAN: Meal Image Synthesis from Ingredients
- Authors: Fangda Han, Ricardo Guerrero, Vladimir Pavlovic
- Abstract summary: We propose a new computational framework, based on generative deep models, for synthesizing photo-realistic meal images from a textual list of ingredients.
CookGAN builds an attention-based ingredients-image association model, which is then used to condition a generative neural network tasked with synthesizing meal images.
- Score: 24.295634252929112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we propose a new computational framework, based on generative deep models, for synthesizing photo-realistic meal images from a textual list of ingredients. Previous work on text-to-image synthesis typically relies on pre-trained text models to extract text features, followed by a generative adversarial network (GAN) that generates realistic images conditioned on those features. Such work mainly focuses on spatially compact, well-defined object categories, such as birds or flowers, but meal images are significantly more complex, consisting of multiple ingredients whose appearance and spatial qualities are further modified by cooking methods. To generate realistic meal images from ingredients, we propose Cook Generative Adversarial Networks (CookGAN). CookGAN first builds an attention-based ingredients-image association model, which is then used to condition a generative neural network tasked with synthesizing meal images. Furthermore, a cycle-consistent constraint is added to further improve image quality and control appearance. Experiments show that our model generates meal images corresponding to the input ingredients.
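To make the two-stage design concrete, below is a minimal PyTorch sketch of the idea: an attention-based ingredient encoder produces a recipe embedding, a generator synthesizes an image conditioned on that embedding plus noise, and a cycle-consistency term encourages the generated image to map back to the same embedding. All module names, dimensions, and architectures here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of CookGAN-style conditioning (illustrative, not the
# authors' code). An ingredient encoder maps an ingredient list to a vector;
# the generator synthesizes an image conditioned on that vector plus noise;
# a cycle-consistency loss encourages the generated image to re-encode to
# the same ingredient embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IngredientEncoder(nn.Module):
    """Embeds a padded batch of ingredient token ids and attention-pools them."""
    def __init__(self, vocab_size=5000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.attn = nn.Linear(dim, 1)  # scores each ingredient for pooling

    def forward(self, ids):                  # ids: (B, num_ingredients)
        e = self.embed(ids)                  # (B, N, dim)
        w = self.attn(e).softmax(dim=1)      # attention over ingredients
        return (w * e).sum(dim=1)            # (B, dim) recipe embedding

class Generator(nn.Module):
    """Maps noise + ingredient embedding to a small RGB image."""
    def __init__(self, z_dim=100, cond_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * 64 * 64), nn.Tanh())

    def forward(self, z, cond):
        x = self.net(torch.cat([z, cond], dim=1))
        return x.view(-1, 3, 64, 64)

class ImageEncoder(nn.Module):
    """Maps an image back to the ingredient-embedding space (cycle loss)."""
    def __init__(self, cond_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, cond_dim))

    def forward(self, img):
        return self.net(img)

enc, gen, img_enc = IngredientEncoder(), Generator(), ImageEncoder()
ids = torch.randint(1, 5000, (4, 10))        # batch of 4 ingredient lists
cond = enc(ids)
fake = gen(torch.randn(4, 100), cond)
cycle_loss = F.mse_loss(img_enc(fake), cond) # cycle-consistency term
```

In the full model, this cycle term would be combined with the adversarial losses of the GAN, and the association model would be pre-trained on paired recipe-image data before being used to condition the generator.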
Related papers
- CookingDiffusion: Cooking Procedural Image Generation with Stable Diffusion [58.92430755180394]
We present CookingDiffusion, a novel approach to generating photo-realistic images of cooking steps with Stable Diffusion.
The approach conditions generation on text prompts, image prompts, and multi-modal prompts, ensuring the consistent generation of cooking procedural images.
Our experimental results demonstrate that our model excels at generating high-quality cooking procedural images.
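The paper's own pipeline is not reproduced here, but the basic pattern of prompting Stable Diffusion with a single cooking-step description looks roughly like the following sketch using the Hugging Face diffusers library; the checkpoint name and prompt are placeholders, and CookingDiffusion's image and multi-modal prompting is not shown.

```python
# Illustrative text-to-image prompting of Stable Diffusion via diffusers;
# this is generic usage, not the CookingDiffusion model itself.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# A single procedural step described in text; CookingDiffusion additionally
# conditions on image and multi-modal prompts to keep adjacent steps consistent.
prompt = "saute diced onions in olive oil until golden, overhead food photo"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("step.png")
```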
arXiv Detail & Related papers (2025-01-15T06:58:53Z)
- Deep Image-to-Recipe Translation [0.0]
Deep Image-to-Recipe Translation aims to bridge the gap between cherished food memories and the art of culinary creation.
Our primary objective involves predicting ingredients from a given food image.
Our approach emphasizes the importance of metrics such as Intersection over Union (IoU) and F1 score in scenarios where accuracy alone might be misleading.
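Because ingredient prediction is a set-prediction problem, IoU and F1 are computed over the predicted and ground-truth ingredient sets rather than over a single label. A minimal sketch (the example sets are made up):

```python
# IoU and F1 over ingredient sets, the metrics highlighted above.
def iou(pred: set, true: set) -> float:
    return len(pred & true) / len(pred | true) if pred | true else 1.0

def f1(pred: set, true: set) -> float:
    tp = len(pred & true)  # correctly predicted ingredients
    if not pred or not true or tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)

pred = {"onion", "garlic", "tomato", "basil"}
true = {"onion", "tomato", "pasta"}
print(iou(pred, true))  # 2 / 5 = 0.4
print(f1(pred, true))   # 2 * 0.5 * 0.667 / (0.5 + 0.667) ~= 0.571
```

Plain accuracy can mislead here because most ingredients in the vocabulary are absent from any given dish, so a model that predicts almost nothing still scores well; IoU and F1 penalize both missed and spurious ingredients.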
arXiv Detail & Related papers (2024-07-01T02:33:07Z)
- FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation [69.91401809979709]
Current state-of-the-art image generation models such as Latent Diffusion Models (LDMs) have demonstrated the capacity to produce visually striking food-related images.
We introduce FoodFusion, a Latent Diffusion model engineered specifically for the faithful synthesis of realistic food images from textual descriptions.
The development of the FoodFusion model involves harnessing an extensive array of open-source food datasets, resulting in over 300,000 curated image-caption pairs.
arXiv Detail & Related papers (2023-12-06T15:07:12Z)
- Learning Structural Representations for Recipe Generation and Food Retrieval [101.97397967958722]
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
Our proposed model can produce high-quality and coherent recipes, and achieves state-of-the-art performance on the benchmark Recipe1M dataset.
arXiv Detail & Related papers (2021-10-04T06:36:31Z)
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [17.42688184238741]
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives.
We propose a simplified end-to-end model based on well-established, high-performing encoders for text and images.
Our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset.
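Cross-modal retrieval models of this kind are typically trained to map images and recipes into a shared embedding space where matching pairs are close. The sketch below shows that pattern with a triplet loss; the linear projections stand in for the paper's encoders and are assumptions, not its architecture.

```python
# Shared-embedding retrieval sketch: project image and text features into one
# space and pull matching pairs together with a triplet loss. The projections
# are stand-in linear layers, not the paper's transformer encoders.
import torch
import torch.nn as nn
import torch.nn.functional as F

img_proj = nn.Linear(2048, 256)   # e.g. on top of CNN image features
txt_proj = nn.Linear(768, 256)    # e.g. on top of text-encoder features

img_feat = torch.randn(8, 2048)   # batch of image features
txt_feat = torch.randn(8, 768)    # matching recipe features (same order)

img_emb = F.normalize(img_proj(img_feat), dim=1)
txt_emb = F.normalize(txt_proj(txt_feat), dim=1)

# Treat a shifted batch as mismatched (negative) recipes for each image.
loss_fn = nn.TripletMarginLoss(margin=0.3)
neg = txt_emb.roll(1, dims=0)
loss = loss_fn(img_emb, txt_emb, neg)

# At retrieval time, rank recipes by cosine similarity to a query image.
sims = img_emb @ txt_emb.T                  # (8, 8) similarity matrix
ranked = sims.argsort(dim=1, descending=True)
```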
arXiv Detail & Related papers (2021-03-24T10:17:09Z)
- CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval [20.292467149387594]
We introduce a novel cross-modal learning framework to jointly model the latent representations of images and text in the food image-recipe association and retrieval tasks.
Our experiments show that using an efficient tree-structured Long Short-Term Memory (Tree-LSTM) as the text encoder in our cross-modal retrieval framework lets the model identify the main ingredients and cooking actions in recipe descriptions without explicit supervision.
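For intuition, the child-sum Tree-LSTM update (Tai et al., 2015) composes a node's children with a separate forget gate per child, which is what lets tree-structured encoders weight sub-phrases such as ingredients differently. This is a generic sketch of that cell, not CHEF's exact encoder:

```python
# Child-sum Tree-LSTM node update (Tai et al., 2015), sketched generically;
# CHEF uses a tree-structured LSTM of this family as its text encoder.
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, in_dim, h_dim):
        super().__init__()
        self.iou = nn.Linear(in_dim + h_dim, 3 * h_dim)  # input/output/update gates
        self.f = nn.Linear(in_dim + h_dim, h_dim)        # one forget gate per child

    def forward(self, x, child_h, child_c):
        # x: (in_dim,); child_h, child_c: (num_children, h_dim)
        h_sum = child_h.sum(dim=0)                       # sum of child states
        i, o, u = self.iou(torch.cat([x, h_sum])).chunk(3)
        i, o, u = i.sigmoid(), o.sigmoid(), u.tanh()
        # separate forget gate for each child's memory cell
        f = self.f(torch.cat([x.expand(len(child_c), -1), child_h], dim=1)).sigmoid()
        c = i * u + (f * child_c).sum(dim=0)
        return o * c.tanh(), c

cell = ChildSumTreeLSTMCell(in_dim=64, h_dim=128)
h, c = cell(torch.randn(64), torch.zeros(2, 128), torch.zeros(2, 128))
```

A recipe parse tree is then encoded bottom-up by applying this cell at each node, with leaves receiving empty child-state tensors.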
arXiv Detail & Related papers (2021-02-04T11:24:34Z)
- Structure-Aware Generation Network for Recipe Generation from Images [142.047662926209]
We investigate an open research task of generating cooking instructions based on only food images and ingredients.
Target recipes are long paragraphs and carry no structure annotations.
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
arXiv Detail & Related papers (2020-09-02T10:54:25Z)
- Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction.
We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
arXiv Detail & Related papers (2020-08-20T18:31:25Z)
- Decomposing Generation Networks with Structure Prediction for Recipe Generation [142.047662926209]
We propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction.
Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase.
Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure.
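Schematically, the decomposition is a two-step loop: predict the recipe's phase structure, then let a dedicated sub-generator decode each phase. The sketch below shows only this control flow; the structure predictor, the phase labels, and the sub-generators are stand-in stubs, not the paper's models.

```python
# Schematic of the DGN idea: a structure predictor assigns phases, and a
# dedicated sub-generator decodes each phase. Only the control flow matters;
# the phase names and generator bodies are illustrative stubs.
PHASES = ["preparation", "cooking", "finishing"]

def predict_structure(ingredients):
    # Stand-in for the global structure prediction component; the real
    # model predicts a per-recipe phase structure.
    return PHASES

def sub_generator(phase, ingredients):
    # Stand-in for a learned per-phase decoder.
    return f"[{phase}] instructions involving {', '.join(ingredients)}"

def generate_recipe(ingredients):
    return "\n".join(sub_generator(phase, ingredients)
                     for phase in predict_structure(ingredients))

print(generate_recipe(["onion", "tomato", "pasta"]))
```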
arXiv Detail & Related papers (2020-07-27T08:47:50Z)