FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation
- URL: http://arxiv.org/abs/2312.03540v1
- Date: Wed, 6 Dec 2023 15:07:12 GMT
- Title: FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation
- Authors: Olivia Markham and Yuhao Chen and Chi-en Amy Tai and Alexander Wong
- Abstract summary: Current state-of-the-art image generation models such as Latent Diffusion Models (LDMs) have demonstrated the capacity to produce visually striking food-related images.
We introduce FoodFusion, a Latent Diffusion model engineered specifically for the faithful synthesis of realistic food images from textual descriptions.
The development of the FoodFusion model involves harnessing an extensive array of open-source food datasets, resulting in over 300,000 curated image-caption pairs.
- Score: 69.91401809979709
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current state-of-the-art image generation models such as Latent Diffusion
Models (LDMs) have demonstrated the capacity to produce visually striking
food-related images. However, these generated images often exhibit an artistic
or surreal quality that diverges from the authenticity of real-world food
representations. This inadequacy renders them impractical for applications
requiring realistic food imagery, such as training models for image-based
dietary assessment. To address these limitations, we introduce FoodFusion, a
Latent Diffusion model engineered specifically for the faithful synthesis of
realistic food images from textual descriptions. The development of the
FoodFusion model involves harnessing an extensive array of open-source food
datasets, resulting in over 300,000 curated image-caption pairs. Additionally,
we propose and employ two distinct data cleaning methodologies to ensure that
the resulting image-text pairs maintain both realism and accuracy. The
FoodFusion model, thus trained, demonstrates a remarkable ability to generate
food images that show a significant improvement in both realism and diversity
over publicly available image generation models. We openly share
the dataset and fine-tuned models to support advancements in this critical
field of food image synthesis at https://bit.ly/genai4good.
Related papers
- ChefFusion: Multimodal Foundation Model Integrating Recipe and Food Image Generation [19.704975821172315]
We introduce a novel food computing foundation model that achieves true multimodality.
By leveraging large language models (LLMs) and pre-trained image encoder and decoder models, our model can perform a diverse array of food computing-related tasks.
arXiv Detail & Related papers (2024-09-18T14:24:29Z)
- Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models [48.821150379374714]
We introduce a large-scale, high-quality food image composite dataset, FC22k, which comprises 22,000 foreground, background, and ground-truth image triplets.
We propose a novel food image composition method, Foodfusion, which incorporates a Fusion Module for processing and integrating foreground and background information.
arXiv Detail & Related papers (2024-08-26T09:32:16Z)
- Shape-Preserving Generation of Food Images for Automatic Dietary Assessment [1.602820210496921]
We present a simple GAN-based neural network architecture for conditional food image generation.
The shapes of the food and container in the generated images closely resemble those in the reference input image.
arXiv Detail & Related papers (2024-08-23T20:18:51Z)
- NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches [59.38343165508926]
Accurate dietary intake estimation is critical for informing policies and programs to support healthy eating.
Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images.
We introduce NutritionVerse-Synth, the first large-scale dataset of 84,984 synthetic 2D food images with associated dietary information.
We also collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism.
arXiv Detail & Related papers (2023-09-14T13:29:41Z)
- Diffusion Model with Clustering-based Conditioning for Food Image Generation [22.154182296023404]
Deep learning-based techniques are commonly used to perform image analysis such as food classification, segmentation, and portion size estimation.
One potential solution is to use synthetic food images for data augmentation.
In this paper, we propose an effective clustering-based training framework, named ClusDiff, for generating high-quality and representative food images.
arXiv Detail & Related papers (2023-09-01T01:40:39Z)
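The summary above does not spell out ClusDiff's clustering step; one common way to realize clustering-based conditioning, shown here as an illustrative sketch rather than the paper's exact recipe, is to k-means-cluster image embeddings and use each image's cluster index as a pseudo-class conditioning label (all dimensions and names below are assumptions):

```python
# Illustrative sketch of clustering-based conditioning (not the exact ClusDiff
# recipe): k-means-cluster image embeddings and use each cluster index as a
# pseudo-class label when conditioning the diffusion model during training.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 512))  # stand-in for real image features

kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(embeddings)
cluster_labels = kmeans.labels_  # shape (1000,): one pseudo-class per image

# During training, a batch's cluster ids would be fed to a class-conditional
# denoiser, e.g. unet(noisy_latents, timesteps, class_labels=batch_labels).
print(np.bincount(cluster_labels))  # rough cluster sizes
```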
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
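Mean intersection over union (mIoU), the segmentation metric reported above, averages per-class IoU between predicted and ground-truth masks, skipping classes absent from both; a self-contained computation:

```python
# Mean intersection over union (mIoU) for semantic segmentation masks.
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Average per-class IoU, skipping classes absent from both masks."""
    ious = []
    for c in range(num_classes):
        pred_c, target_c = pred == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:  # class appears in neither mask: skip it
            continue
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 masks with three classes: per-class IoUs are 1.0, 0.5, 0.5.
pred = np.array([[0, 1], [1, 2]])
target = np.array([[0, 1], [2, 2]])
print(mean_iou(pred, target, num_classes=3))  # 0.666...
```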
- NutritionVerse-Thin: An Optimized Strategy for Enabling Improved Rendering of 3D Thin Food Models [66.77685168785152]
We present an optimized strategy for enabling improved rendering of thin 3D food models.
Our method generates the 3D model mesh via a proposed thin-object-optimized differentiable reconstruction method.
While simple, we find that this technique can be employed for quick and highly consistent capturing of thin 3D objects.
arXiv Detail & Related papers (2023-04-12T05:34:32Z)
- Conditional Synthetic Food Image Generation [12.235703733345833]
Generative Adversarial Networks (GANs) have been widely investigated for image synthesis based on their powerful representation learning ability.
We aim to explore the capability and improve the performance of GAN methods for food image generation.
arXiv Detail & Related papers (2023-03-16T00:23:20Z)
- Compositional Visual Generation with Composable Diffusion Models [80.75258849913574]
We propose an alternative structured approach for compositional generation using diffusion models.
An image is generated by composing a set of diffusion models, with each of them modeling a certain component of the image.
The proposed method can generate scenes at test time that are substantially more complex than those seen in training.
arXiv Detail & Related papers (2022-06-03T17:47:04Z)
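The composition idea above can be made concrete: a composed noise estimate adds weighted conditional deltas to an unconditional estimate, eps_hat = eps(x) + sum_i w_i * (eps(x, c_i) - eps(x)). Below is a minimal PyTorch sketch of that arithmetic with a toy, untrained stand-in denoiser (every name and dimension here is an illustrative assumption, not the paper's implementation):

```python
# Sketch of the composition rule behind composable diffusion models:
#   eps_hat = eps(x) + sum_i w_i * (eps(x, c_i) - eps(x)).
# ToyDenoiser is an untrained stand-in for a real noise-prediction network.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    def __init__(self, dim: int = 8, num_concepts: int = 3):
        super().__init__()
        # Index 0 is reserved for the unconditional (null) concept.
        self.concept_emb = nn.Embedding(num_concepts + 1, dim)
        self.net = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, concept_id: torch.Tensor) -> torch.Tensor:
        c = self.concept_emb(concept_id).expand(x.shape[0], -1)
        return self.net(torch.cat([x, c], dim=-1))

model = ToyDenoiser()
x = torch.randn(4, 8)                 # noisy samples at some timestep
uncond = model(x, torch.tensor([0]))  # unconditional noise estimate

weight, concept_ids = 2.0, [1, 2]     # two concepts, shared guidance weight
eps_hat = uncond.clone()
for cid in concept_ids:
    eps_hat = eps_hat + weight * (model(x, torch.tensor([cid])) - uncond)
```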