Diffusion Model with Clustering-based Conditioning for Food Image
Generation
- URL: http://arxiv.org/abs/2309.00199v1
- Date: Fri, 1 Sep 2023 01:40:39 GMT
- Title: Diffusion Model with Clustering-based Conditioning for Food Image
Generation
- Authors: Yue Han, Jiangpeng He, Mridul Gupta, Edward J. Delp, Fengqing Zhu
- Abstract summary: Deep learning-based techniques are commonly used to perform image analysis such as food classification, segmentation, and portion size estimation.
One potential solution is to use synthetic food images for data augmentation.
In this paper, we propose an effective clustering-based training framework, named ClusDiff, for generating high-quality and representative food images.
- Score: 22.154182296023404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-based dietary assessment serves as an efficient and accurate solution
for recording and analyzing nutrition intake using eating occasion images as
input. Deep learning-based techniques are commonly used to perform image
analysis such as food classification, segmentation, and portion size
estimation, which rely on large amounts of food images with annotations for
training. However, such data dependency poses significant barriers to
real-world applications, because acquiring a substantial, diverse, and balanced
set of food images can be challenging. One potential solution is to use
synthetic food images for data augmentation. Although existing work has
explored generative adversarial network (GAN) based architectures for
generation, the quality of synthetic food images remains subpar. In
addition, while diffusion-based generative models have shown promising results
for general image generation tasks, the generation of food images can be
challenging due to the substantial intra-class variance. In this paper, we
investigate the generation of synthetic food images based on the conditional
diffusion model and propose an effective clustering-based training framework,
named ClusDiff, for generating high-quality and representative food images. The
proposed method is evaluated on the Food-101 dataset and shows improved
performance when compared with existing image generation works. We also
demonstrate that the synthetic food images generated by ClusDiff can help
address the severe class imbalance issue in long-tailed food classification
using the VFN-LT dataset.
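The abstract gives only a high-level description, so the following is a minimal, hypothetical sketch of clustering-based conditioning rather than the authors' released implementation: within each food class, k-means groups images into visual sub-modes, and the resulting (class, cluster) pair replaces the plain class label as the diffusion model's conditioning signal. All names (e.g., cluster_conditioning_labels, ConditionalDenoiser), the feature source, and the choice of four clusters per class are illustrative assumptions.

```python
# Illustrative sketch of clustering-based conditioning (hypothetical names;
# not the authors' released code). Within each food class, images are grouped
# by k-means on image features, and the (class, cluster) pair replaces the
# plain class label as the diffusion model's conditioning signal.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

NUM_CLASSES, CLUSTERS_PER_CLASS, FEAT_DIM = 101, 4, 512  # Food-101-style setup

def cluster_conditioning_labels(features, class_labels, k=CLUSTERS_PER_CLASS):
    """Map each image to a composite label: class_id * k + within-class cluster id."""
    cond = np.empty(len(class_labels), dtype=np.int64)
    for c in np.unique(class_labels):
        idx = np.where(class_labels == c)[0]
        # k-means inside one class captures intra-class variance (e.g., plating styles)
        clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features[idx])
        cond[idx] = c * k + clusters
    return cond

class ConditionalDenoiser(nn.Module):
    """Toy epsilon-predictor conditioned on the composite cluster label."""
    def __init__(self, feat_dim=FEAT_DIM, n_cond=NUM_CLASSES * CLUSTERS_PER_CLASS):
        super().__init__()
        self.cond_emb = nn.Embedding(n_cond, 128)   # one embedding per (class, cluster)
        self.time_emb = nn.Embedding(1000, 128)     # discrete diffusion timestep
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 256, 512), nn.SiLU(), nn.Linear(512, feat_dim))

    def forward(self, x_t, t, cond_id):
        h = torch.cat([x_t, self.time_emb(t), self.cond_emb(cond_id)], dim=-1)
        return self.net(h)  # predicted noise

# Example: 1,000 fake feature vectors spread over 10 classes
feats = np.random.randn(1000, FEAT_DIM).astype(np.float32)
labels = np.random.randint(0, 10, size=1000)
cond_ids = cluster_conditioning_labels(feats, labels)
model = ConditionalDenoiser()
eps = model(torch.from_numpy(feats), torch.randint(0, 1000, (1000,)),
            torch.from_numpy(cond_ids))
print(eps.shape)  # torch.Size([1000, 512])
```

Conditioning on sub-class clusters gives the denoiser a lower-variance target per label, which is the intuition behind handling the substantial intra-class variance noted in the abstract.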
Related papers
- Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models [48.821150379374714]
We introduce a large-scale, high-quality food image composite dataset, FC22k, which comprises 22,000 foreground, background, and ground-truth image triplets.
We propose a novel food image composition method, Foodfusion, which incorporates a Fusion Module for processing and integrating foreground and background information.
arXiv Detail & Related papers (2024-08-26T09:32:16Z)
- Shape-Preserving Generation of Food Images for Automatic Dietary Assessment [1.602820210496921]
We present a simple GAN-based neural network architecture for conditional food image generation.
The shapes of the food and container in the generated images closely resemble those in the reference input image.
arXiv Detail & Related papers (2024-08-23T20:18:51Z)
- FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation [69.91401809979709]
Current state-of-the-art image generation models such as Latent Diffusion Models (LDMs) have demonstrated the capacity to produce visually striking food-related images.
We introduce FoodFusion, a Latent Diffusion model engineered specifically for the faithful synthesis of realistic food images from textual descriptions.
The development of the FoodFusion model involves harnessing an extensive array of open-source food datasets, resulting in over 300,000 curated image-caption pairs.
arXiv Detail & Related papers (2023-12-06T15:07:12Z)
- NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches [59.38343165508926]
Accurate dietary intake estimation is critical for informing policies and programs to support healthy eating.
Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images.
We introduce NutritionVerse-Synth, the first large-scale dataset of 84,984 synthetic 2D food images with associated dietary information.
We also collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism.
arXiv Detail & Related papers (2023-09-14T13:29:41Z)
- Food Image Classification and Segmentation with Attention-based Multiple Instance Learning [51.279800092581844]
The paper presents a weakly supervised methodology for training food image classification and semantic segmentation models.
The proposed methodology is based on a multiple instance learning approach in combination with an attention-based mechanism.
We conduct experiments on two meta-classes within the FoodSeg103 data set to verify the feasibility of the proposed approach.
arXiv Detail & Related papers (2023-08-22T13:59:47Z)
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- Conditional Synthetic Food Image Generation [12.235703733345833]
Generative Adversarial Networks (GAN) have been widely investigated for image synthesis based on their powerful representation learning ability.
We aim to explore the capability and improve the performance of GAN methods for food image generation.
arXiv Detail & Related papers (2023-03-16T00:23:20Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
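To make the semantic-consistency idea in the SCAN entry above concrete, here is a generic, hypothetical sketch: each modality gets its own encoder and classifier, a retrieval term pulls matching embeddings together, and a consistency term aligns the two modalities' predicted class distributions. The symmetric KL form, dimensions, and loss weight are assumptions, not the paper's exact formulation.

```python
# Generic sketch of the semantic-consistency idea from the SCAN entry above:
# two modality encoders share an embedding space, and a regularizer aligns the
# class-probability outputs of the two modalities. Hypothetical shapes and the
# use of symmetric KL are assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB, N_CLASSES = 256, 101

class ModalityEncoder(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, EMB)        # map modality features to joint space
        self.cls = nn.Linear(EMB, N_CLASSES)      # per-modality semantic classifier

    def forward(self, x):
        z = F.normalize(self.proj(x), dim=-1)     # unit-norm joint embedding
        return z, self.cls(z)                     # embedding + class logits

img_enc, rec_enc = ModalityEncoder(2048), ModalityEncoder(768)
img_feat, rec_feat = torch.randn(32, 2048), torch.randn(32, 768)
z_img, logits_img = img_enc(img_feat)
z_rec, logits_rec = rec_enc(rec_feat)

# Retrieval term: pull matching image/recipe embeddings together (cosine loss).
retrieval = (1 - F.cosine_similarity(z_img, z_rec)).mean()
# Semantic-consistency term: align the two modalities' class distributions.
p_img = F.log_softmax(logits_img, dim=-1)
p_rec = F.log_softmax(logits_rec, dim=-1)
consistency = 0.5 * (F.kl_div(p_img, p_rec, log_target=True, reduction="batchmean")
                     + F.kl_div(p_rec, p_img, log_target=True, reduction="batchmean"))
loss = retrieval + 0.1 * consistency  # weighting is an arbitrary illustration
```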
This list is automatically generated from the titles and abstracts of the papers on this site.