Real-Time Cooked Food Image Synthesis and Visual Cooking Progress Monitoring on Edge Devices
- URL: http://arxiv.org/abs/2511.16965v1
- Date: Fri, 21 Nov 2025 05:38:15 GMT
- Title: Real-Time Cooked Food Image Synthesis and Visual Cooking Progress Monitoring on Edge Devices
- Authors: Jigyasa Gupta, Soumya Goyal, Anil Kumar, Ishan Jindal
- Abstract summary: We introduce the first oven-based cooking-progression dataset with chef-annotated doneness levels. We propose an edge-efficient recipe- and cooking-state-guided generator that synthesizes realistic food images conditioned on a raw food image.
- Score: 4.373318192668093
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Synthesizing realistic cooked food images from raw inputs on edge devices is a challenging generative task, requiring models to capture complex changes in texture, color and structure during cooking. Existing image-to-image generation methods often produce unrealistic results or are too resource-intensive for edge deployment. We introduce the first oven-based cooking-progression dataset with chef-annotated doneness levels and propose an edge-efficient recipe- and cooking-state-guided generator that synthesizes realistic food images conditioned on the raw food image. This formulation enables user-preferred visual targets rather than fixed presets. To ensure temporal consistency and culinary plausibility, we introduce a domain-specific Culinary Image Similarity (CIS) metric, which serves both as a training loss and a progress-monitoring signal. Our model outperforms existing baselines with significant reductions in FID scores (30% improvement on our dataset; 60% on public datasets).
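The CIS idea above, one similarity score serving as both a training loss and a doneness-progress signal, can be sketched as follows. The paper does not publish its metric or feature extractor, so the color-histogram embedding below is a toy stand-in and every name here is hypothetical.

```python
# Hypothetical sketch of a culinary-similarity score used both as a
# training loss and a cooking-progress signal. Not the paper's CIS:
# the embedding is a toy stand-in for a learned feature extractor.
import numpy as np

def embed(image: np.ndarray) -> np.ndarray:
    """Stand-in feature extractor: a coarse, L2-normalized color histogram."""
    hist, _ = np.histogram(image, bins=16, range=(0.0, 1.0), density=True)
    return hist / (np.linalg.norm(hist) + 1e-8)

def culinary_similarity(current: np.ndarray, target: np.ndarray) -> float:
    """Cosine similarity in feature space; values near 1.0 mean 'looks done'."""
    return float(np.dot(embed(current), embed(target)))

def cis_loss(current: np.ndarray, target: np.ndarray) -> float:
    """Usable as a training loss (minimize) or monitored over time as progress."""
    return 1.0 - culinary_similarity(current, target)

rng = np.random.default_rng(0)
raw = rng.random((64, 64))
cooked = np.clip(raw * 0.6 + 0.3, 0.0, 1.0)  # toy "browning" transform
print(cis_loss(cooked, cooked))  # near zero: identical images are "done"
print(cis_loss(raw, cooked))     # larger: raw frame is far from the target
```

During monitoring, the same score compared against a user-chosen target image (rather than a fixed preset) would indicate how close the current oven frame is to the preferred doneness.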
Related papers
- LLMs-based Augmentation for Domain Adaptation in Long-tailed Food Datasets [54.527878056610156]
We present a framework empowered with large language models (LLMs) to address these challenges in food recognition. We first leverage LLMs to parse food images to generate food titles and ingredients. Then, we project the generated texts and food images from different domains to a shared embedding space to maximize the pair similarities.
arXiv Detail & Related papers (2025-11-20T04:38:56Z)
- CookingDiffusion: Cooking Procedural Image Generation with Stable Diffusion [58.92430755180394]
We present CookingDiffusion, a novel approach to generate photo-realistic images of cooking steps. Conditioning prompts encompass text prompts, image prompts, and multi-modal prompts, ensuring the consistent generation of cooking procedural images. Our experimental results demonstrate that our model excels at generating high-quality cooking procedural images.
arXiv Detail & Related papers (2025-01-15T06:58:53Z)
- Retrieval Augmented Recipe Generation [96.43285670458803]
We propose a retrieval augmented large multimodal model for recipe generation. It retrieves recipes semantically related to the image from an existing datastore as a supplement. It calculates the consistency among generated recipe candidates, which use different retrieval recipes as context for generation.
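The candidate-consistency step described above can be illustrated with a minimal sketch: each candidate recipe (generated with a different retrieved recipe as context) is scored by its average overlap with the other candidates, and the most consistent one is kept. Token-level Jaccard overlap stands in for whatever similarity the paper actually uses; all names are hypothetical.

```python
# Illustrative consistency scoring among generated recipe candidates.
# Jaccard token overlap is a stand-in metric, not the paper's method.
def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two recipe texts, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def most_consistent(candidates: list[str]) -> str:
    """Return the candidate with the highest mean similarity to the others."""
    scores = [
        sum(jaccard(c, other) for other in candidates if other is not c)
        / (len(candidates) - 1)
        for c in candidates
    ]
    return candidates[scores.index(max(scores))]

cands = [
    "boil pasta then add tomato sauce and basil",
    "boil pasta then add tomato sauce and cheese",
    "grill chicken with lemon and rosemary",
]
print(most_consistent(cands))  # the outlier "grill chicken" recipe loses
```

The intuition: a candidate that agrees with most other retrieval-conditioned generations is less likely to have been led astray by a single poorly matched retrieved recipe.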
arXiv Detail & Related papers (2024-11-13T15:58:50Z)
- Personalized Food Image Classification: Benchmark Datasets and New Baseline [8.019925729254178]
We propose a new framework for personalized food image classification by leveraging self-supervised learning and temporal image feature information.
Our method is evaluated on both benchmark datasets and shows improved performance compared to existing works.
arXiv Detail & Related papers (2023-09-15T20:11:07Z)
- Diffusion Model with Clustering-based Conditioning for Food Image Generation [22.154182296023404]
Deep learning-based techniques are commonly used to perform image analysis such as food classification, segmentation, and portion size estimation.
One potential solution is to use synthetic food images for data augmentation.
In this paper, we propose an effective clustering-based training framework, named ClusDiff, for generating high-quality and representative food images.
arXiv Detail & Related papers (2023-09-01T01:40:39Z)
- FIRE: Food Image to REcipe generation [10.45344523054623]
Food computing aims to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image.
This paper proposes FIRE, a novel methodology tailored to recipe generation in the food computing domain.
We showcase two practical applications that can benefit from integrating FIRE with large language model prompting.
arXiv Detail & Related papers (2023-08-28T08:14:20Z)
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional representation for Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
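The mean intersection-over-union figure quoted above is computed per class and averaged. A generic sketch of the metric, not the benchmark's official evaluation code:

```python
# Generic mean-IoU computation for semantic segmentation masks,
# as used to report scores such as 49.4 mIoU on FoodSeg103.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Average intersection-over-union across classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both masks; skip it
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0

gt   = np.array([[0, 0, 1], [1, 2, 2]])   # ground-truth class labels per pixel
pred = np.array([[0, 1, 1], [1, 2, 2]])   # predicted class labels per pixel
print(round(mean_iou(pred, gt, num_classes=3), 3))  # → 0.722
```

Here class 0 scores 1/2, class 1 scores 2/3, and class 2 scores 1, giving a mean of roughly 0.722; real benchmarks apply the same computation over all 103 ingredient classes.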
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- A Large-Scale Benchmark for Food Image Segmentation [62.28029856051079]
We build a new food image dataset FoodSeg103 (and its extension FoodSeg154) containing 9,490 images.
We annotate these images with 154 ingredient classes and each image has an average of 6 ingredient labels and pixel-wise masks.
We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
arXiv Detail & Related papers (2021-05-12T03:00:07Z)
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [17.42688184238741]
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives.
We propose a simplified end-to-end model based on well-established and high-performing encoders for text and images.
Our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset.
arXiv Detail & Related papers (2021-03-24T10:17:09Z)
- Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction.
We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
arXiv Detail & Related papers (2020-08-20T18:31:25Z)
- CookGAN: Meal Image Synthesis from Ingredients [24.295634252929112]
We propose a new computational framework, based on generative deep models, for synthesis of photo-realistic food meal images from a textual list of its ingredients.
CookGAN builds an attention-based ingredients-image association model, which is then used to condition a generative neural network tasked with synthesizing meal images.
arXiv Detail & Related papers (2020-02-25T00:54:10Z)