Towards an Action-Centric Ontology for Cooking Procedures Using Temporal Graphs
- URL: http://arxiv.org/abs/2509.04159v1
- Date: Thu, 04 Sep 2025 12:34:56 GMT
- Title: Towards an Action-Centric Ontology for Cooking Procedures Using Temporal Graphs
- Authors: Aarush Kumbhakern, Saransh Kumar Gupta, Lipika Dey, Partha Pratim Das
- Abstract summary: We introduce a domain-specific language for representing recipes as directed action graphs, capturing processes, transfers, environments, concurrency, and compositional structure. This work represents initial steps towards an action-centric ontology for cooking, using temporal graphs to enable structured machine understanding, precise interpretation, and scalable automation of culinary processes.
- Score: 2.504740578240899
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Formalizing cooking procedures remains a challenging task due to their inherent complexity and ambiguity. We introduce an extensible domain-specific language for representing recipes as directed action graphs, capturing processes, transfers, environments, concurrency, and compositional structure. Our approach enables precise, modular modeling of complex culinary workflows. Initial manual evaluation on a full English breakfast recipe demonstrates the DSL's expressiveness and suitability for future automated recipe analysis and execution. This work represents initial steps towards an action-centric ontology for cooking, using temporal graphs to enable structured machine understanding, precise interpretation, and scalable automation of culinary processes - both in home kitchens and professional culinary settings.
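As a rough illustration of the action-graph idea only (the abstract does not specify the DSL's actual syntax, so the action names and the layering scheme below are hypothetical), a simplified English breakfast can be modeled as a directed graph of actions with happens-before edges, whose layers expose which steps may run concurrently:

```python
from collections import defaultdict

class RecipeGraph:
    """Directed action graph: nodes are cooking actions, edges are
    happens-before dependencies (one action's output feeds another)."""

    def __init__(self):
        self.nodes = set()
        self.deps = defaultdict(set)  # action -> set of prerequisite actions

    def add_action(self, name):
        self.nodes.add(name)

    def add_dependency(self, before, after):
        """Record that `before` must complete before `after` starts."""
        self.add_action(before)
        self.add_action(after)
        self.deps[after].add(before)

    def concurrency_layers(self):
        """Group actions into layers via a layered topological sort.
        Actions in the same layer have no mutual dependencies and may
        proceed concurrently (e.g. frying while toasting)."""
        remaining, done, layers = set(self.nodes), set(), []
        while remaining:
            ready = {a for a in remaining if self.deps[a] <= done}
            if not ready:
                raise ValueError("cycle in recipe graph")
            layers.append(sorted(ready))
            done |= ready
            remaining -= ready
        return layers

# Hypothetical fragment of an English breakfast recipe.
g = RecipeGraph()
g.add_dependency("crack_eggs", "fry_eggs")
g.add_dependency("fry_eggs", "plate")
g.add_dependency("toast_bread", "plate")
layers = g.concurrency_layers()
# crack_eggs and toast_bread share a layer, so they can run concurrently;
# plate depends on both branches and lands in the final layer.
```

This is only a minimal sketch of the temporal-ordering aspect; the paper's DSL additionally models transfers, environments, and compositional structure, which a full implementation would attach to nodes and edges.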
Related papers
- Chain-of-Cooking: Cooking Process Visualization via Bidirectional Chain-of-Thought Guidance [6.4337734580551365]
We present a cooking process visualization model, called Chain-of-Cooking. To generate correct appearances of ingredients, we retrieve previously generated image patches as references. To enhance the coherence and preserve the rational order of generated images, we propose a Semantic Evolution Module and a Bidirectional Chain-of-Thought (CoT) Guidance.
arXiv Detail & Related papers (2025-07-29T06:34:59Z)
- VisualChef: Generating Visual Aids in Cooking via Mask Inpainting [50.84305074983752]
We introduce VisualChef, a method for generating contextual visual aids tailored to cooking scenarios. Given an initial frame and a specified action, VisualChef generates images depicting both the action's execution and the resulting appearance of the object. We evaluate VisualChef quantitatively and qualitatively on three egocentric video datasets and show its improvements over state-of-the-art methods.
arXiv Detail & Related papers (2025-06-23T12:23:21Z)
- CookingDiffusion: Cooking Procedural Image Generation with Stable Diffusion [58.92430755180394]
We present CookingDiffusion, a novel approach to generating photo-realistic images of cooking steps. Generation is conditioned on prompts encompassing text prompts, image prompts, and multi-modal prompts, ensuring the consistent generation of cooking procedural images. Our experimental results demonstrate that our model excels at generating high-quality cooking procedural images.
arXiv Detail & Related papers (2025-01-15T06:58:53Z)
- The Proof is in the Almond Cookies [7.534061469399505]
This paper presents a case study on how to process cooking recipes (and more generally, how-to instructions) in a way that makes it possible for a robot or artificial cooking assistant to support human chefs in the kitchen. We propose a novel approach to computational recipe understanding that mimics the human sense-making process, which is narrative-based.
arXiv Detail & Related papers (2025-01-03T14:25:35Z)
- PizzaCommonSense: Learning to Model Commonsense Reasoning about Intermediate Steps in Cooking Recipes [7.839338724237275]
A model to effectively reason about cooking recipes must accurately discern and understand the inputs and outputs of intermediate steps within the recipe.
We present a new corpus of cooking recipes enriched with descriptions of intermediate steps that describe the input and output for each step.
arXiv Detail & Related papers (2024-01-12T23:33:01Z)
- Structured Vision-Language Pretraining for Computational Cooking [54.0571416522547]
Vision-Language Pretraining and Foundation models have been the go-to recipe for achieving SoTA performance on general benchmarks.
We propose to leverage these techniques for structured-text based computational cuisine tasks.
arXiv Detail & Related papers (2022-12-08T13:37:17Z)
- Classifying States of Cooking Objects Using Convolutional Neural Network [6.127963013089406]
The main aim is to make the cooking process easier and safer, and to improve human welfare.
It is important for robots to understand the cooking environment and recognize the objects, especially correctly identifying the state of the cooking objects.
In this project, several parts of the experiment were conducted to design a robust deep convolutional neural network for classifying the state of the cooking objects from scratch.
arXiv Detail & Related papers (2021-04-30T22:26:40Z)
- CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval [20.292467149387594]
We introduce a novel cross-modal learning framework to jointly model the latent representations of images and text in the food image-recipe association and retrieval tasks.
Our experiments show that by making use of efficient tree-structured Long Short-Term Memory as the text encoder in our computational cross-modal retrieval framework, we are able to identify the main ingredients and cooking actions in the recipe descriptions without explicit supervision.
arXiv Detail & Related papers (2021-02-04T11:24:34Z)
- Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction.
We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
arXiv Detail & Related papers (2020-08-20T18:31:25Z)
- Decomposing Generation Networks with Structure Prediction for Recipe Generation [142.047662926209]
We propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction.
Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase.
Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure.
arXiv Detail & Related papers (2020-07-27T08:47:50Z)
- A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos [126.66212285239624]
We propose a benchmark of structured procedural knowledge extracted from cooking videos.
Our manually annotated open-vocabulary resource includes 356 instructional cooking videos and 15,523 video clip/sentence-level annotations.
arXiv Detail & Related papers (2020-05-02T05:15:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.