A Rich Recipe Representation as Plan to Support Expressive Multi Modal
Queries on Recipe Content and Preparation Process
- URL: http://arxiv.org/abs/2203.17109v1
- Date: Thu, 31 Mar 2022 15:29:38 GMT
- Title: A Rich Recipe Representation as Plan to Support Expressive Multi Modal
Queries on Recipe Content and Preparation Process
- Authors: Vishal Pallagani, Priyadharsini Ramamurthy, Vedant Khandelwal, Revathy
Venkataramanan, Kausik Lakkaraju, Sathyanarayanan N. Aakur, Biplav Srivastava
- Abstract summary: We discuss the construction of a machine-understandable rich recipe representation (R3)
R3 is infused with additional knowledge such as information about allergens and images of ingredients.
We also present TREAT, a tool for recipe retrieval which uses R3 to perform multi-modal reasoning on the recipe's content.
- Score: 24.94173789568803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Food is not only a basic human necessity but also a key factor driving a
society's health and economic well-being. As a result, the cooking domain is a
popular use-case to demonstrate decision-support (AI) capabilities in service
of benefits like precision health with tools ranging from information retrieval
interfaces to task-oriented chatbots. An AI here should understand concepts in
the food domain (e.g., recipes, ingredients), be tolerant of failures
encountered while cooking (e.g., browning of butter), handle allergy-based
substitutions, and work with multiple data modalities (e.g., text and images).
However, recipes today are handled as textual documents, which makes it
difficult for machines to read them, reason over them, and resolve ambiguity.
This calls for a better representation of recipes, one that overcomes the
ambiguity and sparseness of the current textual documents. In this paper, we
discuss the construction of a machine-understandable rich recipe representation
(R3), in the form of plans, from the recipes available in natural language. R3
is infused with additional knowledge such as information about allergens and
images of ingredients, possible failures and tips for each atomic cooking step.
To show the benefits of R3, we also present TREAT, a tool for recipe retrieval
which uses R3 to perform multi-modal reasoning on the recipe's content (plan
objects - ingredients and cooking tools), food preparation process (plan
actions and time), and media type (image, text). R3 leads to improved retrieval
efficiency and enables new capabilities that were hitherto not possible with a
textual representation.
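The excerpt does not publish R3's schema, but a plan-style recipe record of the kind described (atomic steps annotated with allergens, images, failure tips, and time) can be sketched as follows. All field and class names here are illustrative assumptions, not the paper's actual data model:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CookingStep:
    """One atomic cooking action, annotated in the spirit of R3."""
    action: str                       # plan action, e.g. "melt"
    inputs: List[str]                 # ingredients and tools used
    outputs: List[str]                # intermediate products
    duration_min: float = 0.0         # preparation time for the step
    allergens: List[str] = field(default_factory=list)
    failure_tips: List[str] = field(default_factory=list)
    image_urls: List[str] = field(default_factory=list)

@dataclass
class RecipePlan:
    title: str
    steps: List[CookingStep]

    def total_time(self) -> float:
        """Plan-level query: total preparation time."""
        return sum(s.duration_min for s in self.steps)

    def all_allergens(self) -> set:
        """Plan-level query: every allergen appearing in any step."""
        return {a for s in self.steps for a in s.allergens}

plan = RecipePlan(
    title="Brown butter pasta",
    steps=[
        CookingStep("melt", ["butter", "saucepan"], ["melted butter"],
                    duration_min=2, allergens=["dairy"],
                    failure_tips=["butter burns past browning; keep heat medium"]),
        CookingStep("toss", ["melted butter", "cooked pasta"], ["finished dish"],
                    duration_min=1, allergens=["gluten"]),
    ],
)
```

A structured record like this is what lets a retrieval tool answer queries over plan objects (ingredients, tools), plan actions, and time, rather than over raw text.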
Related papers
- Retrieval Augmented Recipe Generation [96.43285670458803]
We propose a retrieval augmented large multimodal model for recipe generation.
It retrieves recipes semantically related to the image from an existing datastore as a supplement.
It calculates the consistency among generated recipe candidates, each conditioned on a different retrieved recipe as context for generation.
arXiv Detail & Related papers (2024-11-13T15:58:50Z) - PizzaCommonSense: Learning to Model Commonsense Reasoning about Intermediate Steps in Cooking Recipes [7.839338724237275]
A model to effectively reason about cooking recipes must accurately discern and understand the inputs and outputs of intermediate steps within the recipe.
We present a new corpus of cooking recipes enriched with descriptions of intermediate steps that describe the input and output for each step.
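The corpus described pairs each instruction with textual descriptions of its input and output comestibles. A minimal sketch of such a record, with field names that are assumptions rather than the corpus's actual schema:

```python
# Illustrative annotated steps: each instruction is paired with a
# description of what it consumes and what it produces.
annotated_steps = [
    {"instruction": "Mix flour, water, and yeast.",
     "input": "flour; water; yeast",
     "output": "shaggy dough"},
    {"instruction": "Knead the dough for 10 minutes.",
     "input": "shaggy dough",
     "output": "smooth, elastic dough"},
]

def output_chain(steps):
    """Return the sequence of intermediate products, the quantity that
    this style of annotation makes explicit for a reasoning model."""
    return [s["output"] for s in steps]
```

Making the intermediate products explicit is what allows a model to check, for example, that a step's input was actually produced by an earlier step.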
arXiv Detail & Related papers (2024-01-12T23:33:01Z) - FIRE: Food Image to REcipe generation [10.45344523054623]
Food computing aims to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image.
This paper proposes FIRE, a novel methodology tailored to recipe generation in the food computing domain.
We showcase two practical applications that can benefit from integrating FIRE with large language model prompting.
arXiv Detail & Related papers (2023-08-28T08:14:20Z) - Large Language Models as Sous Chefs: Revising Recipes with GPT-3 [56.7155146252028]
We focus on recipes as an example of complex, diverse, and widely used instructions.
We develop a prompt grounded in the original recipe and ingredients list that breaks recipes down into simpler steps.
We also contribute an Amazon Mechanical Turk task that is carefully designed to reduce fatigue while collecting human judgment of the quality of recipe revisions.
arXiv Detail & Related papers (2023-06-24T14:42:43Z) - Counterfactual Recipe Generation: Exploring Compositional Generalization
in a Realistic Scenario [60.20197771545983]
We design the counterfactual recipe generation task, which asks models to modify a base recipe according to the change of an ingredient.
We collect a large-scale recipe dataset in Chinese for models to learn culinary knowledge.
Results show that existing models have difficulties in modifying the ingredients while preserving the original text style, and often miss actions that need to be adjusted.
arXiv Detail & Related papers (2022-10-20T17:21:46Z) - Attention-based Ingredient Phrase Parser [3.499870393443268]
We propose a new ingredient parsing model that parses an ingredient phrase from a recipe into a structured form with its corresponding attributes, achieving over 0.93 F1-score.
Experimental results show that our model achieves state-of-the-art performance on AllRecipes and Food.com datasets.
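The paper's parser is an attention-based neural model; the *target* structured form, however, can be illustrated with a toy rule-based stand-in. The attribute set (quantity, unit, preparation, name) is an assumption about their output schema:

```python
import re

# Toy regex parser illustrating the structured output form only; it is
# NOT the paper's attention-based model and covers only a few patterns.
PATTERN = re.compile(
    r"^(?P<quantity>[\d/.]+)\s+"
    r"(?P<unit>cups?|tbsp|tsp|g|oz)?\s*"
    r"(?P<prep>chopped|diced|minced|sliced)?\s*"
    r"(?P<name>.+)$"
)

def parse_ingredient(phrase: str) -> dict:
    """Map a free-text ingredient phrase to attribute/value pairs."""
    m = PATTERN.match(phrase.lower().strip())
    if not m:
        return {"name": phrase}   # fall back to the raw phrase
    return {k: v for k, v in m.groupdict().items() if v}

parsed = parse_ingredient("2 cups chopped onions")
```

A learned model replaces the brittle pattern matching but produces the same kind of structured record for downstream use.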
arXiv Detail & Related papers (2022-10-05T20:09:35Z) - Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers
and Self-supervised Learning [17.42688184238741]
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives.
We propose a simplified end-to-end model based on well established and high performing encoders for text and images.
Our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset.
arXiv Detail & Related papers (2021-03-24T10:17:09Z) - CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval [20.292467149387594]
We introduce a novel cross-modal learning framework to jointly model the latent representations of images and text in the food image-recipe association and retrieval tasks.
Our experiments show that by making use of efficient tree-structured Long Short-Term Memory as the text encoder in our computational cross-modal retrieval framework, we are able to identify the main ingredients and cooking actions in the recipe descriptions without explicit supervision.
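Cross-modal retrieval of this kind reduces, at query time, to nearest-neighbor search in a shared embedding space. A minimal sketch, assuming trained image and text encoders whose outputs are stand-in vectors here (the vectors are hand-picked for illustration, not encoder outputs):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Pretend outputs of a text encoder for three recipes in the shared space.
recipe_embeddings = {
    "tomato soup": [0.9, 0.1, 0.0],
    "apple pie":   [0.1, 0.8, 0.3],
    "fried rice":  [0.0, 0.2, 0.9],
}
# Pretend output of the image encoder for a photo of soup.
image_embedding = [0.85, 0.15, 0.05]

# Retrieval: the recipe whose embedding is closest to the image embedding.
best = max(recipe_embeddings,
           key=lambda r: cosine(image_embedding, recipe_embeddings[r]))
```

The frameworks above differ in how the encoders and training losses produce this space, but the retrieval step itself is this simple similarity ranking.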
arXiv Detail & Related papers (2021-02-04T11:24:34Z) - Structure-Aware Generation Network for Recipe Generation from Images [142.047662926209]
We investigate an open research task of generating cooking instructions based on only food images and ingredients.
Target recipes are long-length paragraphs and do not have annotations on structure information.
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
arXiv Detail & Related papers (2020-09-02T10:54:25Z) - Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction.
We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
arXiv Detail & Related papers (2020-08-20T18:31:25Z) - Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images
and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
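The alignment of "output semantic probabilities" across modalities can be sketched as a consistency term that penalizes disagreement between the two modalities' predicted category distributions. The symmetrized cross-entropy form below is an assumption for illustration, not SCAN's exact loss:

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) over semantic-category probability distributions."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

def semantic_consistency_loss(p_image, p_recipe):
    """Symmetrized penalty on disagreement between the image branch's and
    the recipe branch's predicted semantic (e.g. food-category)
    distributions; minimized when the two distributions agree."""
    return 0.5 * (cross_entropy(p_image, p_recipe)
                  + cross_entropy(p_recipe, p_image))

aligned = semantic_consistency_loss([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
misaligned = semantic_consistency_loss([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
```

Used as a regularizer alongside a retrieval loss, a term like this pushes the two modality encoders toward consistent semantic predictions.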
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.