50 Ways to Bake a Cookie: Mapping the Landscape of Procedural Texts
- URL: http://arxiv.org/abs/2210.17235v1
- Date: Mon, 31 Oct 2022 11:41:54 GMT
- Title: 50 Ways to Bake a Cookie: Mapping the Landscape of Procedural Texts
- Authors: Moran Mizrahi, Dafna Shahaf
- Abstract summary: We propose an unsupervised learning approach for summarizing multiple procedural texts into an intuitive graph representation.
We demonstrate our approach on recipes, a prominent example of procedural texts.
- Score: 15.185745028886648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The web is full of guidance on a wide variety of tasks, from changing the oil in your car to baking an apple pie. However, as content is created independently, a single task could have thousands of corresponding procedural texts. This makes it difficult for users to view the bigger picture and understand the multiple ways the task could be accomplished. In this work we propose an unsupervised learning approach for summarizing multiple procedural texts into an intuitive graph representation, allowing users to easily explore commonalities and differences. We demonstrate our approach on recipes, a prominent example of procedural texts. User studies show that our representation is intuitive and coherent and that it has the potential to help users with several sensemaking tasks, including adapting recipes for a novice cook and finding creative ways to spice up a dish.
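The abstract does not spell out the algorithm, but the core idea of merging many procedural texts into one graph can be sketched: cluster similar steps across recipes into shared nodes and link them in order of execution. The clustering choice (TF-IDF plus agglomerative clustering) and the threshold below are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: merge several recipes into one step graph by clustering
# similar step texts into shared nodes and linking them in execution order.
import networkx as nx
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

recipes = [
    ["cream butter and sugar", "mix in flour", "bake 12 minutes"],
    ["beat butter with sugar", "fold in flour and chips", "bake until golden"],
]

flat = [step for recipe in recipes for step in recipe]
X = TfidfVectorizer().fit_transform(flat).toarray()

# Merge near-duplicate steps from different recipes into one node
# (the `metric` argument requires scikit-learn >= 1.2).
labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.9, metric="cosine", linkage="average"
).fit_predict(X)

graph = nx.DiGraph()
pos = 0
for recipe in recipes:
    prev = None
    for step in recipe:
        node = int(labels[pos])
        pos += 1
        graph.add_node(node, example=step)
        if prev is not None:
            graph.add_edge(prev, node)  # "this step follows that one" in some recipe
        prev = node

print(graph.nodes(data=True), graph.edges())
```

Each node then stands for one shared way of phrasing a step, and branch points expose where the recipes diverge.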
Related papers
- Large Language Models as Sous Chefs: Revising Recipes with GPT-3 [56.7155146252028]
We focus on recipes as an example of complex, diverse, and widely used instructions.
We develop a prompt grounded in the original recipe and ingredients list that breaks recipes down into simpler steps.
We also contribute an Amazon Mechanical Turk task that is carefully designed to reduce fatigue while collecting human judgment of the quality of recipe revisions.
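As a rough illustration of "a prompt grounded in the original recipe and ingredients list", the snippet below builds such a prompt; the wording and the `call_llm` placeholder are assumptions, not the prompt or API call used in the paper.

```python
# Illustrative prompt construction; the wording is an assumption, not the
# authors' actual GPT-3 prompt.
RECIPE = (
    "Cream the butter and sugar, beat in the egg, then fold in the flour and "
    "chocolate chips. Drop spoonfuls onto a tray and bake at 180C until golden."
)
INGREDIENTS = ["butter", "sugar", "egg", "flour", "chocolate chips"]

def build_revision_prompt(recipe: str, ingredients: list[str]) -> str:
    """Ground the request in the original recipe and ingredient list and ask
    for simpler, numbered steps (one action per step)."""
    return (
        "Ingredients: " + ", ".join(ingredients) + "\n"
        "Original recipe:\n" + recipe + "\n\n"
        "Rewrite this recipe as numbered steps, one simple action per step, "
        "using only the ingredients listed above."
    )

prompt = build_revision_prompt(RECIPE, INGREDIENTS)
# revised = call_llm(prompt)   # hypothetical LLM API call, e.g. GPT-3
print(prompt)
```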
arXiv Detail & Related papers (2023-06-24T14:42:43Z)
- Learning Program Representations for Food Images and Cooking Recipes [26.054436410924737]
We propose to represent cooking recipes and food images as cooking programs.
A model is trained to learn a joint embedding between recipes and food images via self-supervision.
We show that projecting the image-recipe embeddings into programs leads to better cross-modal retrieval results.
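A minimal sketch of the joint image-recipe embedding step is shown below; it uses a standard contrastive setup and is not the paper's program-learning architecture or its feature extractors.

```python
# Minimal contrastive joint-embedding sketch: paired image/recipe features are
# pulled together, mismatched pairs pushed apart.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, emb_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)  # e.g. pooled CNN features
        self.txt_proj = nn.Linear(txt_dim, emb_dim)  # e.g. recipe text encoder output

    def forward(self, img_feats, txt_feats):
        img = F.normalize(self.img_proj(img_feats), dim=-1)
        txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return img, txt

def contrastive_loss(img, txt, temperature=0.07):
    logits = img @ txt.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(img.size(0))         # matching pairs lie on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

model = JointEmbedding()
img_feats, txt_feats = torch.randn(8, 2048), torch.randn(8, 768)
loss = contrastive_loss(*model(img_feats, txt_feats))
```

Retrieval then amounts to ranking recipes by cosine similarity to the query image embedding (or vice versa).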
arXiv Detail & Related papers (2022-03-30T05:52:41Z)
- Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene Text Recognition [60.36540008537054]
In this work, we exploit character counting, an implicit task within traditional text recognition, without additional annotation cost.
We design a two-branch reciprocal feature learning framework to adequately utilize the features from both tasks.
Experiments on 7 benchmarks show the advantages of the proposed method in both text recognition and the newly built character counting task.
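A toy sketch of the two-branch idea follows, with a shared backbone feeding a recognition head and a counting head; the layer sizes and pooling scheme are assumptions rather than the paper's architecture.

```python
# Rough two-branch sketch: one shared visual backbone, one per-position
# character-recognition head, and one character-counting head.
import torch
import torch.nn as nn

class TwoBranchRecognizer(nn.Module):
    def __init__(self, num_classes=37, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(               # shared visual features
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 32)),           # collapse height, keep width as "time"
        )
        self.recognition = nn.Linear(64, num_classes)  # per-position character logits
        self.counting = nn.Sequential(                 # predicts how many characters appear
            nn.Flatten(), nn.Linear(64 * 32, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, images):
        f = self.backbone(images)                    # (B, 64, 1, 32)
        seq = f.squeeze(2).transpose(1, 2)           # (B, 32, 64)
        return self.recognition(seq), self.counting(f)

model = TwoBranchRecognizer()
char_logits, char_count = model(torch.randn(2, 3, 32, 128))
```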
arXiv Detail & Related papers (2021-05-13T12:27:35Z)
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [17.42688184238741]
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives.
We propose a simplified end-to-end model based on well established and high performing encoders for text and images.
Our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset.
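For context, Recipe1M-style retrieval is typically reported with median rank and Recall@K over ranked image-recipe pairs; the sketch below computes those metrics, using random embeddings as stand-ins for a trained model's output.

```python
# Standard image-to-recipe retrieval metrics: rank each image's true recipe
# among all candidates by cosine similarity, then report median rank and R@K.
import numpy as np

def retrieval_metrics(img_emb, txt_emb, ks=(1, 5, 10)):
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sims = img @ txt.T                                   # (N, N) similarity matrix
    order = np.argsort(-sims, axis=1)                    # best match first
    ranks = np.array([np.where(order[i] == i)[0][0] + 1 for i in range(len(img))])
    return {"median_rank": float(np.median(ranks)),
            **{f"R@{k}": float(np.mean(ranks <= k)) for k in ks}}

print(retrieval_metrics(np.random.randn(100, 256), np.random.randn(100, 256)))
```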
arXiv Detail & Related papers (2021-03-24T10:17:09Z)
- CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval [20.292467149387594]
We introduce a novel cross-modal learning framework to jointly model the latent representations of images and text in the food image-recipe association and retrieval tasks.
Our experiments show that by making use of efficient tree-structured Long Short-Term Memory as the text encoder in our computational cross-modal retrieval framework, we are able to identify the main ingredients and cooking actions in the recipe descriptions without explicit supervision.
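The text encoder named here is a tree-structured LSTM; a minimal Child-Sum Tree-LSTM cell (Tai et al., 2015) is sketched below as a reference point, though the paper's full cross-modal retrieval model around it is not shown.

```python
# Minimal Child-Sum Tree-LSTM cell: a node combines its input vector with the
# summed hidden states of its children, with one forget gate per child.
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.iou = nn.Linear(in_dim + hid_dim, 3 * hid_dim)  # input/output/update gates
        self.f_x = nn.Linear(in_dim, hid_dim)
        self.f_h = nn.Linear(hid_dim, hid_dim)

    def forward(self, x, child_h, child_c):
        # x: (in_dim,)   child_h, child_c: (num_children, hid_dim)
        h_sum = child_h.sum(dim=0)
        i, o, u = self.iou(torch.cat([x, h_sum])).chunk(3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f_x(x) + self.f_h(child_h))   # one forget gate per child
        c = i * u + (f * child_c).sum(dim=0)
        return o * torch.tanh(c), c

cell = ChildSumTreeLSTMCell(in_dim=50, hid_dim=64)
leaf_h, leaf_c = cell(torch.randn(50), torch.zeros(0, 64), torch.zeros(0, 64))  # leaf: no children
root_h, root_c = cell(torch.randn(50), leaf_h.unsqueeze(0), leaf_c.unsqueeze(0))
```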
arXiv Detail & Related papers (2021-02-04T11:24:34Z)
- Structure-Aware Generation Network for Recipe Generation from Images [142.047662926209]
We investigate the open research task of generating cooking instructions based only on food images and ingredients.
Target recipes are lengthy paragraphs with no annotations of structural information.
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
arXiv Detail & Related papers (2020-09-02T10:54:25Z)
- Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction.
We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
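One way to picture the output of workflow construction is a set of pairwise "step j depends on step i" decisions assembled into a graph; the scorer below is an illustrative stand-in, not the paper's multimodal encoder-decoder.

```python
# Illustrative framing only: score every ordered pair of step embeddings and
# keep high-scoring pairs as candidate workflow edges.
import torch
import torch.nn as nn

class StepDependencyScorer(nn.Module):
    """Scores an ordered pair of step embeddings; > 0 means 'j depends on i'."""
    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, step_i, step_j):
        return self.mlp(torch.cat([step_i, step_j], dim=-1)).squeeze(-1)

steps = torch.randn(5, 256)                      # embeddings of 5 steps (text + image)
scorer = StepDependencyScorer()
scores = scorer(steps.unsqueeze(1).expand(-1, 5, -1),   # all (i, j) pairs
                steps.unsqueeze(0).expand(5, -1, -1))
edges = (scores > 0).nonzero()                   # candidate workflow edges
```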
arXiv Detail & Related papers (2020-08-20T18:31:25Z)
- A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks [48.39191088844315]
In the cooking domain, the web offers many partially-overlapping text and video recipes that describe how to make the same dish.
We use an unsupervised alignment algorithm that learns pairwise alignments between instructions of different recipes for the same dish.
We then use a graph algorithm to derive a joint alignment between multiple text and multiple video recipes for the same dish.
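A rough sketch of the pairwise stage for two text recipes of the same dish appears below, using TF-IDF similarity and an assignment solver; the paper's learned alignment model and the multi-recipe graph step are not reproduced here.

```python
# Align the instructions of two recipes for the same dish by maximizing total
# textual similarity over a one-to-one assignment.
from scipy.optimize import linear_sum_assignment
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

recipe_a = ["whisk eggs and sugar", "add melted butter", "bake 20 minutes"]
recipe_b = ["beat the eggs with sugar", "stir in butter", "grease the pan", "bake until set"]

vec = TfidfVectorizer().fit(recipe_a + recipe_b)
sim = cosine_similarity(vec.transform(recipe_a), vec.transform(recipe_b))

rows, cols = linear_sum_assignment(-sim)          # maximize total similarity
for i, j in zip(rows, cols):
    print(f"{recipe_a[i]!r}  <->  {recipe_b[j]!r}  (sim={sim[i, j]:.2f})")
```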
arXiv Detail & Related papers (2020-05-19T17:27:00Z)
- A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos [126.66212285239624]
We propose a benchmark of structured procedural knowledge extracted from cooking videos.
Our manually annotated open-vocabulary resource includes 356 instructional cooking videos and 15,523 video clip/sentence-level annotations.
arXiv Detail & Related papers (2020-05-02T05:15:20Z)