Towards End-User Development for IoT: A Case Study on Semantic Parsing of Cooking Recipes for Programming Kitchen Devices
- URL: http://arxiv.org/abs/2309.14165v1
- Date: Mon, 25 Sep 2023 14:21:24 GMT
- Title: Towards End-User Development for IoT: A Case Study on Semantic Parsing of Cooking Recipes for Programming Kitchen Devices
- Authors: Filippos Ventirozos, Sarah Clinch and Riza Batista-Navarro
- Abstract summary: We provide a unique corpus which aims to support the transformation of cooking recipe instructions to machine-understandable commands for IoT devices in the kitchen.
Based on this corpus, we developed machine learning-based sequence labelling methods, namely conditional random fields (CRF) and a neural network model.
Our results show that while it is feasible to train semantic parsers based on our annotations, most natural-language instructions are incomplete, and thus transforming them into a formal meaning representation is not straightforward.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic parsing of user-generated instructional text, with the aim of enabling
end-users to program the Internet of Things (IoT), is an underexplored area. In
this study, we provide a unique annotated corpus which aims to support the
transformation of cooking recipe instructions to machine-understandable
commands for IoT devices in the kitchen. Each of these commands is a tuple
capturing the semantics of an instruction involving a kitchen device in terms
of "What", "Where", "Why" and "How". Based on this corpus, we developed machine
learning-based sequence labelling methods, namely conditional random fields
(CRF) and a neural network model, in order to parse recipe instructions and
extract our tuples of interest from them. Our results show that while it is
feasible to train semantic parsers based on our annotations, most
natural-language instructions are incomplete, and thus transforming them into
a formal meaning representation is not straightforward.
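The abstract combines two technical pieces: a command tuple with "What", "Where", "Why" and "How" slots, and sequence labelling (for example a linear-chain CRF) that tags the tokens of an instruction so the tuple can be read off. The sketch below is illustrative only and rests on assumptions the abstract does not state: a BIO tagging scheme with hypothetical labels such as `B-WHERE`, hand-set toy scores standing in for trained CRF weights, and a hypothetical assignment of the example's words to slots. It shows Viterbi decoding for a linear-chain CRF followed by grouping the tagged spans into a tuple; it is not the authors' model, feature set or label inventory.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical BIO label set over the four tuple slots; the paper's actual
# annotation labels and tagging scheme may differ.
SLOTS = ["WHAT", "WHERE", "WHY", "HOW"]
LABELS = ["O"] + [f"{p}-{s}" for s in SLOTS for p in ("B", "I")]
START = "<START>"


@dataclass
class KitchenCommand:
    """What/Where/Why/How tuple; a slot stays None when the text leaves it implicit."""
    what: Optional[str] = None
    where: Optional[str] = None
    why: Optional[str] = None
    how: Optional[str] = None


def bio_transitions(penalty: float = -10.0) -> Dict[Tuple[str, str], float]:
    """Hand-set transition scores that penalise invalid BIO moves such as O -> I-WHAT."""
    trans = {}
    for prev in [START] + LABELS:
        for cur in LABELS:
            if cur.startswith("I-") and prev not in (f"B-{cur[2:]}", cur):
                trans[(prev, cur)] = penalty
    return trans


def viterbi(emissions: List[Dict[str, float]],
            trans: Dict[Tuple[str, str], float]) -> List[str]:
    """Best label sequence under a linear-chain CRF (missing transitions score 0)."""
    best = [{y: emissions[0][y] + trans.get((START, y), 0.0) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in LABELS:
            # Best predecessor label for y at position t.
            prev = max(LABELS, key=lambda yp: best[t - 1][yp] + trans.get((yp, y), 0.0))
            best[t][y] = best[t - 1][prev] + trans.get((prev, y), 0.0) + emissions[t][y]
            back[t][y] = prev
    # Follow back-pointers from the highest-scoring final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def to_command(tokens: List[str], tags: List[str]) -> KitchenCommand:
    """Group BIO-tagged tokens into slots, keeping the first span per slot."""
    spans: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") and tag[2:] not in spans:
            current = tag[2:]
            spans[current] = [token]
        elif tag.startswith("I-") and tag[2:] == current:
            spans[current].append(token)
        else:
            current = None
    return KitchenCommand(**{s.lower(): " ".join(w) for s, w in spans.items()})


# Toy emission scores standing in for a trained CRF's feature weights.
tokens = ["Preheat", "the", "oven", "to", "180", "degrees"]
hints = ["B-WHAT", "B-WHERE", "I-WHERE", "B-HOW", "I-HOW", "I-HOW"]
emissions = [{y: (2.0 if y == h else 0.0) for y in LABELS} for h in hints]
tags = viterbi(emissions, bio_transitions())
command = to_command(tokens, tags)
print(command)
```

Here the Why slot stays empty because the sentence never states a purpose, a small instance of the incompleteness the abstract reports. A trained CRF would learn the emission and transition scores from the annotated corpus rather than use the hand-set values assumed here.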
Related papers
- Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation
SAS (Spatially-Aware Speaker) is an instruction generator that uses both structural and semantic knowledge of the environment to produce richer instructions.
Our method outperforms existing instruction generation models, evaluated using standard metrics.
Date: 2024-09-09
- Answer is All You Need: Instruction-following Text Embedding via Answering the Question
This paper offers a new viewpoint, which treats the instruction as a question about the input text and encodes the expected answers to obtain the representation accordingly.
Specifically, we propose InBedder, which instantiates this embed-via-answering idea by fine-tuning language models only on abstractive question answering tasks.
Date: 2024-02-15
- PizzaCommonSense: Learning to Model Commonsense Reasoning about Intermediate Steps in Cooking Recipes
A model to effectively reason about cooking recipes must accurately discern and understand the inputs and outputs of intermediate steps within the recipe.
We present a new corpus of cooking recipes enriched with descriptions of intermediate steps that describe the input and output for each step.
Date: 2024-01-12
- Instruct and Extract: Instruction Tuning for On-Demand Information Extraction
On-Demand Information Extraction aims to fulfill the personalized demands of real-world users.
We present a benchmark named InstructIE, which includes both automatically generated training data and a human-annotated test set.
Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE.
Date: 2023-10-24
- The Proof is in the Pudding: Using Automated Theorem Proving to Generate Cooking Recipes
This paper presents FASTFOOD, a rule-based Natural Language Generation Program for cooking recipes.
Recipes are generated by using an Automated Theorem Proving procedure to select the ingredients and instructions, with ingredients corresponding to axioms and instructions to implications.
FASTFOOD also contains a temporal optimization module which can rearrange the recipe to make it more time-efficient for the user.
Date: 2022-03-05
- Structure-Aware Generation Network for Recipe Generation from Images
We investigate an open research task of generating cooking instructions based on only food images and ingredients.
Target recipes are long paragraphs and have no annotations of structural information.
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
Date: 2020-09-02
- Decomposing Generation Networks with Structure Prediction for Recipe Generation
We propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction.
Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase.
Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure.
Date: 2020-07-27
- A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks
In the cooking domain, the web offers many partially-overlapping text and video recipes that describe how to make the same dish.
We use an unsupervised alignment algorithm that learns pairwise alignments between instructions of different recipes for the same dish.
We then use a graph algorithm to derive a joint alignment between multiple text and multiple video recipes for the same dish.
Date: 2020-05-19
- A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos
We propose a benchmark of structured procedural knowledge extracted from cooking videos.
Our manually annotated open-vocabulary resource includes 356 instructional cooking videos and 15,523 video clip/sentence-level annotations.
Date: 2020-05-02
- ESPnet-ST: All-in-One Speech Translation Toolkit
ESPnet-ST is a new project inside the end-to-end speech processing toolkit ESPnet.
It implements automatic speech recognition, machine translation, and text-to-speech functions for speech translation.
We provide all-in-one recipes including data pre-processing, feature extraction, training, and decoding pipelines.
Date: 2020-04-21
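The FASTFOOD entry in the list above treats recipe generation as theorem proving, with ingredients as axioms and instructions as implications. A minimal forward-chaining sketch of that idea follows, assuming propositional Horn-clause rules and a toy rule set; the `Instruction` type, the `prove` function and the recipe rules are hypothetical illustrations, not FASTFOOD's implementation, and its temporal optimization module is not modeled. Forward chaining derives every reachable fact; a backward walk from the goal then keeps only the ingredients and instructions its proof uses, ordered so each step follows its prerequisites.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional, Set, Tuple


class Instruction(NamedTuple):
    """An implication: when every premise holds, the step yields its conclusion."""
    text: str
    premises: FrozenSet[str]
    conclusion: str


def prove(axioms: Set[str], rules: List[Instruction],
          goal: str) -> Optional[Tuple[List[str], List[str]]]:
    """Forward-chain from ingredient axioms; return (ingredients, steps) or None."""
    known = set(axioms)
    producer: Dict[str, Instruction] = {}  # derived fact -> rule that derived it
    changed = True
    while goal not in known and changed:
        changed = False
        for rule in rules:
            if rule.conclusion not in known and rule.premises <= known:
                known.add(rule.conclusion)
                producer[rule.conclusion] = rule
                changed = True
    if goal not in known:
        return None  # the goal dish cannot be derived from these ingredients

    used: Set[str] = set()
    plan: List[str] = []

    def visit(fact: str) -> None:
        """Post-order walk: emit a step only after the steps for its premises."""
        if fact in axioms:
            used.add(fact)
            return
        rule = producer[fact]
        if rule.text in plan:
            return
        for premise in sorted(rule.premises):
            visit(premise)
        plan.append(rule.text)

    visit(goal)
    return sorted(used), plan


# Hypothetical toy rules; the last one is irrelevant to the goal.
RULES = [
    Instruction("Boil the water.", frozenset({"water"}), "boiling water"),
    Instruction("Cook the pasta in the boiling water.",
                frozenset({"pasta", "boiling water"}), "cooked pasta"),
    Instruction("Simmer the tomatoes with garlic.",
                frozenset({"tomatoes", "garlic"}), "tomato sauce"),
    Instruction("Toss the pasta with the sauce.",
                frozenset({"cooked pasta", "tomato sauce"}), "pasta al pomodoro"),
    Instruction("Toast the bread.", frozenset({"bread"}), "toast"),
]

result = prove({"water", "pasta", "tomatoes", "garlic", "bread"}, RULES, "pasta al pomodoro")
print(result)
```

The unused `bread` axiom and the toasting step are dropped because the goal's proof never depends on them.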