Predefined domain specific embeddings of food concepts and recipes: A
case study on heterogeneous recipe datasets
- URL: http://arxiv.org/abs/2302.01005v1
- Date: Thu, 2 Feb 2023 10:49:06 GMT
- Title: Predefined domain specific embeddings of food concepts and recipes: A
case study on heterogeneous recipe datasets
- Authors: Gordana Ispirova, Tome Eftimov, and Barbara Koroušić Seljak
- Abstract summary: Recipe datasets are usually collected from social media websites where users post and publish recipes.
We collect six different recipe datasets, publicly available, in different formats, and some including data in different languages.
We describe how all of these datasets are brought into the format needed to apply a machine learning (ML) pipeline for nutrient prediction.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although recipe data are very easy to come by nowadays, it is really hard to
find a complete recipe dataset - with a list of ingredients, nutrient values
per ingredient, and per recipe, allergens, etc. Recipe datasets are usually
collected from social media websites where users post and publish recipes.
The recipes are usually written with little to no structure, using both standardized and
non-standardized units of measurement. We collect six different recipe
datasets, publicly available, in different formats, and some including data in
different languages. Bringing all of these datasets into the format needed for
applying a machine learning (ML) pipeline for nutrient prediction [1], [2]
includes data normalization using dictionary-based named entity recognition
(NER), rule-based NER, as well as conversions using external domain-specific
resources. From the list of ingredients, domain-specific embeddings are created
using the same embedding space for all recipes - one ingredient dataset is
generated. The result from this normalization process is two corpora - one with
predefined ingredient embeddings and one with predefined recipe embeddings. On
all six recipe datasets, the ML pipeline is evaluated. The results from this
use case also confirm that the embeddings merged using the domain heuristic
yield better results than the baselines.
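
The normalization step described in the abstract (dictionary-based NER, rule-based NER, and unit conversion) can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression, the tiny ingredient dictionary, the alias table, and the function name `normalize_ingredient` are all hypothetical, and only mass units are handled. The conversion factors to grams are the standard ones.

```python
import re

# Standard mass conversions to grams; volume units are deliberately omitted
# because they would need ingredient-specific densities.
GRAMS_PER_UNIT = {"g": 1.0, "kg": 1000.0, "oz": 28.3495, "lb": 453.592}

# Hypothetical alias table mapping surface forms to canonical units.
UNIT_ALIASES = {
    "g": "g", "gram": "g", "grams": "g",
    "kg": "kg", "kilogram": "kg", "kilograms": "kg",
    "oz": "oz", "ounce": "oz", "ounces": "oz",
    "lb": "lb", "lbs": "lb", "pound": "lb", "pounds": "lb",
}

# Hypothetical ingredient dictionary used for the dictionary-based lookup.
INGREDIENT_DICTIONARY = {"flour", "sugar", "butter", "olive oil"}

# Rule-based pattern: a number, a unit token, then the free-text remainder.
LINE_PATTERN = re.compile(
    r"^\s*(?P<qty>\d+(?:\.\d+)?)\s*(?P<unit>[a-zA-Z]+)\s+(?P<rest>.+)$"
)


def normalize_ingredient(line):
    """Return {'ingredient', 'grams'} for a raw line, or None if unparseable."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    unit = UNIT_ALIASES.get(match.group("unit").lower())
    if unit is None:
        return None
    rest = match.group("rest").lower()
    # Dictionary lookup by substring, trying longer terms first so that
    # multi-word entries win over any shorter entry they contain.
    candidates = sorted(INGREDIENT_DICTIONARY, key=len, reverse=True)
    name = next((term for term in candidates if term in rest), None)
    if name is None:
        return None
    grams = float(match.group("qty")) * GRAMS_PER_UNIT[unit]
    return {"ingredient": name, "grams": grams}
```

For example, `normalize_ingredient("200g butter, softened")` yields the ingredient `butter` with 200 grams, while a line without a numeric quantity, such as `"a pinch of salt"`, is rejected. A real pipeline would also need volume and non-standardized units, which is where external domain-specific resources come in.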
Related papers
- Retrieval Augmented Recipe Generation [96.43285670458803]
We propose a retrieval augmented large multimodal model for recipe generation.
It retrieves recipes semantically related to the image from an existing datastore as a supplement.
It calculates the consistency among generated recipe candidates, which use different retrieval recipes as context for generation.
arXiv Detail & Related papers (2024-11-13T15:58:50Z)
- Deep Learning Based Named Entity Recognition Models for Recipes [7.507956305171027]
Named entity recognition (NER) is a technique for extracting information from unstructured or semi-structured data with known labels.
We created an augmented dataset of 26,445 phrases cumulatively.
We analyzed ingredient phrases from RecipeDB, the gold-standard recipe data repository, and annotated them using the Stanford NER.
A thorough investigation of NER approaches on these datasets, involving statistical methods and fine-tuning of deep learning-based language models, provides deep insights.
arXiv Detail & Related papers (2024-02-27T12:03:56Z)
- Towards Automated Recipe Genre Classification using Semi-Supervised Learning [4.177122099296939]
We present a dataset named the Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe dataset.
This collection of data includes various features such as title, NER, directions, and extended NER, as well as nine different labels representing genres including bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusions.
We have demonstrated traditional machine learning, deep learning and pre-trained language models to classify the recipes into their corresponding genre and achieved an overall accuracy of 98.6%.
arXiv Detail & Related papers (2023-10-24T10:03:27Z)
- Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset based on Active Learning [2.40907745415345]
We present a novel dataset of two million culinary recipes labeled in respective categories.
To construct the dataset, we collect the recipes from the RecipeNLG dataset.
There are more than two million recipes in our dataset, each of which is categorized and has a confidence score linked with it.
arXiv Detail & Related papers (2023-03-27T07:53:18Z)
- Counterfactual Recipe Generation: Exploring Compositional Generalization in a Realistic Scenario [60.20197771545983]
We design the counterfactual recipe generation task, which asks models to modify a base recipe according to the change of an ingredient.
We collect a large-scale recipe dataset in Chinese for models to learn culinary knowledge.
Results show that existing models have difficulties in modifying the ingredients while preserving the original text style, and often miss actions that need to be adjusted.
arXiv Detail & Related papers (2022-10-20T17:21:46Z)
- Cross-lingual Adaptation for Recipe Retrieval with Mixup [56.79360103639741]
Cross-modal recipe retrieval has attracted research attention in recent years, thanks to the availability of large-scale paired data for training.
This paper studies unsupervised domain adaptation for image-to-recipe retrieval, where recipes in source and target domains are in different languages.
A novel recipe mixup method is proposed to learn transferable embedding features between the two domains.
arXiv Detail & Related papers (2022-05-08T15:04:39Z)
- Assistive Recipe Editing through Critiquing [34.1050269670062]
RecipeCrit is a hierarchical denoising auto-encoder that edits recipes given ingredient-level critiques.
Our work's main innovation is our unsupervised critiquing module that allows users to edit recipes by interacting with the predicted ingredients.
arXiv Detail & Related papers (2022-05-05T05:52:27Z)
- Structure-Aware Generation Network for Recipe Generation from Images [142.047662926209]
We investigate an open research task of generating cooking instructions based on only food images and ingredients.
Target recipes are long-length paragraphs and do not have annotations on structure information.
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
arXiv Detail & Related papers (2020-09-02T10:54:25Z)
- Decomposing Generation Networks with Structure Prediction for Recipe Generation [142.047662926209]
We propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction.
Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase.
Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure.
arXiv Detail & Related papers (2020-07-27T08:47:50Z)
- A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks [48.39191088844315]
In the cooking domain, the web offers many partially-overlapping text and video recipes that describe how to make the same dish.
We use an unsupervised alignment algorithm that learns pairwise alignments between instructions of different recipes for the same dish.
We then use a graph algorithm to derive a joint alignment between multiple text and multiple video recipes for the same dish.
arXiv Detail & Related papers (2020-05-19T17:27:00Z)
- A Named Entity Based Approach to Model Recipes [9.18959130745234]
We propose a structure that can accurately represent the recipe as well as a pipeline to infer the best representation of the recipe in this uniform structure.
The ingredients section of a recipe typically lists the required ingredients and corresponding attributes such as quantity, temperature, and processing state.
The instruction section lists a series of events in which a cooking technique or process is applied to these utensils and ingredients.
arXiv Detail & Related papers (2020-04-25T16:37:26Z)