Related papers: Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset based on Active Learning

Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset based on Active Learning

URL: http://arxiv.org/abs/2303.16778v1
Date: Mon, 27 Mar 2023 07:53:18 GMT
Title: Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset based on Active Learning
Authors: Nazmus Sakib, G. M. Shahariar, Md. Mohsinul Kabir, Md. Kamrul Hasan and Hasan Mahmud
Abstract summary: We present a novel dataset of two million culinary recipes labeled in respective categories. To construct the dataset, we collect the recipes from the RecipeNLG dataset. There are more than two million recipes in our dataset, each of which is categorized and has a confidence score linked with it.
Score: 2.40907745415345
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Cooking recipes allow individuals to exchange culinary ideas and provide food preparation instructions. Due to a lack of adequate labeled data, categorizing raw recipes found online to the appropriate food genres is a challenging task in this domain. Utilizing the knowledge of domain experts to categorize recipes could be a solution. In this study, we present a novel dataset of two million culinary recipes labeled in respective categories leveraging the knowledge of food experts and an active learning technique. To construct the dataset, we collect the recipes from the RecipeNLG dataset. Then, we employ three human experts whose trustworthiness score is higher than 86.667% to categorize 300K recipe by their Named Entity Recognition (NER) and assign it to one of the nine categories: bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides and fusion. Finally, we categorize the remaining 1900K recipes using Active Learning method with a blend of Query-by-Committee and Human In The Loop (HITL) approaches. There are more than two million recipes in our dataset, each of which is categorized and has a confidence score linked with it. For the 9 genres, the Fleiss Kappa score of this massive dataset is roughly 0.56026. We believe that the research community can use this dataset to perform various machine learning tasks such as recipe genre classification, recipe generation of a specific genre, new recipe creation, etc. The dataset can also be used to train and evaluate the performance of various NLP tasks such as named entity recognition, part-of-speech tagging, semantic role labeling, and so on. The dataset will be available upon publication: https://tinyurl.com/3zu4778y.

Related papers

Towards Automated Recipe Genre Classification using Semi-Supervised Learning [4.177122099296939]
We present a dataset named the Assorted, Archetypal, and Annotated Two Million Extended (3A2M+ Cooking Recipe dataset" This collection of data includes various features such as title, NER, directions, and extended NER, as well as nine different labels representing genres including bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusions. We have demonstrated traditional machine learning, deep learning and pre-trained language models to classify the recipes into their corresponding genre and achieved an overall accuracy of 98.6%.
arXiv Detail & Related papers (2023-10-24T10:03:27Z)
Counterfactual Recipe Generation: Exploring Compositional Generalization in a Realistic Scenario [60.20197771545983]
We design the counterfactual recipe generation task, which asks models to modify a base recipe according to the change of an ingredient. We collect a large-scale recipe dataset in Chinese for models to learn culinary knowledge. Results show that existing models have difficulties in modifying the ingredients while preserving the original text style, and often miss actions that need to be adjusted.
arXiv Detail & Related papers (2022-10-20T17:21:46Z)
Cross-lingual Adaptation for Recipe Retrieval with Mixup [56.79360103639741]
Cross-modal recipe retrieval has attracted research attention in recent years, thanks to the availability of large-scale paired data for training. This paper studies unsupervised domain adaptation for image-to-recipe retrieval, where recipes in source and target domains are in different languages. A novel recipe mixup method is proposed to learn transferable embedding features between the two domains.
arXiv Detail & Related papers (2022-05-08T15:04:39Z)
TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark [1.0569625612398386]
NER models are expected to find or infer various types of entities helpful in processing recipes. The dataset consists of 700 recipes with more than 13,000 entities to extract. We provide a few state-of-the-art baselines of named entity recognition models, which show that our dataset poses a solid challenge to existing models.
arXiv Detail & Related papers (2022-04-16T10:52:21Z)
Food Recipe Recommendation Based on Ingredients Detection Using Deep Learning [0.0]
Knowing which ingredients can be mixed to make a delicious food recipe is essential. We implemented a model for food ingredients recognition and designed an algorithm for recommending recipes based on recognised ingredients. We achieved an accuracy of 94 percent, which is quite impressive.
arXiv Detail & Related papers (2022-03-13T17:42:38Z)
A Large-Scale Benchmark for Food Image Segmentation [62.28029856051079]
We build a new food image dataset FoodSeg103 (and its extension FoodSeg154) containing 9,490 images. We annotate these images with 154 ingredient classes and each image has an average of 6 ingredient labels and pixel-wise masks. We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
arXiv Detail & Related papers (2021-05-12T03:00:07Z)
Structure-Aware Generation Network for Recipe Generation from Images [142.047662926209]
We investigate an open research task of generating cooking instructions based on only food images and ingredients. Target recipes are long-length paragraphs and do not have annotations on structure information. We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
arXiv Detail & Related papers (2020-09-02T10:54:25Z)
Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction. We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
arXiv Detail & Related papers (2020-08-20T18:31:25Z)
Classification of Cuisines from Sequentially Structured Recipes [8.696042114987966]
classification of cuisines based on their culinary features is an outstanding problem. We have implemented a range of classification techniques by accounting for this information on the RecipeDB dataset. The state-of-the-art RoBERTa model presented the highest accuracy of 73.30% among a range of classification models.
arXiv Detail & Related papers (2020-04-26T05:40:36Z)
A Named Entity Based Approach to Model Recipes [9.18959130745234]
We propose a structure that can accurately represent the recipe as well as a pipeline to infer the best representation of the recipe in this uniform structure. Ingredients section in a recipe typically lists down the ingredients required and corresponding attributes such as quantity, temperature, and processing state. The instruction section lists down a series of events in which a cooking technique or process is applied upon these utensils and ingredients.
arXiv Detail & Related papers (2020-04-25T16:37:26Z)
Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another. We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities. We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.