Towards Automated Recipe Genre Classification using Semi-Supervised
Learning
- URL: http://arxiv.org/abs/2310.15693v1
- Date: Tue, 24 Oct 2023 10:03:27 GMT
- Title: Towards Automated Recipe Genre Classification using Semi-Supervised
Learning
- Authors: Nazmus Sakib, G. M. Shahariar, Md. Mohsinul Kabir, Md. Kamrul Hasan
and Hasan Mahmud
- Abstract summary: We present a dataset named the "Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset".
This collection of data includes various features such as title, NER, directions, and extended NER, as well as nine different labels representing genres including bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusions.
We apply traditional machine learning, deep learning, and pre-trained language models to classify the recipes into their corresponding genres, achieving an overall accuracy of 98.6%.
- Score: 4.177122099296939
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sharing cooking recipes is a great way to exchange culinary ideas and provide
instructions for food preparation. However, categorizing raw recipes found
online into appropriate food genres can be challenging due to a lack of
adequate labeled data. In this study, we present a dataset named the
"Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking
Recipe Dataset" that contains two million culinary recipes labeled in
respective categories with extended named entities extracted from recipe
descriptions. This collection of data includes various features such as title,
NER, directions, and extended NER, as well as nine different labels
representing genres including bakery, drinks, non-veg, vegetables, fast food,
cereals, meals, sides, and fusions. The proposed pipeline named 3A2M+ extends
the size of the Named Entity Recognition (NER) list to address missing named
entities like heat, time or process from the recipe directions using two NER
extraction tools. The 3A2M+ dataset provides a comprehensive solution to
various challenging recipe-related tasks, including classification, named
entity recognition, and recipe generation. Furthermore, we apply traditional
machine learning, deep learning, and pre-trained language models to classify
the recipes into their corresponding genres, achieving an overall accuracy of
98.6%. Our investigation indicates that the title feature played a
more significant role in classifying the genre.
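To make the title-based genre classification concrete, here is a minimal sketch of a multinomial Naive Bayes classifier over recipe titles, written in plain Python. It is not the authors' pipeline: the choice of Naive Bayes, the whitespace tokenizer, the names `TRAIN`, `train`, `predict`, and `MODEL`, and the toy titles are all assumptions made for illustration. Only the genre label names come from the abstract, and only four of the nine are used here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy titles invented for this sketch; NOT taken from 3A2M+.
TRAIN = [
    ("chocolate chip cookies", "bakery"),
    ("banana bread loaf", "bakery"),
    ("iced lemon tea", "drinks"),
    ("mango smoothie", "drinks"),
    ("grilled chicken curry", "non-veg"),
    ("beef stew", "non-veg"),
    ("roasted vegetable salad", "vegetables"),
    ("spinach stir fry", "vegetables"),
]


def tokenize(title):
    """Lowercase the title and split on whitespace (an assumed tokenizer)."""
    return title.lower().split()


def train(examples, alpha=1.0):
    """Count labels and per-label title tokens for multinomial Naive Bayes."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for title, label in examples:
        label_counts[label] += 1
        for tok in tokenize(title):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return {"labels": label_counts, "words": word_counts,
            "vocab": vocab, "alpha": alpha}


def predict(model, title):
    """Return the label with the highest smoothed log-posterior."""
    total = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, count in model["labels"].items():
        score = math.log(count / total)  # log prior
        denom = sum(model["words"][label].values()) + model["alpha"] * vocab_size
        for tok in tokenize(title):
            if tok not in model["vocab"]:
                continue  # skip tokens never seen during training
            # Laplace-smoothed log likelihood of the token given the label
            score += math.log((model["words"][label][tok] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train(TRAIN)
```

For example, `predict(MODEL, "lemon smoothie")` returns `"drinks"` on this toy data. A real experiment would use the dataset's title, NER, and directions fields and a held-out test split; nothing here reproduces the reported 98.6% accuracy.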
Related papers
- Enhancing Personalized Recipe Recommendation Through Multi-Class Classification [0.0]
The problem domain involves recipe recommendations, utilizing techniques such as association analysis and classification.
The paper seeks not only to recommend recipes but also to explore the process involved in achieving accurate and personalized recommendations.
arXiv Detail & Related papers (2024-09-16T13:21:09Z)
- Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset based on Active Learning [2.40907745415345]
We present a novel dataset of two million culinary recipes labeled in respective categories.
To construct the dataset, we collect the recipes from the RecipeNLG dataset.
There are more than two million recipes in our dataset, each of which is categorized and has a confidence score linked with it.
arXiv Detail & Related papers (2023-03-27T07:53:18Z)
- Counterfactual Recipe Generation: Exploring Compositional Generalization in a Realistic Scenario [60.20197771545983]
We design the counterfactual recipe generation task, which asks models to modify a base recipe according to the change of an ingredient.
We collect a large-scale recipe dataset in Chinese for models to learn culinary knowledge.
Results show that existing models have difficulties in modifying the ingredients while preserving the original text style, and often miss actions that need to be adjusted.
arXiv Detail & Related papers (2022-10-20T17:21:46Z)
- Cross-lingual Adaptation for Recipe Retrieval with Mixup [56.79360103639741]
Cross-modal recipe retrieval has attracted research attention in recent years, thanks to the availability of large-scale paired data for training.
This paper studies unsupervised domain adaptation for image-to-recipe retrieval, where recipes in source and target domains are in different languages.
A novel recipe mixup method is proposed to learn transferable embedding features between the two domains.
arXiv Detail & Related papers (2022-05-08T15:04:39Z)
- A Rich Recipe Representation as Plan to Support Expressive Multi Modal Queries on Recipe Content and Preparation Process [24.94173789568803]
We discuss the construction of a machine-understandable rich recipe representation (R3).
R3 is infused with additional knowledge such as information about allergens and images of ingredients.
We also present TREAT, a tool for recipe retrieval which uses R3 to perform multi-modal reasoning on the recipe's content.
arXiv Detail & Related papers (2022-03-31T15:29:38Z)
- A Large-Scale Benchmark for Food Image Segmentation [62.28029856051079]
We build a new food image dataset FoodSeg103 (and its extension FoodSeg154) containing 9,490 images.
We annotate these images with 154 ingredient classes and each image has an average of 6 ingredient labels and pixel-wise masks.
We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
arXiv Detail & Related papers (2021-05-12T03:00:07Z)
- Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction.
We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
arXiv Detail & Related papers (2020-08-20T18:31:25Z)
- A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks [48.39191088844315]
In the cooking domain, the web offers many partially-overlapping text and video recipes that describe how to make the same dish.
We use an unsupervised alignment algorithm that learns pairwise alignments between instructions of different recipes for the same dish.
We then use a graph algorithm to derive a joint alignment between multiple text and multiple video recipes for the same dish.
arXiv Detail & Related papers (2020-05-19T17:27:00Z)
- Classification of Cuisines from Sequentially Structured Recipes [8.696042114987966]
Classification of cuisines based on their culinary features is an outstanding problem.
We have implemented a range of classification techniques by accounting for this information on the RecipeDB dataset.
The state-of-the-art RoBERTa model presented the highest accuracy of 73.30% among a range of classification models.
arXiv Detail & Related papers (2020-04-26T05:40:36Z)
- A Named Entity Based Approach to Model Recipes [9.18959130745234]
We propose a structure that can accurately represent the recipe as well as a pipeline to infer the best representation of the recipe in this uniform structure.
The ingredients section of a recipe typically lists the required ingredients and corresponding attributes such as quantity, temperature, and processing state.
The instruction section lists a series of events in which a cooking technique or process is applied upon these utensils and ingredients.
arXiv Detail & Related papers (2020-04-25T16:37:26Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.