Recipe2Vec: Multi-modal Recipe Representation Learning with Graph Neural
Networks
- URL: http://arxiv.org/abs/2205.12396v1
- Date: Tue, 24 May 2022 23:04:02 GMT
- Title: Recipe2Vec: Multi-modal Recipe Representation Learning with Graph Neural
Networks
- Authors: Yijun Tian, Chuxu Zhang, Zhichun Guo, Yihong Ma, Ronald Metoyer,
Nitesh V. Chawla
- Abstract summary: We formalize the problem of multi-modal recipe representation learning to integrate the visual, textual, and relational information into recipe embeddings.
We first present Large-RG, a new recipe graph dataset with over half a million nodes, making it the largest recipe graph to date.
We then propose Recipe2Vec, a novel graph neural network-based recipe embedding model to capture multi-modal information.
- Score: 23.378813327724686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning effective recipe representations is essential in food studies.
Unlike what has been developed for image-based recipe retrieval or learning
structural text embeddings, the combined effect of multi-modal information
(i.e., recipe images, text, and relation data) receives less attention. In this
paper, we formalize the problem of multi-modal recipe representation learning
to integrate the visual, textual, and relational information into recipe
embeddings. In particular, we first present Large-RG, a new recipe graph dataset
with over half a million nodes, making it the largest recipe graph to date. We
then propose Recipe2Vec, a novel graph neural network-based recipe embedding
model to capture multi-modal information. Additionally, we introduce an
adversarial attack strategy to ensure stable learning and improve performance.
Finally, we design a joint objective function of node classification and
adversarial learning to optimize the model. Extensive experiments demonstrate
that Recipe2Vec outperforms state-of-the-art baselines on two classic food
study tasks, i.e., cuisine category classification and region prediction.
Dataset and codes are available at https://github.com/meettyj/Recipe2Vec.
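As a rough illustration of the joint objective described in the abstract, here is a minimal PyTorch sketch of a single-layer multi-modal GNN whose node-classification loss is combined with an adversarial term computed on gradient-perturbed embeddings. The module names, dimensions, and FGSM-style perturbation are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch (NOT the authors' code): fuse visual + textual node features,
# propagate once over the recipe graph, and train with a joint loss of
# node classification plus an adversarial term on perturbed embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalRecipeGNN(nn.Module):
    def __init__(self, img_dim, txt_dim, hid_dim, n_classes):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hid_dim)   # visual modality
        self.txt_proj = nn.Linear(txt_dim, hid_dim)   # textual modality
        self.gnn = nn.Linear(hid_dim, hid_dim)        # one message-passing step
        self.cls = nn.Linear(hid_dim, n_classes)      # cuisine / region head

    def forward(self, img_x, txt_x, adj):
        h = self.img_proj(img_x) + self.txt_proj(txt_x)   # fuse modalities
        h = F.relu(adj @ self.gnn(h))                      # relational information
        return h, self.cls(h)

def joint_loss(model, img_x, txt_x, adj, labels, eps=0.01):
    h, logits = model(img_x, txt_x, adj)
    ce = F.cross_entropy(logits, labels)
    # adversarial term: re-classify embeddings nudged along the loss gradient
    grad = torch.autograd.grad(ce, h, retain_graph=True)[0]
    adv_logits = model.cls(h + eps * grad.sign().detach())
    return ce + F.cross_entropy(adv_logits, labels)

# toy usage with a row-normalized adjacency over 4 recipe nodes
adj = torch.eye(4) * 0.5 + 0.5 / 4
model = MultiModalRecipeGNN(img_dim=16, txt_dim=32, hid_dim=8, n_classes=3)
loss = joint_loss(model, torch.randn(4, 16), torch.randn(4, 32), adj,
                  torch.tensor([0, 1, 2, 0]))
loss.backward()
```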
Related papers
- Retrieval Augmented Recipe Generation [96.43285670458803]
We propose a retrieval augmented large multimodal model for recipe generation.
It retrieves recipes semantically related to the image from an existing datastore as a supplement.
It calculates the consistency among generated recipe candidates, each of which uses a different retrieved recipe as context for generation.
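A toy sketch of that consistency idea: several candidate recipes, each generated with a different retrieved recipe as context, are compared against one another and the candidate that agrees most with the rest is kept. Token-level Jaccard overlap here is only a stand-in for whatever similarity measure the paper actually uses.

```python
# Toy sketch (assumed scoring, not the paper's method): pick the generated
# candidate that is most consistent with the other candidates.
def jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def most_consistent(candidates):
    # mean pairwise similarity of each candidate against all the others
    scores = [sum(jaccard(c, o) for j, o in enumerate(candidates) if j != i)
              / (len(candidates) - 1)
              for i, c in enumerate(candidates)]
    return max(zip(scores, candidates))[1]

candidates = [
    "boil pasta then toss with egg pecorino and pepper",
    "boil pasta then toss with egg pecorino and cream",
    "bake the chicken with lemon and thyme",
]
print(most_consistent(candidates))  # one of the two carbonara-like candidates
```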
arXiv Detail & Related papers (2024-11-13T15:58:50Z)
- Deep Image-to-Recipe Translation [0.0]
Deep Image-to-Recipe Translation aims to bridge the gap between cherished food memories and the art of culinary creation.
Our primary objective involves predicting ingredients from a given food image.
Our approach emphasizes the importance of metrics such as Intersection over Union (IoU) and F1 score in scenarios where accuracy alone might be misleading.
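For concreteness, a minimal sketch of those set-based metrics, treating the predicted and ground-truth ingredient lists as sets (the exact formulation in the paper may differ):

```python
# Minimal sketch (assumed set-based formulation): IoU and F1 over predicted vs.
# ground-truth ingredient sets, useful when plain accuracy is misleading.
def iou(pred, true):
    pred, true = set(pred), set(true)
    return len(pred & true) / len(pred | true) if pred | true else 1.0

def f1(pred, true):
    pred, true = set(pred), set(true)
    if not pred or not true:
        return 0.0
    precision = len(pred & true) / len(pred)
    recall = len(pred & true) / len(true)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

pred = {"flour", "egg", "milk"}
true = {"flour", "egg", "butter"}
print(iou(pred, true))  # 0.5
print(f1(pred, true))   # ~0.67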
arXiv Detail & Related papers (2024-07-01T02:33:07Z)
- RecipeRec: A Heterogeneous Graph Learning Model for Recipe Recommendation [26.84274830886026]
We formalize the problem of recipe recommendation with graphs to incorporate the collaborative signal into recipe recommendation.
We first present URI-Graph, a new and large-scale user-recipe-ingredient graph.
We then propose RecipeRec, a novel heterogeneous graph learning model for recipe recommendation.
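A toy illustration of the kind of user-recipe-ingredient structure such a graph encodes; the node names and edge types below are made up, and RecipeRec itself learns over this structure with a heterogeneous GNN rather than the hand-written traversal shown here.

```python
# Toy illustration (made-up nodes/edge types): a user-recipe-ingredient graph as
# plain edge lists, the kind of structure a heterogeneous GNN would learn over.
graph = {
    ("user", "rates", "recipe"): [("u1", "carbonara"), ("u1", "ramen"), ("u2", "carbonara")],
    ("recipe", "contains", "ingredient"): [("carbonara", "egg"), ("carbonara", "pasta"),
                                           ("ramen", "egg"), ("ramen", "noodles")],
}

def recipes_sharing_ingredients(recipe):
    # one hop of collaborative signal through the ingredient layer
    contains = graph[("recipe", "contains", "ingredient")]
    mine = {ing for r, ing in contains if r == recipe}
    return sorted({r for r, ing in contains if ing in mine and r != recipe})

print(recipes_sharing_ingredients("carbonara"))  # ['ramen'] (shared egg)
```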
arXiv Detail & Related papers (2022-05-24T22:19:53Z)
- Learning Structural Representations for Recipe Generation and Food Retrieval [101.97397967958722]
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
Our proposed model can produce high-quality and coherent recipes, and achieve the state-of-the-art performance on the benchmark Recipe1M dataset.
arXiv Detail & Related papers (2021-10-04T06:36:31Z)
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [17.42688184238741]
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives.
We propose a simplified end-to-end model based on well-established and high-performing encoders for text and images.
Our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset.
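A minimal sketch of the retrieval step in a shared embedding space: once the image and recipe encoders (transformers in the paper, random vectors here) produce embeddings, retrieval reduces to ranking by cosine similarity.

```python
# Minimal sketch (random stand-in encoders): rank recipes for each image by
# cosine similarity in a shared embedding space.
import numpy as np

rng = np.random.default_rng(0)
img_emb = rng.normal(size=(5, 64))   # 5 food-image embeddings
rec_emb = rng.normal(size=(5, 64))   # 5 recipe-text embeddings

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

sim = l2_normalize(img_emb) @ l2_normalize(rec_emb).T  # cosine similarity matrix
ranking = np.argsort(-sim, axis=1)                      # best recipe first, per image
print(ranking[0])                                       # recipe indices for image 0
```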
arXiv Detail & Related papers (2021-03-24T10:17:09Z)
- Structure-Aware Generation Network for Recipe Generation from Images [142.047662926209]
We investigate an open research task of generating cooking instructions based on only food images and ingredients.
Target recipes are long-length paragraphs and do not have annotations on structure information.
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
arXiv Detail & Related papers (2020-09-02T10:54:25Z)
- Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction.
We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
arXiv Detail & Related papers (2020-08-20T18:31:25Z)
- Decomposing Generation Networks with Structure Prediction for Recipe Generation [142.047662926209]
We propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction.
Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase.
Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure.
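A rough sketch of the decomposition idea: a (stubbed) structure predictor assigns each instruction step to a phase, and each phase is handled by its own sub-generator. The phase names and generator stubs are illustrative assumptions, not the paper's components.

```python
# Illustrative sketch (assumed phases and stub generators, not DGN itself):
# predict a phase structure, then route each phase to a dedicated sub-generator.
PHASES = ["prep", "cook", "finish"]

def predict_structure(ingredients):
    # stand-in for the global structure prediction component
    return PHASES

def sub_generator(phase, ingredients):
    # stand-in for the per-phase sub-generators
    templates = {
        "prep": "Chop and measure: " + ", ".join(ingredients) + ".",
        "cook": "Cook the " + ingredients[0] + " until done.",
        "finish": "Season, plate, and serve.",
    }
    return templates[phase]

def generate_recipe(ingredients):
    return [sub_generator(phase, ingredients) for phase in predict_structure(ingredients)]

print("\n".join(generate_recipe(["pasta", "egg", "pecorino"])))
```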
arXiv Detail & Related papers (2020-07-27T08:47:50Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
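A hedged sketch of the semantic-consistency idea: the class-probability outputs of the image branch and the recipe branch are pulled toward each other, here with a symmetric KL term (the paper's exact regularizer and its retrieval loss are omitted).

```python
# Hedged sketch (symmetric KL stand-in, not the paper's exact loss): align the
# semantic class probabilities produced from the image and recipe branches.
import torch
import torch.nn.functional as F

def semantic_consistency(img_logits, txt_logits):
    log_p_img = F.log_softmax(img_logits, dim=-1)
    log_p_txt = F.log_softmax(txt_logits, dim=-1)
    kl_a = F.kl_div(log_p_img, log_p_txt.exp(), reduction="batchmean")
    kl_b = F.kl_div(log_p_txt, log_p_img.exp(), reduction="batchmean")
    return 0.5 * (kl_a + kl_b)

print(float(semantic_consistency(torch.randn(8, 10), torch.randn(8, 10))))
```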
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.