Recipe2Vec: Multi-modal Recipe Representation Learning with Graph Neural
Networks
- URL: http://arxiv.org/abs/2205.12396v1
- Date: Tue, 24 May 2022 23:04:02 GMT
- Title: Recipe2Vec: Multi-modal Recipe Representation Learning with Graph Neural
Networks
- Authors: Yijun Tian, Chuxu Zhang, Zhichun Guo, Yihong Ma, Ronald Metoyer,
Nitesh V. Chawla
- Abstract summary: We formalize the problem of multi-modal recipe representation learning to integrate the visual, textual, and relational information into recipe embeddings.
We first present Large-RG, a new recipe graph dataset with over half a million nodes, making it the largest recipe graph to date.
We then propose Recipe2Vec, a novel graph neural network-based recipe embedding model to capture multi-modal information.
- Score: 23.378813327724686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning effective recipe representations is essential in food studies.
Unlike what has been developed for image-based recipe retrieval or learning
structural text embeddings, the combined effect of multi-modal information
(i.e., recipe images, text, and relation data) receives less attention. In this
paper, we formalize the problem of multi-modal recipe representation learning
to integrate the visual, textual, and relational information into recipe
embeddings. In particular, we first present Large-RG, a new recipe graph dataset
with over half a million nodes, making it the largest recipe graph to date. We
then propose Recipe2Vec, a novel graph neural network-based recipe embedding
model to capture multi-modal information. Additionally, we introduce an
adversarial attack strategy to ensure stable learning and improve performance.
Finally, we design a joint objective function of node classification and
adversarial learning to optimize the model. Extensive experiments demonstrate
that Recipe2Vec outperforms state-of-the-art baselines on two classic food
study tasks, i.e., cuisine category classification and region prediction.
Dataset and codes are available at https://github.com/meettyj/Recipe2Vec.
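As a rough illustration of the joint objective described in the abstract, here is a minimal PyTorch sketch of a single-layer multi-modal GNN whose node-classification loss is combined with an adversarial term computed on gradient-perturbed embeddings. The module names, dimensions, and FGSM-style perturbation are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch (NOT the authors' code): fuse visual + textual node features,
# propagate once over the recipe graph, and train with a joint loss of
# node classification plus an adversarial term on perturbed embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalRecipeGNN(nn.Module):
    def __init__(self, img_dim, txt_dim, hid_dim, n_classes):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hid_dim)   # visual modality
        self.txt_proj = nn.Linear(txt_dim, hid_dim)   # textual modality
        self.gnn = nn.Linear(hid_dim, hid_dim)        # one message-passing step
        self.cls = nn.Linear(hid_dim, n_classes)      # cuisine / region head

    def forward(self, img_x, txt_x, adj):
        h = self.img_proj(img_x) + self.txt_proj(txt_x)   # fuse modalities
        h = F.relu(adj @ self.gnn(h))                      # relational information
        return h, self.cls(h)

def joint_loss(model, img_x, txt_x, adj, labels, eps=0.01):
    h, logits = model(img_x, txt_x, adj)
    ce = F.cross_entropy(logits, labels)
    # adversarial term: re-classify embeddings nudged along the loss gradient
    grad = torch.autograd.grad(ce, h, retain_graph=True)[0]
    adv_logits = model.cls(h + eps * grad.sign().detach())
    return ce + F.cross_entropy(adv_logits, labels)

# toy usage with a row-normalized adjacency over 4 recipe nodes
adj = torch.eye(4) * 0.5 + 0.5 / 4
model = MultiModalRecipeGNN(img_dim=16, txt_dim=32, hid_dim=8, n_classes=3)
loss = joint_loss(model, torch.randn(4, 16), torch.randn(4, 32), adj,
                  torch.tensor([0, 1, 2, 0]))
loss.backward()
```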
Related papers
- Retrieval Augmented Recipe Generation [96.43285670458803]
We propose a retrieval augmented large multimodal model for recipe generation.
It retrieves recipes semantically related to the image from an existing datastore as a supplement.
It calculates the consistency among generated recipe candidates, each of which uses a different retrieved recipe as context for generation.
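A toy sketch of that consistency idea: several candidate recipes, each generated with a different retrieved recipe as context, are compared against one another and the candidate that agrees most with the rest is kept. Token-level Jaccard overlap here is only a stand-in for whatever similarity measure the paper actually uses.

```python
# Toy sketch (assumed scoring, not the paper's method): pick the generated
# candidate that is most consistent with the other candidates.
def jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def most_consistent(candidates):
    # mean pairwise similarity of each candidate against all the others
    scores = [sum(jaccard(c, o) for j, o in enumerate(candidates) if j != i)
              / (len(candidates) - 1)
              for i, c in enumerate(candidates)]
    return max(zip(scores, candidates))[1]

candidates = [
    "boil pasta then toss with egg pecorino and pepper",
    "boil pasta then toss with egg pecorino and cream",
    "bake the chicken with lemon and thyme",
]
print(most_consistent(candidates))  # one of the two carbonara-like candidates
```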
arXiv Detail & Related papers (2024-11-13T15:58:50Z)
- Deep Image-to-Recipe Translation [0.0]
Deep Image-to-Recipe Translation aims to bridge the gap between cherished food memories and the art of culinary creation.
Our primary objective involves predicting ingredients from a given food image.
Our approach emphasizes the importance of metrics such as Intersection over Union (IoU) and F1 score in scenarios where accuracy alone might be misleading.
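For concreteness, a minimal sketch of those set-based metrics, treating the predicted and ground-truth ingredient lists as sets (the exact formulation in the paper may differ):

```python
# Minimal sketch (assumed set-based formulation): IoU and F1 over predicted vs.
# ground-truth ingredient sets, useful when plain accuracy is misleading.
def iou(pred, true):
    pred, true = set(pred), set(true)
    return len(pred & true) / len(pred | true) if pred | true else 1.0

def f1(pred, true):
    pred, true = set(pred), set(true)
    if not pred or not true:
        return 0.0
    precision = len(pred & true) / len(pred)
    recall = len(pred & true) / len(true)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

pred = {"flour", "egg", "milk"}
true = {"flour", "egg", "butter"}
print(iou(pred, true))  # 0.5
print(f1(pred, true))   # ~0.67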
arXiv Detail & Related papers (2024-07-01T02:33:07Z)
- RecipeRec: A Heterogeneous Graph Learning Model for Recipe Recommendation [26.84274830886026]
We formalize the problem of recipe recommendation with graphs to incorporate the collaborative signal into recipe recommendation.
We first present URI-Graph, a new and large-scale user-recipe-ingredient graph.
We then propose RecipeRec, a novel heterogeneous graph learning model for recipe recommendation.
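A toy illustration of the kind of user-recipe-ingredient structure such a graph encodes; the node names and edge types below are made up, and RecipeRec itself learns over this structure with a heterogeneous GNN rather than the hand-written traversal shown here.

```python
# Toy illustration (made-up nodes/edge types): a user-recipe-ingredient graph as
# plain edge lists, the kind of structure a heterogeneous GNN would learn over.
graph = {
    ("user", "rates", "recipe"): [("u1", "carbonara"), ("u1", "ramen"), ("u2", "carbonara")],
    ("recipe", "contains", "ingredient"): [("carbonara", "egg"), ("carbonara", "pasta"),
                                           ("ramen", "egg"), ("ramen", "noodles")],
}

def recipes_sharing_ingredients(recipe):
    # one hop of collaborative signal through the ingredient layer
    contains = graph[("recipe", "contains", "ingredient")]
    mine = {ing for r, ing in contains if r == recipe}
    return sorted({r for r, ing in contains if ing in mine and r != recipe})

print(recipes_sharing_ingredients("carbonara"))  # ['ramen'] (shared egg)
```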
arXiv Detail & Related papers (2022-05-24T22:19:53Z)
- Learning Structural Representations for Recipe Generation and Food Retrieval [101.97397967958722]
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
Our proposed model can produce high-quality and coherent recipes, and achieve the state-of-the-art performance on the benchmark Recipe1M dataset.
arXiv Detail & Related papers (2021-10-04T06:36:31Z)
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [17.42688184238741]
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives.
We propose a simplified end-to-end model based on well-established and high-performing encoders for text and images.
Our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset.
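A minimal sketch of the retrieval step in a shared embedding space: once the image and recipe encoders (transformers in the paper, random vectors here) produce embeddings, retrieval reduces to ranking by cosine similarity.

```python
# Minimal sketch (random stand-in encoders): rank recipes for each image by
# cosine similarity in a shared embedding space.
import numpy as np

rng = np.random.default_rng(0)
img_emb = rng.normal(size=(5, 64))   # 5 food-image embeddings
rec_emb = rng.normal(size=(5, 64))   # 5 recipe-text embeddings

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

sim = l2_normalize(img_emb) @ l2_normalize(rec_emb).T  # cosine similarity matrix
ranking = np.argsort(-sim, axis=1)                      # best recipe first, per image
print(ranking[0])                                       # recipe indices for image 0
```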
arXiv Detail & Related papers (2021-03-24T10:17:09Z)
- Structure-Aware Generation Network for Recipe Generation from Images [142.047662926209]
We investigate an open research task of generating cooking instructions based on only food images and ingredients.
Target recipes are long-length paragraphs and do not have annotations on structure information.
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
arXiv Detail & Related papers (2020-09-02T10:54:25Z)
- Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction.
We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
arXiv Detail & Related papers (2020-08-20T18:31:25Z)
- Decomposing Generation Networks with Structure Prediction for Recipe Generation [142.047662926209]
We propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction.
Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase.
Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure.
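A rough sketch of the decomposition idea: a (stubbed) structure predictor assigns each instruction step to a phase, and each phase is handled by its own sub-generator. The phase names and generator stubs are illustrative assumptions, not the paper's components.

```python
# Illustrative sketch (assumed phases and stub generators, not DGN itself):
# predict a phase structure, then route each phase to a dedicated sub-generator.
PHASES = ["prep", "cook", "finish"]

def predict_structure(ingredients):
    # stand-in for the global structure prediction component
    return PHASES

def sub_generator(phase, ingredients):
    # stand-in for the per-phase sub-generators
    templates = {
        "prep": "Chop and measure: " + ", ".join(ingredients) + ".",
        "cook": "Cook the " + ingredients[0] + " until done.",
        "finish": "Season, plate, and serve.",
    }
    return templates[phase]

def generate_recipe(ingredients):
    return [sub_generator(phase, ingredients) for phase in predict_structure(ingredients)]

print("\n".join(generate_recipe(["pasta", "egg", "pecorino"])))
```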
arXiv Detail & Related papers (2020-07-27T08:47:50Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
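A hedged sketch of the semantic-consistency idea: the class-probability outputs of the image branch and the recipe branch are pulled toward each other, here with a symmetric KL term (the paper's exact regularizer and its retrieval loss are omitted).

```python
# Hedged sketch (symmetric KL stand-in, not the paper's exact loss): align the
# semantic class probabilities produced from the image and recipe branches.
import torch
import torch.nn.functional as F

def semantic_consistency(img_logits, txt_logits):
    log_p_img = F.log_softmax(img_logits, dim=-1)
    log_p_txt = F.log_softmax(txt_logits, dim=-1)
    kl_a = F.kl_div(log_p_img, log_p_txt.exp(), reduction="batchmean")
    kl_b = F.kl_div(log_p_txt, log_p_img.exp(), reduction="batchmean")
    return 0.5 * (kl_a + kl_b)

print(float(semantic_consistency(torch.randn(8, 10), torch.randn(8, 10))))
```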
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.