Learning Structural Representations for Recipe Generation and Food
Retrieval
- URL: http://arxiv.org/abs/2110.01209v1
- Date: Mon, 4 Oct 2021 06:36:31 GMT
- Title: Learning Structural Representations for Recipe Generation and Food
Retrieval
- Authors: Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao
- Abstract summary: We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
Our proposed model can produce high-quality and coherent recipes and achieve state-of-the-art performance on the benchmark Recipe1M dataset.
- Score: 101.97397967958722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Food is significant to human daily life. In this paper, we are interested in
learning structural representations for lengthy recipes, which can benefit the
recipe generation and food retrieval tasks. We mainly investigate an open
research task of generating cooking instructions based on food images and
ingredients, which is similar to the image captioning task. However, unlike the
captions in image captioning datasets, the target recipes are lengthy paragraphs
without annotations of structural information. To address the above
limitations, we propose a novel framework of Structure-aware Generation Network
(SGN) to tackle the food recipe generation task. Our approach brings together
several novel ideas in a systematic framework: (1) exploiting an unsupervised
learning approach to obtain the sentence-level tree structure labels before
training; (2) generating trees of target recipes from images with the
supervision of tree structure labels learned from (1); and (3) integrating the
inferred tree structures into the recipe generation procedure. Our proposed
model can produce high-quality and coherent recipes and achieve
state-of-the-art performance on the benchmark Recipe1M dataset. We also
validate the usefulness of our learned tree structures in the food cross-modal
retrieval task, where the proposed model with tree representations can
outperform state-of-the-art benchmark results.
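
To make the notion of a "sentence-level tree structure" more concrete, here is a minimal sketch of what such a tree over recipe sentences could look like as a data structure. It is only an illustration: the `Node` class, the `leaves_with_depth` helper, and the hand-built example tree are hypothetical and are not taken from the paper, which infers its trees with an unsupervised approach that is not detailed in this abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A node in a hypothetical binary sentence-level tree (illustrative only)."""
    sentence: Optional[str] = None   # set only on leaf nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.sentence is not None


def leaves_with_depth(node: Node, depth: int = 0) -> List[Tuple[int, str]]:
    """Return (depth, sentence) pairs in left-to-right reading order."""
    if node.is_leaf():
        return [(depth, node.sentence)]
    pairs: List[Tuple[int, str]] = []
    for child in (node.left, node.right):
        if child is not None:
            pairs.extend(leaves_with_depth(child, depth + 1))
    return pairs


# Hand-built example: the two preparation steps are grouped together,
# and that group is then combined with the baking step.
# A real system would infer this grouping rather than hard-code it.
recipe_tree = Node(
    left=Node(
        left=Node(sentence="Preheat the oven."),
        right=Node(sentence="Grease the pan."),
    ),
    right=Node(sentence="Bake for 30 minutes."),
)

structure = leaves_with_depth(recipe_tree)
for depth, sentence in structure:
    print("  " * depth + sentence)
```

The depth of each leaf records how tightly a sentence is grouped with its neighbours, which is the kind of structural signal a generator could condition on in addition to the flat word sequence.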
Related papers
- Retrieval Augmented Recipe Generation [96.43285670458803]
We propose a retrieval augmented large multimodal model for recipe generation.
It retrieves recipes semantically related to the image from an existing datastore as a supplement.
It calculates the consistency among generated recipe candidates, which use different retrieval recipes as context for generation.
arXiv Detail & Related papers (2024-11-13T15:58:50Z)
- Recipe2Vec: Multi-modal Recipe Representation Learning with Graph Neural Networks [23.378813327724686]
We formalize the problem of multi-modal recipe representation learning to integrate the visual, textual, and relational information into recipe embeddings.
We first present Large-RG, a new recipe graph dataset with over half a million nodes, making it the largest recipe graph to date.
We then propose Recipe2Vec, a novel graph neural network based recipe embedding model to capture multi-modal information.
arXiv Detail & Related papers (2022-05-24T23:04:02Z)
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [17.42688184238741]
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives.
We propose a simplified end-to-end model based on well established and high performing encoders for text and images.
Our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset.
arXiv Detail & Related papers (2021-03-24T10:17:09Z)
- CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval [20.292467149387594]
We introduce a novel cross-modal learning framework to jointly model the latent representations of images and text in the food image-recipe association and retrieval tasks.
Our experiments show that by making use of efficient tree-structured Long Short-Term Memory as the text encoder in our computational cross-modal retrieval framework, we are able to identify the main ingredients and cooking actions in the recipe descriptions without explicit supervision.
arXiv Detail & Related papers (2021-02-04T11:24:34Z)
- Structure-Aware Generation Network for Recipe Generation from Images [142.047662926209]
We investigate an open research task of generating cooking instructions based on only food images and ingredients.
Target recipes are lengthy paragraphs and do not have annotations on structure information.
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
arXiv Detail & Related papers (2020-09-02T10:54:25Z)
- Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction.
We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
arXiv Detail & Related papers (2020-08-20T18:31:25Z)
- Decomposing Generation Networks with Structure Prediction for Recipe Generation [142.047662926209]
We propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction.
Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase.
Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure.
arXiv Detail & Related papers (2020-07-27T08:47:50Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
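
Several of the retrieval papers listed above describe embedding images and recipes in a common feature space so that matching pairs lie close together. The following is a minimal sketch of how retrieval in such a space can be scored, assuming cosine similarity as the distance and using rank-based measures (median rank and Recall@K) of the kind commonly reported for this task. The embeddings are made-up toy vectors and the function names are hypothetical; none of this is taken from the listed papers.

```python
import math
from statistics import median
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieval_ranks(image_embs: List[Sequence[float]],
                    recipe_embs: List[Sequence[float]]) -> List[int]:
    """For image i, return the rank (1 = best) of its paired recipe i."""
    ranks = []
    for i, img in enumerate(image_embs):
        sims = [cosine(img, rec) for rec in recipe_embs]
        # Rank is one plus the number of recipes scored strictly higher
        # than the true match.
        ranks.append(1 + sum(1 for s in sims if s > sims[i]))
    return ranks


def recall_at_k(ranks: List[int], k: int) -> float:
    """Fraction of queries whose true match appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy embeddings (assumed, for illustration): row i of each list is a pair.
image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
recipe_embs = [[0.9, 0.1], [0.8, 0.6], [0.7, 0.7]]

ranks = retrieval_ranks(image_embs, recipe_embs)
print("ranks:", ranks)
print("median rank:", median(ranks))
print("recall@1:", recall_at_k(ranks, 1))
```

In this toy setup the second image is closer to the third recipe than to its own, so its true match is ranked second while the other two queries rank their matches first.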
This list is automatically generated from the titles and abstracts of the papers on this site.