Structure-Aware Generation Network for Recipe Generation from Images
- URL: http://arxiv.org/abs/2009.00944v1
- Date: Wed, 2 Sep 2020 10:54:25 GMT
- Title: Structure-Aware Generation Network for Recipe Generation from Images
- Authors: Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao
- Abstract summary: We investigate an open research task of generating cooking instructions based only on food images and ingredients.
Target recipes are long paragraphs and do not have structure annotations.
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
- Score: 142.047662926209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sharing food has become very popular with the development of social media.
For many real-world applications, people are keen to know the underlying
recipes of a food item. In this paper, we are interested in automatically
generating cooking instructions for food. We investigate an open research task
of generating cooking instructions based only on food images and ingredients,
which is similar to the image captioning task. However, compared with image
captioning datasets, the target recipes are long paragraphs and lack annotations
of their structure. To address these limitations, we
propose a novel framework of Structure-aware Generation Network (SGN) to tackle
the food recipe generation task. Our approach brings together several novel
ideas in a systematic framework: (1) exploiting an unsupervised learning
approach to obtain the sentence-level tree structure labels before training;
(2) generating trees of target recipes from images with the supervision of tree
structure labels learned from (1); and (3) integrating the inferred tree
structures with the recipe generation procedure. Our proposed model can produce
high-quality and coherent recipes, and achieve state-of-the-art performance
on the benchmark Recipe1M dataset.
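As a rough illustration of how ideas (2) and (3) fit together, below is a minimal PyTorch sketch. The class names, tensor sizes, flat tree-token representation, and greedy tree decoding are all assumptions made here for illustration; step (1), the unsupervised extraction of sentence-level tree labels, is treated as an offline preprocessing step and omitted. The actual SGN components, losses, and training setup are specified in the paper.

```python
# Minimal, hypothetical sketch of a structure-aware generation pipeline.
import torch
import torch.nn as nn

class TreePredictor(nn.Module):
    """Step (2): decode a flat sequence of tree-structure tokens from image features."""
    def __init__(self, img_dim=512, hidden=256, tree_vocab=64):
        super().__init__()
        self.proj = nn.Linear(img_dim, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tree_vocab)

    def forward(self, img_feat, steps=16):
        # Feed the projected image feature at every decoding step (no attention here).
        x = self.proj(img_feat).unsqueeze(1).repeat(1, steps, 1)
        h, _ = self.rnn(x)
        return self.out(h)                      # (batch, steps, tree_vocab) logits

class RecipeDecoder(nn.Module):
    """Step (3): generate recipe tokens conditioned on the image and the inferred tree."""
    def __init__(self, vocab=30000, img_dim=512, tree_vocab=64, hidden=256):
        super().__init__()
        self.tree_emb = nn.Embedding(tree_vocab, hidden)
        self.img_proj = nn.Linear(img_dim, hidden)
        self.word_emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden * 3, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, img_feat, tree_tokens, words):   # words: teacher-forcing inputs
        T = words.size(1)
        img = self.img_proj(img_feat).unsqueeze(1).expand(-1, T, -1)
        tree = self.tree_emb(tree_tokens).mean(dim=1, keepdim=True).expand(-1, T, -1)
        x = torch.cat([self.word_emb(words), img, tree], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)                      # next-word logits at every position

# Toy forward pass; random vectors stand in for features from an image encoder.
img_feat = torch.randn(2, 512)
tree_tokens = TreePredictor()(img_feat).argmax(-1)     # greedy tree decoding
words = torch.randint(0, 30000, (2, 20))
recipe_logits = RecipeDecoder()(img_feat, tree_tokens, words)
```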
Related papers
- Counterfactual Recipe Generation: Exploring Compositional Generalization in a Realistic Scenario [60.20197771545983]
We design the counterfactual recipe generation task, which asks models to modify a base recipe according to a change in one ingredient.
We collect a large-scale recipe dataset in Chinese for models to learn culinary knowledge.
Results show that existing models have difficulty modifying ingredients while preserving the original text style, and often miss actions that need to be adjusted.
arXiv Detail & Related papers (2022-10-20T17:21:46Z)
- Learning Program Representations for Food Images and Cooking Recipes [26.054436410924737]
We propose to represent cooking recipes and food images as cooking programs.
A model is trained to learn a joint embedding between recipes and food images via self-supervision.
We show that projecting the image-recipe embeddings into programs leads to better cross-modal retrieval results.
arXiv Detail & Related papers (2022-03-30T05:52:41Z)
- Learning Structural Representations for Recipe Generation and Food Retrieval [101.97397967958722]
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
Our proposed model can produce high-quality and coherent recipes, and achieve state-of-the-art performance on the benchmark Recipe1M dataset.
arXiv Detail & Related papers (2021-10-04T06:36:31Z)
- A Large-Scale Benchmark for Food Image Segmentation [62.28029856051079]
We build a new food image dataset, FoodSeg103 (and its extension FoodSeg154), containing 9,490 images.
We annotate these images with 154 ingredient classes; each image has an average of 6 ingredient labels and pixel-wise masks.
We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
arXiv Detail & Related papers (2021-05-12T03:00:07Z)
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [17.42688184238741]
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives.
We propose a simplified end-to-end model based on well-established, high-performing encoders for text and images.
Our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset.
arXiv Detail & Related papers (2021-03-24T10:17:09Z)
- CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval [20.292467149387594]
We introduce a novel cross-modal learning framework to jointly model the latent representations of images and text in the food image-recipe association and retrieval tasks.
Our experiments show that by making use of efficient tree-structured Long Short-Term Memory as the text encoder in our computational cross-modal retrieval framework, we are able to identify the main ingredients and cooking actions in the recipe descriptions without explicit supervision.
arXiv Detail & Related papers (2021-02-04T11:24:34Z)
- Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction.
We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
arXiv Detail & Related papers (2020-08-20T18:31:25Z)
- Decomposing Generation Networks with Structure Prediction for Recipe Generation [142.047662926209]
We propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction.
Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase.
Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure.
arXiv Detail & Related papers (2020-07-27T08:47:50Z)
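To make the decompose-and-generate idea in the DGN summary above more concrete, here is a minimal PyTorch sketch. The names, dimensions, fixed number of phases, and greedy phase assignment are hypothetical; the actual DGN components, phase definitions, and training procedure are described in that paper.

```python
# Minimal, hypothetical sketch of structure prediction plus per-phase sub-generators.
import torch
import torch.nn as nn

NUM_SUBGENS = 4    # assumed number of distinct sub-generators / phase types
MAX_PHASES = 3     # assumed number of phases per recipe
VOCAB = 30000

class StructurePredictor(nn.Module):
    """Global structure prediction: choose which sub-generator handles each phase slot."""
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, NUM_SUBGENS)

    def forward(self, fused_feat):
        # Repeat the fused image+ingredient feature once per phase slot.
        x = fused_feat.unsqueeze(1).repeat(1, MAX_PHASES, 1)
        h, _ = self.rnn(x)
        return self.out(h)                      # (batch, MAX_PHASES, NUM_SUBGENS) logits

class SubGenerator(nn.Module):
    """One decoder per phase type; each generates the tokens of its assigned phase."""
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, hidden)
        self.init = nn.Linear(feat_dim, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, VOCAB)

    def forward(self, fused_feat, words):       # words: teacher-forcing inputs
        h0 = torch.tanh(self.init(fused_feat)).unsqueeze(0)
        h, _ = self.rnn(self.emb(words), h0)
        return self.out(h)

# Toy run: predict the phase structure, then generate each phase with its sub-generator.
subgens = nn.ModuleList(SubGenerator() for _ in range(NUM_SUBGENS))
fused = torch.randn(1, 512)                     # fused image + ingredient feature
phase_ids = StructurePredictor()(fused).argmax(-1)[0]   # e.g. tensor([2, 0, 3])
words = torch.randint(0, VOCAB, (1, 12))
phase_logits = [subgens[i](fused, words) for i in phase_ids.tolist()]
```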
This list is automatically generated from the titles and abstracts of the papers on this site.