Multi-Task Learning for Calorie Prediction on a Novel Large-Scale Recipe
Dataset Enriched with Nutritional Information
- URL: http://arxiv.org/abs/2011.01082v1
- Date: Mon, 2 Nov 2020 16:11:51 GMT
- Title: Multi-Task Learning for Calorie Prediction on a Novel Large-Scale Recipe
Dataset Enriched with Nutritional Information
- Authors: Robin Ruede, Verena Heusser, Lukas Frank, Alina Roitberg, Monica
Haurilet, Rainer Stiefelhagen
- Abstract summary: In this work, we aim to estimate the calorie amount of a meal directly from an image by learning from recipes people have published on the Internet.
We propose the pic2kcal benchmark comprising 308,000 images from over 70,000 recipes, including photographs, ingredients, and instructions.
Our experiments demonstrate clear benefits of multi-task learning for calorie estimation, surpassing the single-task calorie regression by 9.9%.
- Score: 25.646488178514186
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A rapidly growing amount of content posted online, such as food recipes,
opens doors to new exciting applications at the intersection of vision and
language. In this work, we aim to estimate the calorie amount of a meal
directly from an image by learning from recipes people have published on the
Internet, thus skipping time-consuming manual data annotation. Since there are
few large-scale publicly available datasets captured in unconstrained
environments, we propose the pic2kcal benchmark comprising 308,000 images from
over 70,000 recipes, including photographs, ingredients, and instructions. To
obtain nutritional information of the ingredients and automatically determine
the ground-truth calorie value, we match the items in the recipes with
structured information from a food item database.
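As a rough illustration of this matching step, the sketch below derives a per-recipe calorie label by fuzzy-matching free-text ingredient names against a small nutrition table. The table contents, units, and matching heuristic are illustrative assumptions, not the actual pic2kcal pipeline.

```python
# Illustrative sketch: derive a ground-truth calorie label by matching
# free-text ingredient names against a nutrition database. The table,
# units, and fuzzy-matching heuristic are assumptions for exposition,
# not the authors' actual pipeline.
from difflib import get_close_matches

# Hypothetical database: ingredient name -> kcal per 100 g
NUTRITION_DB = {
    "butter": 717.0,
    "wheat flour": 364.0,
    "granulated sugar": 387.0,
    "whole milk": 61.0,
}

def match_kcal_per_100g(name: str) -> float | None:
    """Return kcal/100 g of the closest database entry, or None."""
    hits = get_close_matches(name.lower(), list(NUTRITION_DB), n=1, cutoff=0.6)
    return NUTRITION_DB[hits[0]] if hits else None

def recipe_kcal(ingredients: list[tuple[str, float]]) -> float:
    """Sum calories over (name, grams) pairs, skipping unmatched items."""
    total = 0.0
    for name, grams in ingredients:
        kcal = match_kcal_per_100g(name)
        if kcal is not None:
            total += kcal * grams / 100.0
    return total

print(recipe_kcal([("Butter", 250.0), ("wheat flour", 500.0)]))  # 3612.5
```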
We evaluate various neural networks for regression of the calorie quantity
and extend them with the multi-task paradigm. Our learning procedure combines
the calorie estimation with prediction of proteins, carbohydrates, and fat
amounts as well as a multi-label ingredient classification. Our experiments
demonstrate clear benefits of multi-task learning for calorie estimation,
surpassing the single-task calorie regression by 9.9%. To encourage further
research on this task, we make the code for generating the dataset and the
models publicly available.
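A minimal sketch of the multi-task setup described above, assuming a PyTorch implementation: a shared image encoder feeds separate heads for calorie regression, macronutrient regression (protein, carbohydrates, fat), and multi-label ingredient classification, and the per-task losses are summed with weights. The backbone choice, head sizes, loss functions, and weights here are assumptions, not the published pic2kcal configuration.

```python
# Minimal multi-task sketch: one shared encoder, three task heads, and a
# weighted sum of losses. All architectural choices are illustrative.
import torch
import torch.nn as nn
from torchvision import models

class Pic2KcalNet(nn.Module):
    def __init__(self, num_ingredients: int = 100):
        super().__init__()
        backbone = models.resnet50(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # keep everything up to the pooled features
        self.encoder = backbone
        self.kcal_head = nn.Linear(feat_dim, 1)                 # calorie regression
        self.macro_head = nn.Linear(feat_dim, 3)                # protein, carbs, fat
        self.ingr_head = nn.Linear(feat_dim, num_ingredients)   # multi-label classification

    def forward(self, images):
        h = self.encoder(images)
        return self.kcal_head(h), self.macro_head(h), self.ingr_head(h)

def multi_task_loss(outputs, targets, w=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses (weights are assumptions)."""
    kcal_pred, macro_pred, ingr_logits = outputs
    kcal_true, macro_true, ingr_true = targets
    loss_kcal = nn.functional.l1_loss(kcal_pred.squeeze(-1), kcal_true)
    loss_macro = nn.functional.l1_loss(macro_pred, macro_true)
    loss_ingr = nn.functional.binary_cross_entropy_with_logits(
        ingr_logits, ingr_true)
    return w[0] * loss_kcal + w[1] * loss_macro + w[2] * loss_ingr
```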
Related papers
- NutritionVerse-Real: An Open Access Manually Collected 2D Food Scene
Dataset for Dietary Intake Estimation [68.49526750115429]
We introduce NutritionVerse-Real, an open access manually collected 2D food scene dataset for dietary intake estimation.
The NutritionVerse-Real dataset was created by manually collecting images of food scenes in real life, measuring the weight of every ingredient and computing the associated dietary content of each dish.
arXiv Detail & Related papers (2023-11-20T11:05:20Z)
- NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches [59.38343165508926]
Accurate dietary intake estimation is critical for informing policies and programs to support healthy eating.
Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images.
We introduce NutritionVerse-Synth, the first large-scale dataset of 84,984 synthetic 2D food images with associated dietary information.
We also collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism.
arXiv Detail & Related papers (2023-09-14T13:29:41Z)
- Calorie Aware Automatic Meal Kit Generation from an Image [7.170180366236038]
Given a single cooking image, a pipeline for calorie estimation and meal reproduction is proposed.
Portion estimation introduced in the model improves calorie estimation and is also beneficial for reproducing the meal in different serving sizes.
arXiv Detail & Related papers (2021-12-18T04:16:12Z)
- A Large-Scale Benchmark for Food Image Segmentation [62.28029856051079]
We build a new food image dataset FoodSeg103 (and its extension FoodSeg154) containing 9,490 images.
We annotate these images with 154 ingredient classes; each image has an average of 6 ingredient labels with pixel-wise masks.
We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
arXiv Detail & Related papers (2021-05-12T03:00:07Z)
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [17.42688184238741]
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives.
We propose a simplified end-to-end model based on well-established, high-performing encoders for text and images.
Our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset.
arXiv Detail & Related papers (2021-03-24T10:17:09Z)
- Structure-Aware Generation Network for Recipe Generation from Images [142.047662926209]
We investigate an open research task of generating cooking instructions based on only food images and ingredients.
Target recipes are long paragraphs and carry no annotations of their structure.
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
arXiv Detail & Related papers (2020-09-02T10:54:25Z)
- Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction.
We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
arXiv Detail & Related papers (2020-08-20T18:31:25Z)
- Multi-Task Image-Based Dietary Assessment for Food Recognition and Portion Size Estimation [6.603050343996914]
We propose an end-to-end multi-task framework that can achieve both food classification and food portion size estimation.
Our results outperform the baseline methods in both classification accuracy and mean absolute error for portion estimation.
arXiv Detail & Related papers (2020-04-27T21:35:07Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin (a schematic sketch of this kind of joint embedding follows this list).
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
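For the cross-modal retrieval line of work in the last entry above, the toy sketch below trains a joint image-recipe embedding with an in-batch triplet loss, a common choice for this task; SCAN's own formulation additionally uses semantic-consistency and attention components not shown here. The encoder inputs, dimensions, and margin are assumptions.

```python
# Toy sketch of a joint image-recipe embedding for cross-modal retrieval.
# Precomputed image/text features are projected into a shared space and
# trained with an in-batch triplet loss. Dimensions and margin are made up.
import torch
import torch.nn as nn

class JointEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, emb_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)  # project image features
        self.txt_proj = nn.Linear(txt_dim, emb_dim)  # project recipe features

    def forward(self, img_feats, txt_feats):
        img = nn.functional.normalize(self.img_proj(img_feats), dim=-1)
        txt = nn.functional.normalize(self.txt_proj(txt_feats), dim=-1)
        return img, txt

def triplet_loss(img, txt, margin=0.3):
    """In-batch triplets: matching (i, i) pairs vs. all other recipes."""
    sim = img @ txt.t()                       # cosine similarity matrix
    pos = sim.diag().unsqueeze(1)             # similarity of true pairs
    cost = (margin + sim - pos).clamp(min=0)  # hinge violation per negative
    cost.fill_diagonal_(0)                    # ignore the positive itself
    return cost.mean()
```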
This list is automatically generated from the titles and abstracts of the papers on this site.