Picture-to-Amount (PITA): Predicting Relative Ingredient Amounts from
Food Images
- URL: http://arxiv.org/abs/2010.08727v1
- Date: Sat, 17 Oct 2020 06:43:18 GMT
- Title: Picture-to-Amount (PITA): Predicting Relative Ingredient Amounts from
Food Images
- Authors: Jiatong Li, Fangda Han, Ricardo Guerrero, Vladimir Pavlovic
- Abstract summary: We study the novel and challenging problem of predicting the relative amount of each ingredient from a food image.
We propose PITA, the Picture-to-Amount deep learning architecture to solve the problem.
Experiments on a dataset of recipes collected from the Internet show the model generates promising results.
- Score: 24.26111169033236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Increased awareness of the impact of food consumption on health and lifestyle
today has given rise to novel data-driven food analysis systems. Although these
systems may recognize the ingredients, a detailed analysis of their amounts in
the meal, which is paramount for estimating the correct nutrition, is usually
ignored. In this paper, we study the novel and challenging problem of
predicting the relative amount of each ingredient from a food image. We propose
PITA, the Picture-to-Amount deep learning architecture to solve the problem.
More specifically, we predict the ingredient amounts using a domain-driven
Wasserstein loss from image-to-recipe cross-modal embeddings learned to align
the two views of food data. Experiments on a dataset of recipes collected from
the Internet show the model generates promising results and improves the
baselines on this challenging task. A demo of our system and our data are
available at: foodai.cs.rutgers.edu.
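The abstract's core idea, mapping a learned image embedding to a normalized vector of relative ingredient amounts and scoring it against the ground truth with a Wasserstein-style loss, can be illustrated with a minimal sketch. Everything below (the linear predictor, the dimensions, and the use of a 1-D Wasserstein distance over an arbitrary ingredient ordering) is an illustrative assumption, not the paper's actual architecture or domain-driven loss.

```python
# Illustrative sketch only: a linear head on an image embedding predicts
# relative ingredient amounts, scored with a simple 1-D Wasserstein loss.
# All names, shapes, and the loss form are assumptions for exposition.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_amounts(image_embedding, W, b):
    """Map an image embedding to relative ingredient amounts (sum to 1)."""
    return softmax(W @ image_embedding + b)

def wasserstein_1d(p, q):
    """W1 distance between two discrete distributions on an ordered
    support: the L1 distance between their CDFs."""
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

rng = np.random.default_rng(0)
d_embed, n_ingredients = 8, 5
W = rng.normal(size=(n_ingredients, d_embed))  # hypothetical learned weights
b = np.zeros(n_ingredients)
emb = rng.normal(size=d_embed)                 # stand-in image embedding

pred = predict_amounts(emb, W, b)
target = np.array([0.4, 0.3, 0.2, 0.05, 0.05])  # ground-truth relative amounts
loss = wasserstein_1d(pred, target)
```

In practice the embedding would come from a cross-modal image-to-recipe model rather than random numbers, and the loss would be minimized over a training set.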
Related papers
- How Much You Ate? Food Portion Estimation on Spoons [63.611551981684244]
Current image-based food portion estimation algorithms assume that users take images of their meals one or two times.
We introduce an innovative solution that utilizes stationary user-facing cameras to track food items on utensils.
The system reliably estimates the nutritional content of liquid-solid heterogeneous mixtures such as soups and stews.
arXiv Detail & Related papers (2024-05-12T00:16:02Z)
- NutritionVerse-Real: An Open Access Manually Collected 2D Food Scene Dataset for Dietary Intake Estimation [68.49526750115429]
We introduce NutritionVerse-Real, an open access manually collected 2D food scene dataset for dietary intake estimation.
The NutritionVerse-Real dataset was created by manually collecting images of food scenes in real life, measuring the weight of every ingredient and computing the associated dietary content of each dish.
arXiv Detail & Related papers (2023-11-20T11:05:20Z)
- Personalized Food Image Classification: Benchmark Datasets and New Baseline [8.019925729254178]
We propose a new framework for personalized food image classification by leveraging self-supervised learning and temporal image feature information.
Our method is evaluated on both benchmark datasets and shows improved performance compared to existing works.
arXiv Detail & Related papers (2023-09-15T20:11:07Z)
- NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches [59.38343165508926]
Accurate dietary intake estimation is critical for informing policies and programs to support healthy eating.
Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images.
We introduce NutritionVerse-Synth, the first large-scale dataset of 84,984 synthetic 2D food images with associated dietary information.
We also collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism.
arXiv Detail & Related papers (2023-09-14T13:29:41Z)
- Towards the Creation of a Nutrition and Food Group Based Image Database [58.429385707376554]
We propose a framework to create a nutrition and food group based image database.
We design a protocol for linking food group based food codes in the U.S. Department of Agriculture's (USDA) Food and Nutrient Database for Dietary Studies (FNDDS).
Our proposed method is used to build a nutrition and food group based image database including 16,114 food images.
arXiv Detail & Related papers (2022-06-05T02:41:44Z)
- Towards Building a Food Knowledge Graph for Internet of Food [66.57235827087092]
We review the evolution of food knowledge organization, from food classification and food ontologies to food knowledge graphs.
Food knowledge graphs play an important role in food search and Question Answering (QA), personalized dietary recommendation, food analysis and visualization.
Future directions for food knowledge graphs cover several fields such as multimodal food knowledge graphs and food intelligence.
arXiv Detail & Related papers (2021-07-13T06:26:53Z)
- Saliency-Aware Class-Agnostic Food Image Segmentation [10.664526852464812]
We propose a class-agnostic food image segmentation method.
Using information from both the before and after eating images, we can segment food images by finding the salient missing objects.
Our method is validated on food images collected from a dietary study.
arXiv Detail & Related papers (2021-02-13T08:05:19Z)
- An End-to-End Food Image Analysis System [8.622335099019214]
We propose an image-based food analysis framework that integrates food localization, classification and portion size estimation.
Our proposed framework is end-to-end, i.e., the input can be an arbitrary food image containing multiple food items.
Our framework is evaluated on a real life food image dataset collected from a nutrition feeding study.
arXiv Detail & Related papers (2021-02-01T05:36:20Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
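The cross-modal retrieval idea in the last entry, a joint embedding where matched image-recipe pairs lie close together plus a semantic-consistency term aligning the class probabilities predicted from each modality, can be sketched as follows. This is a loose illustration in the spirit of the described approach; the margin loss, shapes, and L1 alignment term are assumptions, not SCAN's exact formulation.

```python
# Illustrative sketch: joint image-recipe embedding with a triplet-style
# retrieval loss and a semantic-consistency term. All details are
# assumptions for exposition, not the published model.
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def retrieval_loss(img_emb, rec_emb, margin=0.2):
    """Matched pairs (same row index) should score higher than mismatches."""
    sims = l2_normalize(img_emb) @ l2_normalize(rec_emb).T  # cosine similarities
    pos = np.diag(sims)                                     # matched-pair scores
    loss = np.maximum(0.0, margin + sims - pos[:, None])    # hinge on mismatches
    np.fill_diagonal(loss, 0.0)                             # ignore matched pairs
    return loss.mean()

def semantic_consistency(img_logits, rec_logits):
    """Align the class probabilities predicted from each modality (L1 here)."""
    return np.abs(softmax(img_logits) - softmax(rec_logits)).mean()

rng = np.random.default_rng(1)
img = rng.normal(size=(4, 16))          # stand-in image embeddings
rec = rng.normal(size=(4, 16))          # stand-in recipe embeddings
img_logits = rng.normal(size=(4, 10))   # per-modality class scores
rec_logits = rng.normal(size=(4, 10))

total = retrieval_loss(img, rec) + 0.1 * semantic_consistency(img_logits, rec_logits)
```

In a real system the embeddings would come from trained image and recipe encoders, and the consistency weight would be a tuned hyperparameter.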
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.