Dish detection in food platters: A framework for automated diet logging
and nutrition management
- URL: http://arxiv.org/abs/2305.07552v1
- Date: Fri, 12 May 2023 15:25:58 GMT
- Authors: Mansi Goel, Shashank Dargar, Shounak Ghatak, Nidhi Verma, Pratik
Chauhan, Anushka Gupta, Nikhila Vishnumolakala, Hareesh Amuru, Ekta Gambhir,
Ronak Chhajed, Meenal Jain, Astha Jain, Samiksha Garg, Nitesh Narwade,
Nikhilesh Verhwani, Abhuday Tiwari, Kirti Vashishtha and Ganesh Bagler
- Abstract summary: Dish detection from food platters is a challenging problem due to a visually complex food layout.
We present an end-to-end computational framework for diet management, spanning data compilation, annotation, state-of-the-art model identification, and mobile app implementation.
We implement the framework in the context of Indian food platters known for their complex presentation.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Diet is central to the epidemic of lifestyle disorders. Accurate and
effortless diet logging is one of the significant bottlenecks for effective
diet management and calorie restriction. Dish detection from food platters is a
challenging problem due to a visually complex food layout. We present an
end-to-end computational framework for diet management, from data compilation,
annotation, and state-of-the-art model identification to its mobile app
implementation. As a case study, we implement the framework in the context of
Indian food platters known for their complex presentation that poses a
challenge for the automated detection of dishes. Starting with the 61 most
popular Indian dishes, we identify the state-of-the-art model through a
comparative analysis of deep-learning-based object detection architectures.
Rooted in a meticulous compilation of 68,005 platter images with 134,814 manual
dish annotations, we first compare ten architectures for multi-label
classification to identify ResNet152 (mAP=84.51%) as the best model. After a
thorough performance evaluation of eight deep-learning models, YOLOv8x
(mAP=87.70%) emerged as the best architecture for dish detection. By comparing
with the state-of-the-art model for the IndianFood10 dataset, we demonstrate
the superior object detection performance of YOLOv8x on this subset and
establish ResNet152 as the best architecture for multi-label classification.
The models thus trained on richly annotated data
can be extended to include dishes from across global cuisines. The proposed
framework is demonstrated through a proof-of-concept mobile application with
diverse applications for diet logging, food recommendation systems, nutritional
interventions, and mitigation of lifestyle disorders.
Related papers
- NutritionVerse-Direct: Exploring Deep Neural Networks for Multitask Nutrition
  Prediction from Food Images (2024-05-13)
  Self-reporting methods are often inaccurate and suffer from substantial bias.
  Recent work has explored using computer vision prediction systems to predict
  nutritional information from food images. This paper aims to enhance the
  efficacy of dietary intake estimation by leveraging various neural network
  architectures.
- From Canteen Food to Daily Meals: Generalizing Food Recognition to More
  Practical Scenarios (2024-03-12)
  We present two new benchmarks, namely DailyFood-172 and DailyFood-16, designed
  to curate food images from everyday meals. These two datasets are used to
  evaluate the transferability of approaches from the well-curated food image
  domain to the everyday-life food image domain.
- Food Image Classification and Segmentation with Attention-based Multiple
  Instance Learning (2023-08-22)
  The paper presents a weakly supervised methodology for training food image
  classification and semantic segmentation models. The proposed methodology is
  based on a multiple instance learning approach in combination with an
  attention-based mechanism. Experiments on two meta-classes within the
  FoodSeg103 dataset verify the feasibility of the proposed approach.
- Transferring Knowledge for Food Image Segmentation using Transformers and
  Convolutions (2023-06-15)
  Food image segmentation is an important task with ubiquitous applications,
  such as estimating the nutritional value of a plate of food. One challenge is
  that food items can overlap and mix, making them difficult to distinguish.
  Two models are trained and compared, one based on convolutional neural
  networks and the other on Bidirectional representation for Image Transformers
  (BEiT). The BEiT model outperforms the previous state-of-the-art model by
  achieving a mean intersection over union of 49.4 on FoodSeg103.
- UMDFood: Vision-language models boost food composition compilation (2023-05-18)
  We propose a novel vision-language model, UMDFood-VL, using front-of-package
  labeling and product images to accurately estimate food composition profiles.
  For up to 82.2% of selected products, the error between chemical analysis
  results and model estimates is less than 10%. This performance sheds light on
  generalization towards other food and nutrition-related data compilation.
- Food Ingredients Recognition through Multi-label Learning (2022-10-24)
  The ability to recognize various food items in a generic food plate is a key
  determinant for an automated diet assessment system. We employ a deep
  multi-label learning approach and evaluate several state-of-the-art neural
  networks for their ability to detect an arbitrary number of ingredients in a
  dish image.
- A Mobile Food Recognition System for Dietary Assessment (2022-04-20)
  We focus on developing a mobile-friendly food recognition application for
  Middle Eastern cuisine, intended for assisted living purposes. Using the
  MobileNetV2 architecture for this task is beneficial in terms of both
  accuracy and memory usage. The developed mobile application has the potential
  to serve the visually impaired through automatic food recognition from
  images.
- Visual Aware Hierarchy Based Food Recognition (2020-12-06)
  We propose a new two-step food recognition system using Convolutional Neural
  Networks (CNNs) as the backbone architecture. The food localization step is
  based on an implementation of the Faster R-CNN method to identify food
  regions. In the food classification step, visually similar food categories
  can be clustered together automatically to generate a hierarchical structure.
- MyFood: A Food Segmentation and Classification System to Aid Nutritional
  Monitoring (2020-12-05)
  The absence of food monitoring has contributed significantly to the increase
  in the population's weight. Some solutions have been proposed in computer
  vision to recognize food images, but few are specialized in nutritional
  monitoring. This work presents an intelligent system that classifies and
  segments food in images to help automatically monitor the user's diet and
  nutritional intake.
- ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked
  Global-Local Attention Network (2020-08-13)
  We introduce the ISIA Food-500 dataset, with 500 categories drawn from
  Wikipedia and 399,726 images, surpassing existing popular benchmark datasets
  in category coverage and data volume. We propose a stacked global-local
  attention network consisting of two sub-networks for food recognition.
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and
  Recipes with Semantic Consistency and Attention Mechanism (2020-03-09)
  We learn an embedding of images and recipes in a common feature space, such
  that corresponding image-recipe embeddings lie close to one another. We
  propose Semantic-Consistent and Attention-based Networks (SCAN), which
  regularize the embeddings of the two modalities by aligning output semantic
  probabilities. We show that we can outperform several state-of-the-art
  cross-modal retrieval strategies for food images and cooking recipes by a
  significant margin.
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.