MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results
- URL: http://arxiv.org/abs/2407.09285v1
- Date: Fri, 12 Jul 2024 14:15:48 GMT
- Title: MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results
- Authors: Jiangpeng He, Yuhao Chen, Gautham Vinod, Talha Ibn Mahmud, Fengqing Zhu, Edward Delp, Alexander Wong, Pengcheng Xi, Ahmad AlMughrabi, Umair Haroon, Ricardo Marques, Petia Radeva, Jiadong Tang, Dianyi Yang, Yu Gao, Zhaoxiang Liang, Yawei Jueluo, Chengyu Shi, Pengyu Wang,
- Abstract summary: We host the MetaFood Workshop and its challenge for Physically Informed 3D Food Reconstruction.
This challenge focuses on reconstructing volume-accurate 3D models of food items from 2D images, using a visible checkerboard as a size reference.
The solutions developed in this challenge achieved promising results in 3D food reconstruction, with significant potential for improving portion estimation for dietary assessment and nutritional monitoring.
- Score: 52.07174491056479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing interest in computer vision applications for nutrition and dietary monitoring has led to the development of advanced 3D reconstruction techniques for food items. However, the scarcity of high-quality data and limited collaboration between industry and academia have constrained progress in this field. Building on recent advancements in 3D reconstruction, we host the MetaFood Workshop and its challenge for Physically Informed 3D Food Reconstruction. This challenge focuses on reconstructing volume-accurate 3D models of food items from 2D images, using a visible checkerboard as a size reference. Participants were tasked with reconstructing 3D models for 20 selected food items of varying difficulty levels: easy, medium, and hard. The easy level provides 200 images, the medium level provides 30 images, and the hard level provides only 1 image for reconstruction. In total, 16 teams submitted results in the final testing phase. The solutions developed in this challenge achieved promising results in 3D food reconstruction, with significant potential for improving portion estimation for dietary assessment and nutritional monitoring. More details about this workshop challenge and access to the dataset can be found at https://sites.google.com/view/cvpr-metafood-2024.
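The abstract does not say how participants converted the checkerboard into a metric scale, so the following is only a minimal sketch of one plausible approach, not the method of any challenge entry. It assumes a closed triangle mesh reconstructed in arbitrary units, two adjacent checkerboard corners located in that same coordinate frame, and a known square size in centimetres. All function names (`mesh_volume`, `metric_scale`, `metric_volume_cm3`, `unit_cube`) are hypothetical and introduced for illustration.

```python
import numpy as np


def mesh_volume(vertices, faces):
    """Volume of a closed, consistently oriented triangle mesh.

    Sums the signed volumes of the tetrahedra formed by each triangle
    and the origin (an application of the divergence theorem).
    """
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    a, b, c = v[f[:, 0]], v[f[:, 1]], v[f[:, 2]]
    signed = np.einsum("ij,ij->i", a, np.cross(b, c)) / 6.0
    return abs(signed.sum())


def metric_scale(corner_a, corner_b, square_size_cm):
    """Scale factor (cm per reconstruction unit) from two adjacent corners.

    Assumption: corner_a and corner_b are neighbouring checkerboard corners
    expressed in the same arbitrary units as the reconstructed mesh.
    """
    diff = np.asarray(corner_a, dtype=float) - np.asarray(corner_b, dtype=float)
    dist = np.linalg.norm(diff)
    if dist == 0.0:
        raise ValueError("checkerboard corners must be distinct")
    return square_size_cm / dist


def metric_volume_cm3(vertices, faces, corner_a, corner_b, square_size_cm):
    """Mesh volume in cubic centimetres; volume scales with the cube of length."""
    k = metric_scale(corner_a, corner_b, square_size_cm)
    return mesh_volume(vertices, faces) * k ** 3


def unit_cube():
    """Toy closed mesh (a unit cube) with outward-facing triangles."""
    vertices = [
        (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
        (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1),
    ]
    faces = [
        (0, 2, 1), (0, 3, 2),  # bottom
        (4, 5, 6), (4, 6, 7),  # top
        (0, 1, 5), (0, 5, 4),  # front
        (3, 7, 6), (3, 6, 2),  # back
        (0, 4, 7), (0, 7, 3),  # left
        (1, 2, 6), (1, 6, 5),  # right
    ]
    return vertices, faces
```

Because volume grows with the cube of the length scale, a small error in the measured checkerboard square size is amplified roughly threefold in the estimated volume, which is one reason a reliable size reference matters for portion estimation.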
Related papers
- MFP3D: Monocular Food Portion Estimation Leveraging 3D Point Clouds [7.357322789192671]
In this paper, we introduce a new framework for accurate food portion estimation using only a single monocular image.
The framework consists of three key modules: (1) a 3D Reconstruction Module that generates a 3D point cloud representation of the food from the 2D image, (2) a Feature Extraction Module that extracts and represents features from both the 3D point cloud and the 2D RGB image, and (3) a Portion Regression Module that employs a deep regression model to estimate the food's volume and energy content.
arXiv Detail & Related papers (2024-11-14T22:17:27Z)
- MetaFood3D: Large 3D Food Object Dataset with Nutrition Values [53.24500333363066]
This dataset consists of 637 meticulously labeled 3D food objects across 108 categories, featuring detailed nutrition information, weight, and food codes linked to a comprehensive nutrition database.
Experimental results demonstrate our dataset's significant potential for improving algorithm performance, highlight the challenging gap between video captures and 3D scanned data, and show the strength of the MetaFood3D dataset in high-quality data generation, simulation, and augmentation.
arXiv Detail & Related papers (2024-09-03T15:02:52Z)
- Food Portion Estimation via 3D Object Scaling [8.164262056488447]
We propose a new framework to estimate both food volume and energy from 2D images.
Our method estimates the pose of the camera and the food object in the input image.
We also introduce a new dataset, SimpleFood45, which contains 2D images of 45 food items.
arXiv Detail & Related papers (2024-04-18T15:23:37Z)
- An End-to-end Food Portion Estimation Framework Based on Shape Reconstruction from Monocular Image [7.380382380564532]
We propose an end-to-end deep learning framework for food energy estimation from a monocular image through 3D shape reconstruction.
Our method is evaluated on the publicly available food image dataset Nutrition5k, resulting in a Mean Absolute Error (MAE) of 40.05 kCal and a Mean Absolute Percentage Error (MAPE) of 11.47% for food energy estimation.
arXiv Detail & Related papers (2023-08-03T15:17:24Z)
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- NutritionVerse-Thin: An Optimized Strategy for Enabling Improved Rendering of 3D Thin Food Models [66.77685168785152]
We present an optimized strategy for enabling improved rendering of thin 3D food models.
Our method generates the 3D model mesh via a proposed thin-object-optimized differentiable reconstruction method.
While simple, we find that this technique can be employed for quick and highly consistent capturing of thin 3D objects.
arXiv Detail & Related papers (2023-04-12T05:34:32Z)
- NutritionVerse-3D: A 3D Food Model Dataset for Nutritional Intake Estimation [65.47310907481042]
One in four older adults are malnourished.
Machine learning and computer vision show promise for automated methods of tracking nutritional intake from food.
NutritionVerse-3D is a large-scale high-resolution dataset of 105 3D food models.
arXiv Detail & Related papers (2023-04-12T05:27:30Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.