Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation
- URL: http://arxiv.org/abs/2601.20051v1
- Date: Tue, 27 Jan 2026 20:53:45 GMT
- Title: Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation
- Authors: Gautham Vinod, Bruce Coburn, Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu
- Abstract summary: We bridge the gap between 3D computer vision and digital health by proposing a method that recovers a true-to-scale 3D reconstructed object from a monocular image. Our approach leverages rich visual features extracted from models trained on large-scale datasets to estimate the scale of the reconstructed object.
- Score: 19.138014263791803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of chronic diseases related to diet, such as obesity and diabetes, emphasizes the need for accurate monitoring of food intake. While AI-driven dietary assessment has made strides in recent years, the ill-posed nature of recovering size (portion) information from monocular images for an accurate answer to "how much did you eat?" remains a pressing challenge. Some 3D reconstruction methods achieve impressive geometric reconstruction but fail to recover the crucial real-world scale of the reconstructed object, limiting their use in precision nutrition. In this paper, we bridge the gap between 3D computer vision and digital health by proposing a method that recovers a true-to-scale 3D reconstructed object from a monocular image. Our approach leverages rich visual features extracted from models trained on large-scale datasets to estimate the scale of the reconstructed object. This learned scale enables us to convert single-view 3D reconstructions into true-to-life, physically meaningful models. Extensive experiments and ablation studies on two publicly available datasets show that our method consistently outperforms existing techniques, achieving nearly a 30% reduction in mean absolute volume-estimation error, showcasing its potential to advance precision nutrition. Code: https://gitlab.com/viper-purdue/size-matters
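The core idea in the abstract — a reconstruction in arbitrary model units becomes physically meaningful once multiplied by a learned scale factor — can be illustrated with a minimal sketch. This is not the authors' code; the mesh, the scale value, and the function names are hypothetical, and the "learned" scale is simply hard-coded where a network prediction would go. The key consequence it demonstrates is that volume grows with the cube of the scale factor, so small scale errors compound sharply in portion estimates.

```python
def mesh_volume(vertices, faces):
    """Volume of a closed triangle mesh via signed tetrahedra (divergence theorem)."""
    vol = 0.0
    for i, j, k in faces:
        (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = vertices[i], vertices[j], vertices[k]
        # Scalar triple product v0 . (v1 x v2) / 6; winding must be consistent.
        vol += (x0 * (y1 * z2 - z1 * y2)
                - y0 * (x1 * z2 - z1 * x2)
                + z0 * (x1 * y2 - y1 * x2)) / 6.0
    return abs(vol)

def to_metric(vertices, scale):
    """Apply a predicted scale factor (metres per model unit) to every vertex."""
    return [(x * scale, y * scale, z * scale) for x, y, z in vertices]

# Unit cube as a closed triangle mesh (12 triangles, consistent outward winding),
# standing in for a single-view reconstruction in arbitrary model units.
verts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
         (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
faces = [(0, 2, 1), (0, 3, 2), (4, 5, 6), (4, 6, 7), (0, 1, 5), (0, 5, 4),
         (1, 2, 6), (1, 6, 5), (2, 3, 7), (2, 7, 6), (3, 0, 4), (3, 4, 7)]

scale = 0.08  # hypothetical network output: 8 cm per model unit
metric = to_metric(verts, scale)
print(mesh_volume(verts, faces))   # 1.0 (model units cubed)
print(mesh_volume(metric, faces))  # scale**3 m^3, i.e. about 512 mL
```

The cubic relationship is why scale recovery dominates the error budget here: a 10% scale error alone produces roughly a 33% volume error, which is comparable to the improvement the paper reports.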
Related papers
- Implicit-Scale 3D Reconstruction for Multi-Food Volume Estimation from Monocular Images [21.112563168240737]
Implicit-Scale 3D Reconstruction from Monocular Multi-Food Images is a benchmark dataset designed to advance geometry-based food portion estimation. This benchmark reframes food portion estimation as an implicit-scale 3D reconstruction problem under monocular observations.
arXiv Detail & Related papers (2026-02-13T15:52:39Z)
- Canonical Pose Reconstruction from Single Depth Image for 3D Non-rigid Pose Recovery on Limited Datasets [55.84702107871358]
3D reconstruction from 2D inputs, especially for non-rigid objects like humans, presents unique challenges. Traditional methods often struggle with non-rigid shapes, which require extensive training data to cover the entire deformation space. This study proposes a canonical pose reconstruction model that transforms single-view depth images of deformable shapes into a canonical form.
arXiv Detail & Related papers (2025-05-23T14:58:34Z)
- Dietary Intake Estimation via Continuous 3D Reconstruction of Food [5.010690651107531]
This study proposes an approach to accurately monitor ingestion behaviours by leveraging 3D food models constructed from monocular 2D video. Experiments with toy models and real food items demonstrate the approach's potential.
arXiv Detail & Related papers (2025-05-01T15:35:42Z)
- MFP3D: Monocular Food Portion Estimation Leveraging 3D Point Clouds [7.357322789192671]
In this paper, we introduce a new framework for accurate food portion estimation using only a single monocular image.
The framework consists of three key modules: (1) a 3D Reconstruction Module that generates a 3D point cloud representation of the food from the 2D image, (2) a Feature Extraction Module that extracts and represents features from both the 3D point cloud and the 2D RGB image, and (3) a Portion Regression Module that employs a deep regression model to estimate the food's volume and energy content.
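The three-module decomposition described above can be sketched as a simple pipeline skeleton. All class and function names below are hypothetical placeholders, not the MFP3D code; the toy stand-ins exist only so the composition runs end to end, with the reconstruction, feature-extraction, and regression stages wired in the order the summary lists them.

```python
class PortionPipeline:
    """Skeleton of a three-stage monocular portion estimator (hypothetical names)."""

    def __init__(self, reconstructor, feature_extractor, regressor):
        self.reconstructor = reconstructor          # (1) 2D image -> 3D point cloud
        self.feature_extractor = feature_extractor  # (2) (cloud, image) -> feature vector
        self.regressor = regressor                  # (3) features -> volume and energy

    def estimate(self, rgb_image):
        cloud = self.reconstructor(rgb_image)
        features = self.feature_extractor(cloud, rgb_image)
        return self.regressor(features)

# Toy stand-ins: a dummy point cloud, trivial features, and a constant regressor.
pipeline = PortionPipeline(
    reconstructor=lambda img: [(0.0, 0.0, 0.0)] * 1024,
    feature_extractor=lambda cloud, img: [len(cloud), len(img)],
    regressor=lambda feats: {"volume_ml": 250.0, "energy_kcal": 180.0},
)
print(pipeline.estimate([0] * 224))  # dummy stand-in for an RGB image
```

The design point the summary makes is the middle stage: features come from both the 3D point cloud and the 2D RGB image, so the regressor sees geometry and appearance jointly rather than either alone.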
arXiv Detail & Related papers (2024-11-14T22:17:27Z)
- 3D Reconstruction of the Human Colon from Capsule Endoscope Video [2.3513645401551337]
We investigate the possibility of constructing 3D models of whole sections of the human colon using image sequences from wireless capsule endoscope video. Recent developments of virtual graphics-based models of the human gastrointestinal system, where distortion and artifacts can be enabled or disabled, make it possible to "dissect" the problem.
arXiv Detail & Related papers (2024-07-21T17:31:38Z)
- MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results [52.07174491056479]
We host the MetaFood Workshop and its challenge for Physically Informed 3D Food Reconstruction.
This challenge focuses on reconstructing volume-accurate 3D models of food items from 2D images, using a visible checkerboard as a size reference.
The solutions developed in this challenge achieved promising results in 3D food reconstruction, with significant potential for improving portion estimation for dietary assessment and nutritional monitoring.
arXiv Detail & Related papers (2024-07-12T14:15:48Z)
- NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches [59.38343165508926]
Accurate dietary intake estimation is critical for informing policies and programs to support healthy eating.
Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images.
We introduce NutritionVerse-Synth, the first large-scale dataset of 84,984 synthetic 2D food images with associated dietary information.
We also collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism.
arXiv Detail & Related papers (2023-09-14T13:29:41Z)
- NutritionVerse-Thin: An Optimized Strategy for Enabling Improved Rendering of 3D Thin Food Models [66.77685168785152]
We present an optimized strategy for enabling improved rendering of thin 3D food models.
Our method generates the 3D model mesh via a proposed thin-object-optimized differentiable reconstruction method.
While simple, we find that this technique can be employed for quick and highly consistent capturing of thin 3D objects.
arXiv Detail & Related papers (2023-04-12T05:34:32Z)
- NutritionVerse-3D: A 3D Food Model Dataset for Nutritional Intake Estimation [65.47310907481042]
One in four older adults is malnourished.
Machine learning and computer vision show promise for automated food nutrition tracking.
NutritionVerse-3D is a large-scale high-resolution dataset of 105 3D food models.
arXiv Detail & Related papers (2023-04-12T05:27:30Z)
- Recovering 3D Human Mesh from Monocular Images: A Survey [49.00136388529404]
Estimating human pose and shape from monocular images is a long-standing problem in computer vision.
This survey focuses on the task of monocular 3D human mesh recovery.
arXiv Detail & Related papers (2022-03-03T18:56:08Z)
- Multi-View Consistency Loss for Improved Single-Image 3D Reconstruction of Clothed People [36.30755368202957]
We present a novel method to improve the accuracy of the 3D reconstruction of clothed human shape from a single image.
The accuracy and completeness of reconstructions of clothed people are limited due to the large variation in shape resulting from clothing, hair, body size, pose and camera viewpoint.
arXiv Detail & Related papers (2020-09-29T17:18:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.