VolTex: Food Volume Estimation using Text-Guided Segmentation and Neural Surface Reconstruction
- URL: http://arxiv.org/abs/2506.02895v1
- Date: Tue, 03 Jun 2025 14:03:28 GMT
- Title: VolTex: Food Volume Estimation using Text-Guided Segmentation and Neural Surface Reconstruction
- Authors: Ahmad AlMughrabi, Umair Haroon, Ricardo Marques, Petia Radeva
- Abstract summary: Existing 3D food volume estimation methods compute food volume accurately but lack support for selecting food portions. We present VolTex, a framework that improves food object selection in food volume estimation.
- Score: 4.282795945742752
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurate food volume estimation is crucial for dietary monitoring, medical nutrition management, and food intake analysis. Existing 3D food volume estimation methods compute food volume accurately but lack support for selecting food portions. We present VolTex, a framework that improves food object selection in food volume estimation. By allowing users to specify, via text input, the target food item to be segmented, our method enables the precise selection of specific food objects in real-world scenes. The segmented object is then reconstructed using a Neural Surface Reconstruction method to generate high-fidelity 3D meshes for volume computation. Extensive evaluations on the MetaFood3D dataset demonstrate the effectiveness of our approach in isolating and reconstructing food items for accurate volume estimation. The source code is accessible at https://github.com/GCVCG/VolTex.
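The abstract's last stage, computing volume from a reconstructed 3D mesh, can be illustrated with a minimal sketch. It assumes a closed (watertight), consistently oriented triangle mesh and sums signed tetrahedron volumes, a standard technique. It is not taken from the VolTex code, and the name `mesh_volume` and the sample tetrahedron are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def mesh_volume(vertices: Sequence[Vec3], faces: Sequence[Face]) -> float:
    """Volume enclosed by a closed, consistently oriented triangle mesh.

    Each face (a, b, c) forms a tetrahedron with the origin whose signed
    volume is a . (b x c) / 6. For a watertight mesh the signed volumes
    sum to the enclosed volume (up to sign, hence the abs at the end).
    """
    total = 0.0
    for i, j, k in faces:
        ax, ay, az = vertices[i]
        bx, by, bz = vertices[j]
        cx, cy, cz = vertices[k]
        # Scalar triple product a . (b x c), expanded by components.
        triple = (
            ax * (by * cz - bz * cy)
            - ay * (bx * cz - bz * cx)
            + az * (bx * cy - by * cx)
        )
        total += triple / 6.0
    return abs(total)


# Hypothetical example: a right tetrahedron with unit legs (volume 1/6).
tetra_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tetra_faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
tetra_volume = mesh_volume(tetra_vertices, tetra_faces)
```

The result is in cubic mesh units, so a real pipeline would still need a metric scale (for example from a reference object or depth data) before reporting millilitres.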
Related papers
- VolE: A Point-cloud Framework for Food 3D Reconstruction and Volume Estimation [4.621139625109643]
We present VolE, a novel framework that leverages mobile device-driven 3D reconstruction to estimate food volume. VolE captures images and camera locations in free motion to generate precise 3D models, thanks to AR-capable mobile devices. Our experiments demonstrate that VolE outperforms the existing volume estimation techniques across multiple datasets by achieving 2.22% MAPE.
arXiv Detail & Related papers (2025-05-15T12:03:05Z) - Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion [69.84988999191343]
We introduce FastFood, a dataset with 84,446 images across 908 fast food categories, featuring ingredient and nutritional annotations. We propose a new model-agnostic Visual-Ingredient Feature Fusion (VIF$^2$) method to enhance nutrition estimation.
arXiv Detail & Related papers (2025-05-13T17:01:21Z) - MFP3D: Monocular Food Portion Estimation Leveraging 3D Point Clouds [7.357322789192671]
In this paper, we introduce a new framework for accurate food estimation using only a single monocular image.
The framework consists of three key modules: (1) a 3D Reconstruction Module that generates a 3D point cloud representation of the food from the 2D image, (2) a Feature Extraction Module that extracts and represents features from both the 3D point cloud and the 2D RGB image, and (3) a Portion Regression Module that employs a deep regression model to estimate the food's volume and energy content.
arXiv Detail & Related papers (2024-11-14T22:17:27Z) - MetaFood3D: 3D Food Dataset with Nutrition Values [52.16894900096017]
This dataset consists of 743 meticulously scanned and labeled 3D food objects across 131 categories. Our MetaFood3D dataset emphasizes intra-class diversity and includes rich modalities such as textured mesh files, RGB-D videos, and segmentation masks.
arXiv Detail & Related papers (2024-09-03T15:02:52Z) - MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results [52.07174491056479]
We host the MetaFood Workshop and its challenge for Physically Informed 3D Food Reconstruction.
This challenge focuses on reconstructing volume-accurate 3D models of food items from 2D images, using a visible checkerboard as a size reference.
The solutions developed in this challenge achieved promising results in 3D food reconstruction, with significant potential for improving portion estimation for dietary assessment and nutritional monitoring.
arXiv Detail & Related papers (2024-07-12T14:15:48Z) - VolETA: One- and Few-shot Food Volume Estimation [4.282795945742752]
We present VolETA, a sophisticated methodology for estimating food volume using 3D generative techniques.
Our approach creates a scaled 3D mesh of food objects using one- or few-RGBD images.
We achieve robust and accurate volume estimations with 10.97% MAPE using the MTF dataset.
arXiv Detail & Related papers (2024-07-01T18:47:15Z) - How Much You Ate? Food Portion Estimation on Spoons [63.611551981684244]
Current image-based food portion estimation algorithms assume that users take images of their meals one or two times.
We introduce an innovative solution that utilizes stationary user-facing cameras to track food items on utensils.
The system is reliable for estimation of nutritional content of liquid-solid heterogeneous mixtures such as soups and stews.
arXiv Detail & Related papers (2024-05-12T00:16:02Z) - Food Portion Estimation via 3D Object Scaling [8.164262056488447]
We propose a new framework to estimate both food volume and energy from 2D images.
Our method estimates the pose of the camera and the food object in the input image.
We also introduce a new dataset, SimpleFood45, which contains 2D images of 45 food items.
arXiv Detail & Related papers (2024-04-18T15:23:37Z) - NutritionVerse-Real: An Open Access Manually Collected 2D Food Scene
Dataset for Dietary Intake Estimation [68.49526750115429]
We introduce NutritionVerse-Real, an open access manually collected 2D food scene dataset for dietary intake estimation.
The NutritionVerse-Real dataset was created by manually collecting images of food scenes in real life, measuring the weight of every ingredient and computing the associated dietary content of each dish.
arXiv Detail & Related papers (2023-11-20T11:05:20Z) - A Large-Scale Benchmark for Food Image Segmentation [62.28029856051079]
We build a new food image dataset FoodSeg103 (and its extension FoodSeg154) containing 9,490 images.
We annotate these images with 154 ingredient classes and each image has an average of 6 ingredient labels and pixel-wise masks.
We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
arXiv Detail & Related papers (2021-05-12T03:00:07Z) - An End-to-End Food Image Analysis System [8.622335099019214]
We propose an image-based food analysis framework that integrates food localization, classification and portion size estimation.
Our proposed framework is end-to-end, i.e., the input can be an arbitrary food image containing multiple food items.
Our framework is evaluated on a real life food image dataset collected from a nutrition feeding study.
arXiv Detail & Related papers (2021-02-01T05:36:20Z)
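Several entries above report accuracy as MAPE (mean absolute percentage error). The sketch below shows the standard definition of that metric. The function name `mape` and the sample volumes are hypothetical and are not drawn from any of the listed papers.

```python
from typing import Sequence


def mape(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Averages |predicted - actual| / |actual| over all samples and scales
    by 100. Ground-truth values must be non-zero.
    """
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("ground-truth values must be non-zero")
    total = sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)


# Hypothetical volumes in millilitres: estimates vs. ground truth.
example_error = mape([110.0, 90.0], [100.0, 100.0])
```

Because each error is divided by the ground-truth value, small food items weigh as heavily as large ones in the average.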