NutritionVerse-Thin: An Optimized Strategy for Enabling Improved
Rendering of 3D Thin Food Models
- URL: http://arxiv.org/abs/2304.05620v1
- Date: Wed, 12 Apr 2023 05:34:32 GMT
- Title: NutritionVerse-Thin: An Optimized Strategy for Enabling Improved
Rendering of 3D Thin Food Models
- Authors: Chi-en Amy Tai, Jason Li, Sriram Kumar, Saeejith Nair, Yuhao Chen,
Pengcheng Xi, Alexander Wong
- Abstract summary: We present an optimized strategy for enabling improved rendering of thin 3D food models.
Our method generates the 3D model mesh via a proposed thin-object-optimized differentiable reconstruction method.
While simple, we find that this technique can be employed for quick and highly consistent capturing of thin 3D objects.
- Score: 66.77685168785152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growth in capabilities of generative models, there has been growing
interest in using photo-realistic renders of common 3D food items to improve
downstream tasks such as food printing, nutrition prediction, or management of
food wastage. Despite 3D modelling capabilities being more accessible than ever
due to the success of NeRF based view-synthesis, such rendering methods still
struggle to correctly capture thin food objects, often generating meshes with
significant holes. In this study, we present an optimized strategy for enabling
improved rendering of thin 3D food models, and demonstrate qualitative
improvements in rendering quality. Our method generates the 3D model mesh via a
proposed thin-object-optimized differentiable reconstruction method and tailors
the strategy at both the data collection and training stages to better handle
thin objects. While simple, we find that this technique can be employed for
quick and highly consistent capturing of thin 3D objects.
Related papers
- MFP3D: Monocular Food Portion Estimation Leveraging 3D Point Clouds [7.357322789192671]
In this paper, we introduce a new framework for accurate food estimation using only a single monocular image.
The framework consists of three key modules: (1) a 3D Reconstruction Module that generates a 3D point cloud representation of the food from the 2D image, (2) a Feature Extraction Module that extracts and represents features from both the 3D point cloud and the 2D RGB image, and (3) a Portion Regression Module that employs a deep regression model to estimate the food's volume and energy content.
arXiv Detail & Related papers (2024-11-14T22:17:27Z) - Consistency^2: Consistent and Fast 3D Painting with Latent Consistency Models [29.818123424954294]
Generative 3D Painting is among the top productivity boosters in high-resolution 3D asset management and recycling.
We propose a Latent Consistency Model (LCM) adaptation for the task at hand.
We analyze the strengths and weaknesses of the proposed model and evaluate it quantitatively and qualitatively.
arXiv Detail & Related papers (2024-06-17T04:40:07Z) - Freeplane: Unlocking Free Lunch in Triplane-Based Sparse-View Reconstruction Models [25.482316017879327]
We present textbfFrequency modulattextbfed tritextbfplane (textbfFreeplane), a simple yet effective method to improve the generation quality of feed-forward models without additional training.
We first analyze the role of triplanes in feed-forward methods and find that the inconsistent multi-view images introduce high-frequency artifacts on triplanes, leading to low-quality 3D meshes.
arXiv Detail & Related papers (2024-06-02T14:07:50Z) - LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation [73.36690511083894]
This paper introduces a novel framework called LN3Diff to address a unified 3D diffusion pipeline.
Our approach harnesses a 3D-aware architecture and variational autoencoder to encode the input image into a structured, compact, and 3D latent space.
It achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation.
arXiv Detail & Related papers (2024-03-18T17:54:34Z) - ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z) - FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation [69.91401809979709]
Current state-of-the-art image generation models such as Latent Diffusion Models (LDMs) have demonstrated the capacity to produce visually striking food-related images.
We introduce FoodFusion, a Latent Diffusion model engineered specifically for the faithful synthesis of realistic food images from textual descriptions.
The development of the FoodFusion model involves harnessing an extensive array of open-source food datasets, resulting in over 300,000 curated image-caption pairs.
arXiv Detail & Related papers (2023-12-06T15:07:12Z) - IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
arXiv Detail & Related papers (2023-08-22T14:39:17Z) - Pushing the Limits of 3D Shape Generation at Scale [65.24420181727615]
We present a significant breakthrough in 3D shape generation by scaling it to unprecedented dimensions.
We have developed a model with an astounding 3.6 billion trainable parameters, establishing it as the largest 3D shape generation model to date, named Argus-3D.
arXiv Detail & Related papers (2023-06-20T13:01:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.