Digital Scale: Open-Source On-Device BMI Estimation from Smartphone Camera Images Trained on a Large-Scale Real-World Dataset
- URL: http://arxiv.org/abs/2508.20534v1
- Date: Thu, 28 Aug 2025 08:21:10 GMT
- Title: Digital Scale: Open-Source On-Device BMI Estimation from Smartphone Camera Images Trained on a Large-Scale Real-World Dataset
- Authors: Frederik Rajiv Manichand, Robin Deuber, Robert Jakob, Steve Swerling, Jamie Rosen, Elgar Fleisch, Patrick Langer,
- Abstract summary: Existing computer vision approaches have been limited to datasets of up to 14,500 images. We present a deep learning-based BMI estimation method trained on our WayBED dataset. We deploy the full pipeline, including image filtering and BMI estimation, on Android devices using the CLAID framework.
- Score: 3.9545263841567686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating Body Mass Index (BMI) from camera images with machine learning models enables rapid weight assessment when traditional methods are unavailable or impractical, such as in telehealth or emergency scenarios. Existing computer vision approaches have been limited to datasets of up to 14,500 images. In this study, we present a deep learning-based BMI estimation method trained on our WayBED dataset, a large proprietary collection of 84,963 smartphone images from 25,353 individuals. We introduce an automatic filtering method that uses posture clustering and person detection to curate the dataset by removing low-quality images, such as those with atypical postures or incomplete views. This process retained 71,322 high-quality images suitable for training. We achieve a Mean Absolute Percentage Error (MAPE) of 7.9% on our hold-out test set (WayBED data) using full-body images, the lowest value in the published literature to the best of our knowledge. Further, we achieve a MAPE of 13% on the VisualBodyToBMI dataset, which was completely unseen during training; this is comparable with state-of-the-art approaches trained on it, demonstrating robust generalization. Lastly, we fine-tune our model on VisualBodyToBMI and achieve a MAPE of 8.56%, the lowest reported value on this dataset so far. We deploy the full pipeline, including image filtering and BMI estimation, on Android devices using the CLAID framework. We release our complete code for model training, filtering, and the CLAID package for mobile deployment as open-source contributions.
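For reference, BMI is weight in kilograms divided by the square of height in metres, and MAPE averages the absolute relative error between true and predicted values, expressed as a percentage. The sketch below assumes the standard MAPE definition applied to per-image BMI predictions; the paper may aggregate differently. The function names `bmi` and `mape` and the example numbers are hypothetical and illustrative only, not data from WayBED or VisualBodyToBMI.

```python
from typing import Sequence


def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent (standard definition, assumed)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 0:
            raise ValueError("MAPE is undefined when a true value is zero")
        total += abs((t - p) / t)  # relative error of one prediction
    return 100.0 * total / len(y_true)


# Hypothetical example values, not results from the paper.
true_bmi = [bmi(70.0, 1.75), bmi(90.0, 1.80)]  # about 22.86 and 27.78
pred_bmi = [24.0, 26.0]
print(round(mape(true_bmi, pred_bmi), 2))  # → 5.7
```

Here the two relative errors are 5% and 6.4%, so their mean gives a MAPE of 5.7%. Under this reading, the reported 7.9% means predictions deviate from the true BMI by about 7.9% on average.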
Related papers
- Approximating Language Model Training Data from Weights [70.08614275061689]
We formalize the problem of data approximation from model weights and propose several baselines and metrics. We develop a gradient-based approach that selects the highest-matching data from a large public text corpus. Even when none of the true training data is known, our method is able to locate a small subset of public Web documents.
(arXiv, 2025-06-18T15:26:43Z)
- UGoDIT: Unsupervised Group Deep Image Prior Via Transferable Weights [10.447347462729462]
UGoDIT is designed for the low-data regime where only a very small number, M, of sub-sampled measurement vectors are available during training. Our method learns a set of transferable weights by optimizing a shared encoder and M disentangled decoders. We evaluate UGoDIT on both medical (multi-coil MRI) and natural (super-resolution and non-linear deblurring) image recovery tasks.
(arXiv, 2025-05-16T22:05:28Z)
- Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis [0.6524460254566905]
This study aims to assess the effectiveness of Contrastive Language-Image Pretraining (CLIP) in classifying human postures. Applying transfer learning on 15,301 images (real and synthetic) with 82 classes has shown promising results. The fine-tuned CLIP model, tested on 3,826 images, achieves an accuracy of over 85%.
(arXiv, 2025-01-13T11:20:44Z)
- Celeb-FBI: A Benchmark Dataset on Human Full Body Images and Age, Gender, Height and Weight Estimation using Deep Learning Approach [0.0]
The Celeb-FBI dataset contains 7,211 full-body images of individuals, accompanied by detailed information on their height, age, weight, and gender.
We employ three deep learning approaches (a CNN, a 50-layer ResNet, and a 16-layer VGG) to estimate height, weight, age, and gender from human full-body images.
ResNet-50 performed best, with accuracy rates of 79.18% for age, 95.43% for gender, 85.60% for height, and 81.91% for weight.
(arXiv, 2024-07-03T20:16:47Z)
- Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images.
We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
(arXiv, 2023-11-30T21:11:20Z)
- PatchBMI-Net: Lightweight Facial Patch-based Ensemble for BMI Prediction [3.9440964696313485]
Self-diagnostic BMI prediction methods based on facial images have been proposed for healthy weight monitoring.
These methods have mostly used convolutional neural network (CNN) regression baselines such as VGG19, ResNet50, and EfficientNet-B0.
This paper aims to develop a lightweight facial patch-based ensemble (PatchBMI-Net) for BMI prediction to facilitate the deployment and weight monitoring using smartphones.
(arXiv, 2023-11-29T21:39:24Z)
- Delving Deeper into Data Scaling in Masked Image Modeling [145.36501330782357]
We conduct an empirical study on the scaling capability of masked image modeling (MIM) methods for visual recognition.
Specifically, we utilize the web-collected Coyo-700M dataset.
Our goal is to investigate how the performance changes on downstream tasks when scaling with different sizes of data and models.
(arXiv, 2023-05-24T15:33:46Z)
- ALiSNet: Accurate and Lightweight Human Segmentation Network for Fashion E-Commerce [57.876602177247534]
Smartphones provide a convenient way for users to capture images of their body.
We create a new segmentation model by simplifying Semantic FPN with PointRend.
We finetune this model on a high-quality dataset of humans in a restricted set of poses relevant for our application.
(arXiv, 2023-04-15T11:06:32Z)
- The effectiveness of MAE pre-pretraining for billion-scale pretraining [65.98338857597935]
We introduce an additional pre-pretraining stage that is simple and uses the self-supervised MAE technique to initialize the model.
We measure the effectiveness of pre-pretraining on 10 different visual recognition tasks spanning image classification, video recognition, object detection, low-shot classification and zero-shot recognition.
(arXiv, 2023-03-23T17:56:12Z)
- Learning Customized Visual Models with Retrieval-Augmented Knowledge [104.05456849611895]
We propose REACT, a framework to acquire the relevant web knowledge to build customized visual models for target domains.
We retrieve the most relevant image-text pairs from a web-scale database as external knowledge, and propose to customize the model by training only new modularized blocks while freezing all the original weights.
The effectiveness of REACT is demonstrated via extensive experiments on classification, retrieval, detection and segmentation tasks, including zero, few, and full-shot settings.
(arXiv, 2023-01-17T18:59:06Z)