SFOOD: A Multimodal Benchmark for Comprehensive Food Attribute Analysis Beyond RGB with Spectral Insights
- URL: http://arxiv.org/abs/2507.04412v1
- Date: Sun, 06 Jul 2025 15:00:21 GMT
- Title: SFOOD: A Multimodal Benchmark for Comprehensive Food Attribute Analysis Beyond RGB with Spectral Insights
- Authors: Zhenbo Xu, Jinghan Yang, Gong Huang, Jiqing Feng, Liu Liu, Ruihan Sun, Ajin Meng, Zhuo Zhang, Zhaofeng He
- Abstract summary: We build the first large-scale spectral food (SFOOD) benchmark suite. The benchmark consists of 3,266 food categories and 2,351 k data points for 17 main food categories. Our benchmark will be open source and continuously iterated for different food analysis tasks.
- Score: 12.320129303732822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rise and development of computer vision and LLMs, intelligence is everywhere, especially for people and cars. However, for the many attributes of food (such as origin, quantity, weight, quality, sweetness, etc.), existing research still mainly focuses on the study of categories. The reason is the lack of a large and comprehensive benchmark for food. Besides, many food attributes (such as sweetness, weight, and fine-grained categories) are challenging to perceive accurately through RGB cameras alone. To fill this gap and promote the development of intelligent food analysis, in this paper, we built the first large-scale spectral food (SFOOD) benchmark suite. We invested substantial manpower and equipment costs to organize existing food datasets and collect hyperspectral images of hundreds of foods, and we used instruments to experimentally determine food attributes such as sweetness and weight. The resulting benchmark consists of 3,266 food categories and 2,351 k data points for 17 main food categories. Extensive evaluations find that: (i) Large-scale models are still poor at digitizing food. Compared to people and cars, food has gradually become one of the most difficult objects to study; (ii) Spectrum data are crucial for analyzing food properties (such as sweetness). Our benchmark will be open source and continuously iterated for different food analysis tasks.
Related papers
- MetaFood3D: 3D Food Dataset with Nutrition Values [52.16894900096017]
This dataset consists of 743 meticulously scanned and labeled 3D food objects across 131 categories. Our MetaFood3D dataset emphasizes intra-class diversity and includes rich modalities such as textured mesh files, RGB-D videos, and segmentation masks.
arXiv Detail & Related papers (2024-09-03T15:02:52Z)
- FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture [60.51749998013166]
We introduce FoodieQA, a manually curated, fine-grained image-text dataset capturing the intricate features of food cultures across various regions in China.
We evaluate vision-language models (VLMs) and large language models (LLMs) on newly collected, unseen food images and corresponding questions.
Our findings highlight that understanding food and its cultural implications remains a challenging and under-explored direction.
arXiv Detail & Related papers (2024-06-16T17:59:32Z)
- FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination [37.11551779015218]
We introduce Food-oriented Large Language Models (LLMs) to comprehend food data through perception and reasoning.
Considering the complexity and typicality of Chinese cuisine, we first construct a comprehensive Chinese food corpus, FoodEarth.
We then propose the Topic-based Selective State Space Model (TS3M) and the Hierarchical Topic Retrieval Augmented Generation (HTRAG) mechanism to enhance FoodSky.
arXiv Detail & Related papers (2024-06-11T01:27:00Z)
- From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios [92.58097090916166]
We present two new benchmarks, namely DailyFood-172 and DailyFood-16, designed to curate food images from everyday meals.
These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain.
arXiv Detail & Related papers (2024-03-12T08:32:23Z)
- NutritionVerse-3D: A 3D Food Model Dataset for Nutritional Intake Estimation [65.47310907481042]
One in four older adults are malnourished.
Machine learning and computer vision show promise for automated tracking of nutritional intake from food.
NutritionVerse-3D is a large-scale high-resolution dataset of 105 3D food models.
arXiv Detail & Related papers (2023-04-12T05:27:30Z)
- Towards Building a Food Knowledge Graph for Internet of Food [66.57235827087092]
We review the evolution of food knowledge organization, from food classification to food ontology to food knowledge graphs.
Food knowledge graphs play an important role in food search and Question Answering (QA), personalized dietary recommendation, food analysis and visualization.
Future directions for food knowledge graphs cover several fields such as multimodal food knowledge graphs and food intelligence.
arXiv Detail & Related papers (2021-07-13T06:26:53Z)
- Large Scale Visual Food Recognition [43.43598316339732]
We introduce Food2K, which is the largest food recognition dataset with 2,000 categories and over 1 million images.
Food2K surpasses existing food recognition datasets in both categories and images by one order of magnitude.
We propose a deep progressive region enhancement network for food recognition.
arXiv Detail & Related papers (2021-03-30T06:41:42Z)
- MyFood: A Food Segmentation and Classification System to Aid Nutritional Monitoring [1.5469452301122173]
The absence of food monitoring has contributed significantly to the increase in the population's weight.
Some solutions have been proposed in computer vision to recognize food images, but few are specialized in nutritional monitoring.
This work presents the development of an intelligent system that classifies and segments food presented in images to help the automatic monitoring of user diet and nutritional intake.
arXiv Detail & Related papers (2020-12-05T17:40:05Z)
- ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network [50.7720194859196]
We introduce the dataset ISIA Food-500, with 500 categories drawn from the list in Wikipedia and 399,726 images.
This dataset surpasses existing popular benchmark datasets in category coverage and data volume.
We propose a stacked global-local attention network, which consists of two sub-networks for food recognition.
arXiv Detail & Related papers (2020-08-13T02:48:27Z)