From Canteen Food to Daily Meals: Generalizing Food Recognition to More
Practical Scenarios
- URL: http://arxiv.org/abs/2403.07403v1
- Date: Tue, 12 Mar 2024 08:32:23 GMT
- Authors: Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, Yu-Gang Jiang
- Abstract summary: We present two new benchmarks, namely DailyFood-172 and DailyFood-16, designed to curate food images from everyday meals.
These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The precise recognition of food categories plays a pivotal role in
intelligent health management, attracting significant research attention in
recent years. Prominent benchmarks, such as Food-101 and VIREO Food-172,
provide abundant food image resources that catalyze the prosperity of research
in this field. Nevertheless, these datasets are well-curated from canteen
scenarios and thus deviate from food appearances in daily life. This
discrepancy poses great challenges in effectively transferring classifiers
trained on these canteen datasets to broader daily-life scenarios encountered
by humans. Toward this end, we present two new benchmarks, namely DailyFood-172
and DailyFood-16, specifically designed to curate food images from everyday
meals. These two datasets are used to evaluate the transferability of
approaches from the well-curated food image domain to the everyday-life food
image domain. In addition, we also propose a simple yet effective baseline
method named Multi-Cluster Reference Learning (MCRL) to tackle the
aforementioned domain gap. MCRL is motivated by the observation that food
images in daily-life scenarios exhibit greater intra-class appearance variance
compared with those in well-curated benchmarks. Notably, MCRL can be seamlessly
coupled with existing approaches, yielding non-trivial performance
enhancements. We hope our new benchmarks can inspire the community to explore
the transferability of food recognition models trained on well-curated datasets
toward practical real-life applications.
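The abstract gives no implementation details for MCRL, but the multi-cluster idea it describes can be sketched as follows: rather than a single prototype per class, each food class keeps several reference centroids (here obtained with a toy k-means over feature vectors), and a query image is scored against each class by its nearest centroid. All function names and numbers below are illustrative assumptions, not the authors' code.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a list of vectors."""
    return [sum(xs) / len(points) for xs in zip(*points)]

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means: returns k reference centroids for one class."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        # Re-estimate each centroid; keep the old one if its cluster emptied.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def classify(x, refs):
    """Assign x to the class whose nearest reference centroid is closest,
    so high intra-class appearance variance is absorbed by multiple centroids."""
    return min(refs, key=lambda label: min(dist2(x, m) for m in refs[label]))
```

With two well-separated appearance modes in a class, a single class mean would sit between them; keeping one centroid per mode lets both appearances stay recognizable, which is the intuition the abstract attributes to daily-life food images.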
Related papers
- Personalized Food Image Classification: Benchmark Datasets and New
Baseline [8.019925729254178]
We propose a new framework for personalized food image classification by leveraging self-supervised learning and temporal image feature information.
Our method is evaluated on both benchmark datasets and shows improved performance compared to existing works.
arXiv Detail & Related papers (2023-09-15T20:11:07Z) - NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches [59.38343165508926]
Accurate dietary intake estimation is critical for informing policies and programs to support healthy eating.
Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images.
We introduce NutritionVerse-Synth, the first large-scale dataset of 84,984 synthetic 2D food images with associated dietary information.
We also collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism.
arXiv Detail & Related papers (2023-09-14T13:29:41Z) - Food Image Classification and Segmentation with Attention-based Multiple
Instance Learning [51.279800092581844]
The paper presents a weakly supervised methodology for training food image classification and semantic segmentation models.
The proposed methodology is based on a multiple instance learning approach in combination with an attention-based mechanism.
We conduct experiments on two meta-classes within the FoodSeg103 data set to verify the feasibility of the proposed approach.
arXiv Detail & Related papers (2023-08-22T13:59:47Z) - Single-Stage Heavy-Tailed Food Classification [7.800379384628357]
We introduce a novel single-stage heavy-tailed food classification framework.
Our method is evaluated on two heavy-tailed food benchmark datasets, Food101-LT and VFN-LT.
arXiv Detail & Related papers (2023-07-01T00:45:35Z) - Transferring Knowledge for Food Image Segmentation using Transformers
and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
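As a reminder of what the reported number measures, mean intersection over union averages per-class IoU over the classes present in either prediction or ground truth. A minimal reference computation over flat label maps (not the paper's code) is:

```python
def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes, for flat label lists."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # skip classes absent from both pred and gt
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```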
arXiv Detail & Related papers (2023-06-15T15:38:10Z) - Long-tailed Food Classification [5.874935571318868]
We introduce two new benchmark datasets for long-tailed food classification including Food101-LT and VFN-LT.
We propose a novel 2-Phase framework to address class imbalance, which undersamples the head classes to remove redundant samples while maintaining the learned information through knowledge distillation.
We show the effectiveness of our method by comparing with existing state-of-the-art long-tailed classification methods and show improved performance on both Food101-LT and VFN-LT benchmarks.
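The undersampling phase mentioned above can be pictured as a class-balanced cap on the head classes (the knowledge-distillation step that preserves the removed information is omitted); the function name and cap value here are illustrative, not the paper's implementation:

```python
import random
from collections import defaultdict

def undersample_heads(samples, cap, seed=0):
    """Cap each class at `cap` samples: head classes shrink,
    tail classes pass through unchanged (illustrative sketch)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append((x, y))
    balanced = []
    for items in by_class.values():
        if len(items) > cap:
            items = rng.sample(items, cap)  # random subset of a head class
        balanced.extend(items)
    return balanced
```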
arXiv Detail & Related papers (2022-10-26T14:29:30Z) - Towards the Creation of a Nutrition and Food Group Based Image Database [58.429385707376554]
We propose a framework to create a nutrition and food group based image database.
We design a protocol for linking food group based food codes in the U.S. Department of Agriculture's (USDA) Food and Nutrient Database for Dietary Studies (FNDDS) to food images.
Our proposed method is used to build a nutrition and food group based image database including 16,114 food images.
arXiv Detail & Related papers (2022-06-05T02:41:44Z) - Large Scale Visual Food Recognition [43.43598316339732]
We introduce Food2K, the largest food recognition dataset, with 2,000 categories and over 1 million images.
Food2K surpasses existing datasets in both categories and images by one order of magnitude.
We propose a deep progressive region enhancement network for food recognition.
arXiv Detail & Related papers (2021-03-30T06:41:42Z) - MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images
with Latent Variable Model [28.649961369386148]
We present Modality-Consistent Embedding Network (MCEN) that learns modality-invariant representations by projecting images and texts to the same embedding space.
Our method learns the cross-modal alignments during training but computes embeddings of different modalities independently at inference time for the sake of efficiency.
arXiv Detail & Related papers (2020-04-02T16:00:10Z) - Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images
and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
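The alignment of output semantic probabilities can be pictured as a symmetric divergence penalty between the class distributions predicted from an image and from its recipe; the function below is an illustrative stand-in, not SCAN's exact loss:

```python
import math

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions, smoothed by eps."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def semantic_consistency_loss(img_probs, txt_probs):
    """Symmetric KL between the two modalities' class distributions:
    zero when image and recipe predict the same semantics."""
    return 0.5 * (kl(img_probs, txt_probs) + kl(txt_probs, img_probs))
```

Minimizing such a term pushes the image branch and the recipe branch toward agreeing on the food category, which regularizes the shared embedding space.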
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.