Large Scale Visual Food Recognition
- URL: http://arxiv.org/abs/2103.16107v2
- Date: Wed, 31 Mar 2021 05:01:34 GMT
- Title: Large Scale Visual Food Recognition
- Authors: Weiqing Min and Zhiling Wang and Yuxin Liu and Mengjiang Luo and
Liping Kang and Xiaoming Wei and Xiaolin Wei and Shuqiang Jiang
- Abstract summary: We introduce Food2K, which is the largest food recognition dataset with 2,000 categories and over 1 million images.
Food2K surpasses existing datasets in both categories and images by one order of magnitude.
We propose a deep progressive region enhancement network for food recognition.
- Score: 43.43598316339732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Food recognition plays an important role in food choice and intake, which is
essential to the health and well-being of humans. It is thus of importance to
the computer vision community, and can further support many food-oriented
vision and multimodal tasks. Unfortunately, while we have witnessed remarkable
advancements in generic visual recognition driven by released large-scale
datasets, progress largely lags in the food domain. In this paper, we introduce
Food2K, the largest food recognition dataset, with 2,000 categories and over 1
million images. Compared with existing food recognition datasets, Food2K
surpasses them in both categories and images by one order of magnitude, and thus
establishes a new challenging benchmark to develop advanced models for food
visual representation learning. Furthermore, we propose a deep progressive
region enhancement network for food recognition, which mainly consists of two
components, namely progressive local feature learning and region feature
enhancement. The former adopts improved progressive training to learn diverse
and complementary local features, while the latter utilizes self-attention to
incorporate richer multi-scale context into local features, further enhancing
them. Extensive experiments on Food2K demonstrate the effectiveness of our
proposed method. More importantly, we have verified the better generalization
ability of Food2K across various tasks, including food recognition,
food image retrieval, cross-modal recipe retrieval, food detection and
segmentation. Food2K can be further explored to benefit more food-relevant
tasks, including emerging and more complex ones (e.g., nutritional
understanding of food), and models trained on Food2K can be expected to serve
as backbones that improve the performance of other food-relevant tasks. We also
hope Food2K can serve as a large-scale fine-grained visual recognition
benchmark.
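The abstract describes the two components only at a high level. As a rough illustration, the PyTorch-style sketch below pairs per-stage classifiers (standing in for progressive local feature learning) with self-attention over pooled multi-scale features (standing in for region feature enhancement); the toy backbone, dimensions, and all module names are assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class RegionEnhancedFoodNet(nn.Module):
    def __init__(self, num_classes=2000, dim=256):
        super().__init__()
        # Three toy stages standing in for a real multi-stage backbone.
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                          nn.BatchNorm2d(c_out), nn.ReLU())
            for c_in, c_out in [(3, 64), (64, 128), (128, 256)]
        ])
        # A projection and classifier per stage, so each stage can be
        # optimized on its own local features during progressive training.
        self.proj = nn.ModuleList([nn.Conv2d(c, dim, 1) for c in (64, 128, 256)])
        self.heads = nn.ModuleList([nn.Linear(dim, num_classes) for _ in range(3)])
        # Self-attention mixes multi-scale context into the local features.
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4,
                                          batch_first=True)
        self.fusion_head = nn.Linear(dim, num_classes)

    def forward(self, x, active_stages=3):
        feats, logits = [], []
        for i in range(active_stages):
            x = self.stages[i](x)
            f = self.proj[i](x).mean(dim=(2, 3))   # pooled local feature
            feats.append(f)
            logits.append(self.heads[i](f))
        tokens = torch.stack(feats, dim=1)          # (B, active_stages, dim)
        enhanced, _ = self.attn(tokens, tokens, tokens)
        logits.append(self.fusion_head(enhanced.mean(dim=1)))
        return logits  # one cross-entropy term per head
```

Progressive training would then optimize the stages incrementally (active_stages=1, then 2, then 3), summing a cross-entropy term over every returned logit.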
Related papers
- MetaFood3D: Large 3D Food Object Dataset with Nutrition Values [53.24500333363066]
This dataset consists of 637 meticulously labeled 3D food objects across 108 categories, featuring detailed nutrition information, weight, and food codes linked to a comprehensive nutrition database.
Experimental results demonstrate our dataset's significant potential for improving algorithm performance, highlight the challenging gap between video captures and 3D scanned data, and show the strength of the MetaFood3D dataset in high-quality data generation, simulation, and augmentation.
arXiv Detail & Related papers (2024-09-03T15:02:52Z)
- From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios [92.58097090916166]
We present two new benchmarks, namely DailyFood-172 and DailyFood-16, designed to curate food images from everyday meals.
These two datasets are used to evaluate how well approaches transfer from the well-curated food image domain to the everyday-life food image domain (a rough sketch of such a transfer evaluation follows).
arXiv Detail & Related papers (2024-03-12T08:32:23Z)
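As a hedged illustration of the kind of transfer evaluation such benchmarks enable, the sketch below fine-tunes only a new classifier head of a source-domain-pretrained backbone on a target-domain image folder; the dataset path, class count, and checkpoint name are placeholders, not the benchmark's actual layout.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tf = transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224),
                         transforms.ToTensor()])
# Hypothetical ImageFolder layout for a DailyFood-style target domain.
target = datasets.ImageFolder("dailyfood_172/train", transform=tf)
loader = DataLoader(target, batch_size=64, shuffle=True)

model = models.resnet50(weights=None)
# model.load_state_dict(torch.load("food2k_resnet50.pth"))  # hypothetical source-domain checkpoint
model.fc = nn.Linear(model.fc.in_features, len(target.classes))

# Linear probing: freeze everything except the new head.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True

opt = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()
for images, labels in loader:
    opt.zero_grad()
    loss_fn(model(images), labels).backward()
    opt.step()
```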
- Long-Tailed Continual Learning For Visual Food Recognition [5.377869029561348]
The distribution of food images in real life is usually long-tailed as a small number of popular food types are consumed more frequently than others.
We propose a novel end-to-end framework for long-tailed continual learning, which effectively addresses catastrophic forgetting.
We also introduce a novel data augmentation technique that integrates class activation maps (CAM) with CutMix (sketched below).
arXiv Detail & Related papers (2023-07-01T00:55:05Z)
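The summary names CAM and CutMix but not how they are combined; one plausible combination, sketched below under that assumption, uses the class activation map to place the pasted CutMix patch over the most activated region. The CAM tensor here is a stand-in; a real one comes from the network's final convolutional features and classifier weights.

```python
import torch
import torch.nn.functional as F

def cam_guided_cutmix(x_a, y_a, x_b, y_b, cam_b, size=64):
    """Paste the most-activated size x size patch of each x_b into x_a."""
    B, C, H, W = x_b.shape
    # Upsample the CAM to image resolution and find its peak per sample.
    cam = F.interpolate(cam_b, size=(H, W), mode="bilinear",
                        align_corners=False).squeeze(1)
    flat = cam.flatten(1).argmax(dim=1)
    cy, cx = flat // W, flat % W
    x_mix = x_a.clone()
    for i in range(B):
        y0 = int(cy[i].clamp(size // 2, H - size // 2)) - size // 2
        x0 = int(cx[i].clamp(size // 2, W - size // 2)) - size // 2
        x_mix[i, :, y0:y0 + size, x0:x0 + size] = \
            x_b[i, :, y0:y0 + size, x0:x0 + size]
    lam = 1.0 - (size * size) / (H * W)  # label-mixing weight by pasted area
    return x_mix, y_a, y_b, lam          # loss = lam*CE(y_a) + (1-lam)*CE(y_b)
```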
- A Mobile Food Recognition System for Dietary Assessment [6.982738885923204]
We focus on developing a mobile-friendly food recognition application for assisted living, targeting Middle Eastern cuisine.
Using the MobileNet-v2 architecture for this task is beneficial in terms of both accuracy and memory usage (a minimal adaptation sketch follows).
The developed mobile application has the potential to serve the visually impaired via automatic food recognition from images.
arXiv Detail & Related papers (2022-04-20T12:49:36Z)
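As a minimal sketch of the deployment path this entry implies, the snippet below swaps the MobileNet-v2 classifier head for a cuisine-specific one and exports the model via TorchScript; the class count and file name are illustrative, and the paper's exact training setup is not reproduced.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 50   # assumed number of Middle Eastern dish classes
model = models.mobilenet_v2(weights=None)  # in practice, start from pretrained weights
model.classifier[1] = nn.Linear(model.last_channel, num_classes)
model.eval()

# TorchScript export keeps the model self-contained for mobile deployment.
example = torch.randn(1, 3, 224, 224)
scripted = torch.jit.trace(model, example)
scripted.save("food_mobilenet_v2.pt")
```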
- Towards Building a Food Knowledge Graph for Internet of Food [66.57235827087092]
We review the evolution of food knowledge organization, from food classification and food ontology to food knowledge graphs.
Food knowledge graphs play an important role in food search and question answering (QA), personalized dietary recommendation, and food analysis and visualization (a toy triple-store sketch follows).
Future directions for food knowledge graphs cover several fields, such as multimodal food knowledge graphs and food intelligence.
arXiv Detail & Related papers (2021-07-13T06:26:53Z)
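Purely as an illustration of the underlying data structure (not taken from the survey), a food knowledge graph can be reduced to subject-predicate-object triples, over which simple QA and dietary filtering become set lookups:

```python
# Tiny illustrative food knowledge graph as (subject, predicate, object) triples.
triples = [
    ("pad_thai", "contains", "peanut"),
    ("pad_thai", "cuisine", "thai"),
    ("peanut", "is_a", "allergen"),
    ("hummus", "contains", "chickpea"),
]

def objects(subject, predicate):
    return {o for s, p, o in triples if s == subject and p == predicate}

def subjects(predicate, obj):
    return {s for s, p, o in triples if p == predicate and o == obj}

# QA for dietary recommendation: which dishes are unsafe for a peanut allergy?
print(subjects("contains", "peanut"))  # {'pad_thai'}
```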
arXiv Detail & Related papers (2021-07-13T06:26:53Z)
- ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network [50.7720194859196]
We introduce the ISIA Food-500 dataset, with 500 categories drawn from a Wikipedia list and 399,726 images.
This dataset surpasses existing popular benchmark datasets in both category coverage and data volume.
We propose a stacked global-local attention network, which consists of two sub-networks, for food recognition (a rough two-branch sketch follows).
arXiv Detail & Related papers (2020-08-13T02:48:27Z)
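The two sub-networks are not specified in this summary; the sketch below shows a generic two-branch design in that spirit, fusing a globally pooled feature with an attention-pooled local feature. Layer sizes and the toy backbone are assumptions, not the actual stacked global-local attention network.

```python
import torch
import torch.nn as nn

class GlobalLocalNet(nn.Module):
    def __init__(self, num_classes=500, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, dim, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU())
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        # Local branch: a 1x1 conv scores each spatial position, and the
        # softmax-weighted sum acts as learned spatial attention.
        self.attn_score = nn.Conv2d(dim, 1, 1)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, x):
        f = self.backbone(x)                      # (B, dim, H, W)
        g = self.global_pool(f).flatten(1)        # global feature
        w = torch.softmax(self.attn_score(f).flatten(2), dim=-1)  # (B, 1, HW)
        l = (f.flatten(2) * w).sum(dim=-1)        # attention-pooled local feature
        return self.classifier(torch.cat([g, l], dim=1))
```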
arXiv Detail & Related papers (2020-08-13T02:48:27Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities by aligning their output semantic probabilities (sketched below).
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
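As a hedged sketch of the two loss ideas this summary mentions, the snippet below combines a triplet-style retrieval loss over a shared embedding space with a semantic consistency term that aligns the class-probability outputs of the two branches (here via KL divergence, an assumption). The encoders are stand-in MLPs over precomputed features, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, n_classes = 256, 1048  # assumed sizes
img_enc = nn.Sequential(nn.Linear(2048, dim), nn.ReLU(), nn.Linear(dim, dim))
rec_enc = nn.Sequential(nn.Linear(1024, dim), nn.ReLU(), nn.Linear(dim, dim))
img_cls = nn.Linear(dim, n_classes)
rec_cls = nn.Linear(dim, n_classes)

def scan_style_loss(img_feat, rec_feat, margin=0.3):
    zi = F.normalize(img_enc(img_feat), dim=1)
    zr = F.normalize(rec_enc(rec_feat), dim=1)
    # Triplet-style retrieval loss with in-batch negatives (roll pairs by one).
    pos = (zi * zr).sum(dim=1)
    neg = (zi * zr.roll(1, dims=0)).sum(dim=1)
    retrieval = F.relu(margin - pos + neg).mean()
    # Semantic consistency: both modalities should predict similar class
    # distributions for the same image-recipe pair.
    pi = F.log_softmax(img_cls(zi), dim=1)
    pr = F.softmax(rec_cls(zr), dim=1)
    consistency = F.kl_div(pi, pr, reduction="batchmean")
    return retrieval + consistency

loss = scan_style_loss(torch.randn(8, 2048), torch.randn(8, 1024))
loss.backward()
```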
This list is automatically generated from the titles and abstracts of the papers on this site.