ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked
Global-Local Attention Network
- URL: http://arxiv.org/abs/2008.05655v1
- Date: Thu, 13 Aug 2020 02:48:27 GMT
- Title: ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked
Global-Local Attention Network
- Authors: Weiqing Min, Linhu Liu, Zhiling Wang, Zhengdong Luo, Xiaoming Wei,
Xiaolin Wei, Shuqiang Jiang
- Abstract summary: We introduce the dataset ISIA Food-500 with 500 categories from the list in Wikipedia and 399,726 images.
This dataset surpasses existing popular benchmark datasets in category coverage and data volume.
We propose a stacked global-local attention network, which consists of two sub-networks for food recognition.
- Score: 50.7720194859196
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Food recognition has received more and more attention in the multimedia
community for its various real-world applications, such as diet management and
self-service restaurants. A large-scale ontology of food images is urgently
needed for developing advanced large-scale food recognition algorithms, as well
as for providing the benchmark dataset for such algorithms. To encourage
further progress in food recognition, we introduce the dataset ISIA Food-500
with 500 categories from the list in Wikipedia and 399,726 images, a more
comprehensive food dataset that surpasses existing popular benchmark datasets
in category coverage and data volume. Furthermore, we propose a stacked
global-local attention network, which consists of two sub-networks for food
recognition. One subnetwork first utilizes hybrid spatial-channel attention to
extract more discriminative features, and then aggregates these multi-scale
discriminative features from multiple layers into global-level representation
(e.g., texture and shape information about food). The other one generates
attentional regions (e.g., ingredient relevant regions) from different regions
via cascaded spatial transformers, and further aggregates these multi-scale
regional features from different layers into local-level representation. These
two types of features are finally fused as comprehensive representation for
food recognition. Extensive experiments on ISIA Food-500 and two other popular
benchmark datasets demonstrate the effectiveness of our proposed method, which
can thus be considered a strong baseline. The dataset, code and models can
be found at http://123.57.42.89/FoodComputing-Dataset/ISIA-Food500.html.
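The two-branch design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the learned attention layers are replaced here by simple sigmoid gates over feature means, the cascaded spatial transformers are replaced by a fixed crop, and every function name below is hypothetical. It only shows the data flow, in which multi-scale features are aggregated per branch and then fused.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a sigmoid of its spatial mean (assumed gating)."""
    out = []
    for ch in fmap:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gate = sigmoid(mean)
        out.append([[v * gate for v in row] for row in ch])
    return out


def spatial_attention(fmap):
    """Scale each location by a sigmoid of its mean across channels (assumed gating)."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]
    return [[[fmap[k][i][j] * mask[i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]


def global_avg_pool(fmap):
    """Reduce a C x H x W map to a length-C vector."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def crop(fmap, top, left, size):
    """Fixed square crop, standing in for a learned spatial transformer."""
    return [[row[left:left + size] for row in ch[top:top + size]] for ch in fmap]


def global_branch(layers):
    """Apply channel then spatial attention per layer and concatenate pooled features."""
    feats = []
    for fmap in layers:
        feats.extend(global_avg_pool(spatial_attention(channel_attention(fmap))))
    return feats


def local_branch(layers, regions):
    """Pool one attended region per layer and concatenate the results."""
    feats = []
    for fmap, (top, left, size) in zip(layers, regions):
        feats.extend(global_avg_pool(crop(fmap, top, left, size)))
    return feats


# Two toy feature maps at different scales: 2 x 4 x 4 and 3 x 2 x 2.
layer1 = [[[float(i + j + k) for j in range(4)] for i in range(4)] for k in range(2)]
layer2 = [[[float(i * j + k) for j in range(2)] for i in range(2)] for k in range(3)]
layers = [layer1, layer2]
regions = [(0, 0, 2), (0, 0, 1)]  # hypothetical (top, left, size) per layer

global_feat = global_branch(layers)
local_feat = local_branch(layers, regions)
fused = global_feat + local_feat  # fusion by concatenation (an assumption)
print(len(fused))
```

Concatenation is used for the fusion step only because it is the simplest choice; the abstract does not say how the two representations are combined.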
Related papers
- MetaFood3D: Large 3D Food Object Dataset with Nutrition Values [53.24500333363066]
This dataset consists of 637 meticulously labeled 3D food objects across 108 categories, featuring detailed nutrition information, weight, and food codes linked to a comprehensive nutrition database.
Experimental results demonstrate our dataset's significant potential for improving algorithm performance, highlight the challenging gap between video captures and 3D scanned data, and show the strength of the MetaFood3D dataset in high-quality data generation, simulation, and augmentation.
arXiv Detail & Related papers (2024-09-03T15:02:52Z)
- From Canteen Food to Daily Meals: Generalizing Food Recognition to More
Practical Scenarios [92.58097090916166]
We present two new benchmarks, namely DailyFood-172 and DailyFood-16, designed to curate food images from everyday meals.
These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain.
arXiv Detail & Related papers (2024-03-12T08:32:23Z)
- Transferring Knowledge for Food Image Segmentation using Transformers
and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- Mining Discriminative Food Regions for Accurate Food Recognition [16.78437844398436]
We propose a novel network architecture in which a primary network maintains the base accuracy of classifying an input image.
An auxiliary network adversarially mines discriminative food regions, and a region network classifies the resulting mined regions.
The proposed architecture denoted as PAR-Net is end-to-end trainable, and highlights discriminative regions in an online fashion.
arXiv Detail & Related papers (2022-07-08T05:09:24Z)
- FoodLogoDet-1500: A Dataset for Large-Scale Food Logo Detection via
Multi-Scale Feature Decoupling Network [55.49022825759331]
A large-scale food logo dataset is urgently needed for developing advanced food logo detection algorithms.
FoodLogoDet-1500 is a new large-scale publicly available food logo dataset with 1,500 categories, about 100,000 images and about 150,000 manually annotated food logo objects.
We propose a novel food logo detection method Multi-scale Feature Decoupling Network (MFDNet) to solve the problem of distinguishing multiple food logo categories.
arXiv Detail & Related papers (2021-08-10T12:47:04Z)
- A Large-Scale Benchmark for Food Image Segmentation [62.28029856051079]
We build a new food image dataset FoodSeg103 (and its extension FoodSeg154) containing 9,490 images.
We annotate these images with 154 ingredient classes and each image has an average of 6 ingredient labels and pixel-wise masks.
We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
arXiv Detail & Related papers (2021-05-12T03:00:07Z)
- Large Scale Visual Food Recognition [43.43598316339732]
We introduce Food2K, which is the largest food recognition dataset with 2,000 categories and over 1 million images.
Food2K surpasses existing food recognition datasets in both categories and images by one order of magnitude.
We propose a deep progressive region enhancement network for food recognition.
arXiv Detail & Related papers (2021-03-30T06:41:42Z)
- Visual Aware Hierarchy Based Food Recognition [10.194167945992938]
We propose a new two-step food recognition system using Convolutional Neural Networks (CNNs) as the backbone architecture.
The food localization step is based on an implementation of the Faster R-CNN method to identify food regions.
In the food classification step, visually similar food categories can be clustered together automatically to generate a hierarchical structure.
arXiv Detail & Related papers (2020-12-06T20:25:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.