FoodLogoDet-1500: A Dataset for Large-Scale Food Logo Detection via
Multi-Scale Feature Decoupling Network
- URL: http://arxiv.org/abs/2108.04644v1
- Date: Tue, 10 Aug 2021 12:47:04 GMT
- Title: FoodLogoDet-1500: A Dataset for Large-Scale Food Logo Detection via
Multi-Scale Feature Decoupling Network
- Authors: Qiang Hou, Weiqing Min, Jing Wang, Sujuan Hou, Yuanjie Zheng, Shuqiang
Jiang
- Abstract summary: A large-scale food logo dataset is urgently needed for developing advanced food logo detection algorithms.
FoodLogoDet-1500 is a new large-scale publicly available food logo dataset with 1,500 categories, about 100,000 images and about 150,000 manually annotated food logo objects.
We propose a novel food logo detection method Multi-scale Feature Decoupling Network (MFDNet) to solve the problem of distinguishing multiple food logo categories.
- Score: 55.49022825759331
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Food logo detection plays an important role in multimedia for its
wide real-world applications, such as food recommendation in self-service
shops and infringement detection on e-commerce platforms. A large-scale food
logo dataset is urgently needed for developing advanced food logo detection
algorithms. However, no food logo datasets with food brand information are
currently available. To support efforts towards food logo detection, we
introduce FoodLogoDet-1500, a new large-scale publicly available food logo
dataset with 1,500 categories, about 100,000 images, and about 150,000
manually annotated food logo objects. We describe the collection and annotation
process of FoodLogoDet-1500, analyze its scale and diversity, and compare it
with other logo datasets. To the best of our knowledge, FoodLogoDet-1500 is the
largest publicly available high-quality dataset for food logo detection.
The challenge of food logo detection lies in the large number of categories and
the similarities between food logo categories. To address this, we propose a
novel food logo detection method, the Multi-scale Feature Decoupling Network
(MFDNet), which decouples classification and regression into two branches and
focuses on the classification branch to solve the problem of distinguishing
many food logo categories. Specifically, we introduce a feature offset module,
which uses deformable learning to find the optimal classification offset and
can effectively obtain the most representative classification features for
detection. In addition, we adopt a balanced feature pyramid in MFDNet, which
attends to global information, balances the multi-scale feature maps, and
enhances feature extraction capability. Comprehensive experiments on
FoodLogoDet-1500 and two other benchmark logo datasets demonstrate the
effectiveness of the proposed method. The FoodLogoDet-1500 dataset can be found
at this https URL.
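The branch decoupling described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 256-dimensional feature and the linear heads are hypothetical stand-ins showing how classification and box regression use separate parameters on a shared feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared backbone feature for one region proposal (hypothetical 256-dim vector).
feat = rng.normal(size=(1, 256))

# Decoupled heads: classification and box regression get their own weights,
# so the two tasks no longer share (and interfere in) a single head.
num_classes = 1500  # FoodLogoDet-1500 category count
w_cls = rng.normal(size=(256, num_classes)) * 0.01
b_cls = np.zeros(num_classes)
w_reg = rng.normal(size=(256, 4)) * 0.01
b_reg = np.zeros(4)

cls_logits = feat @ w_cls + b_cls  # (1, 1500): per-category scores
box_deltas = feat @ w_reg + b_reg  # (1, 4): box offsets (dx, dy, dw, dh)
```

In MFDNet the classification branch is further refined by the feature offset module; only the branch separation itself is shown here.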
Related papers
- Transferring Knowledge for Food Image Segmentation using Transformers
and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared: one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model, achieving a mean intersection over union of 49.4 on FoodSeg103.
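For reference, the mean intersection over union (mIoU) metric quoted above can be computed as in this generic sketch; it is not code from the paper, and the tiny masks are illustrative only.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes present in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 4-pixel masks with two classes:
pred = np.array([0, 0, 1, 1])
gt = np.array([0, 1, 1, 1])
print(mean_iou(pred, gt, num_classes=2))  # (1/2 + 2/3) / 2 = 0.5833...
```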
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred
Thousand-Scale One-Shot Logo Identification [2.243832625209014]
We study the problem of identifying logos of business brands in natural scenes in an open-set one-shot setting.
We propose a novel multi-view textual-visual encoding framework that encodes text appearing in the logos.
We evaluate our proposed framework for cropped logo verification, cropped logo identification, and end-to-end logo identification in natural scene tasks.
arXiv Detail & Related papers (2022-11-23T12:59:41Z)
- A Large-Scale Benchmark for Food Image Segmentation [62.28029856051079]
We build a new food image dataset FoodSeg103 (and its extension FoodSeg154) containing 9,490 images.
We annotate these images with 154 ingredient classes and each image has an average of 6 ingredient labels and pixel-wise masks.
We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
arXiv Detail & Related papers (2021-05-12T03:00:07Z)
- Visual Aware Hierarchy Based Food Recognition [10.194167945992938]
We propose a new two-step food recognition system using Convolutional Neural Networks (CNNs) as the backbone architecture.
The food localization step is based on an implementation of the Faster R-CNN method to identify food regions.
In the food classification step, visually similar food categories can be clustered together automatically to generate a hierarchical structure.
arXiv Detail & Related papers (2020-12-06T20:25:31Z)
- ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked
Global-Local Attention Network [50.7720194859196]
We introduce the dataset ISIA Food-500, with 500 categories from the Wikipedia list and 399,726 images.
This dataset surpasses existing popular benchmark datasets by category coverage and data volume.
We propose a stacked global-local attention network, which consists of two sub-networks for food recognition.
arXiv Detail & Related papers (2020-08-13T02:48:27Z)
- LogoDet-3K: A Large-Scale Image Dataset for Logo Detection [61.296935298332606]
We introduce LogoDet-3K, the largest logo detection dataset with full annotation.
It has 3,000 logo categories, about 200,000 manually annotated logo objects and 158,652 images.
We propose a strong baseline method Logo-Yolo, which incorporates Focal loss and CIoU loss into the state-of-the-art YOLOv3 framework for large-scale logo detection.
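The focal loss mentioned above down-weights easy examples so that training focuses on hard ones. A minimal scalar sketch, using the commonly cited defaults gamma=2 and alpha=0.25 rather than values from the paper:

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: p is the predicted positive-class probability, y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balance weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct (easy) positive contributes far less than a hard one:
easy = focal_loss(0.9, 1)  # ~0.00026
hard = focal_loss(0.1, 1)  # ~0.47
```

The `(1 - p_t) ** gamma` factor is what shrinks the loss of well-classified examples relative to plain cross-entropy.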
arXiv Detail & Related papers (2020-08-12T14:57:53Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images
and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
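The shared-embedding retrieval described above reduces at test time to nearest-neighbour search in the joint space. A hedged sketch with random stand-in embeddings (real embeddings would come from trained image and recipe encoders):

```python
import numpy as np

def l2_normalize(x):
    """Unit-normalize rows so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
img_emb = l2_normalize(rng.normal(size=(5, 128)))  # stand-in food-image embeddings
rec_emb = l2_normalize(rng.normal(size=(5, 128)))  # stand-in recipe embeddings

# Image-to-recipe retrieval: rank recipes by cosine similarity in the joint space.
sims = img_emb @ rec_emb.T            # (5, 5) cosine similarities
ranking = np.argsort(-sims, axis=1)   # for each image, best-matching recipe first
```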
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.