SeeDS: Semantic Separable Diffusion Synthesizer for Zero-shot Food
Detection
- URL: http://arxiv.org/abs/2310.04689v1
- Date: Sat, 7 Oct 2023 05:29:18 GMT
- Title: SeeDS: Semantic Separable Diffusion Synthesizer for Zero-shot Food
Detection
- Authors: Pengfei Zhou, Weiqing Min, Yang Zhang, Jiajun Song, Ying Jin and
Shuqiang Jiang
- Abstract summary: We propose the Semantic Separable Diffusion Synthesizer (SeeDS) framework for Zero-Shot Food Detection (ZSFD).
SeeDS consists of two modules: a Semantic Separable Synthesizing Module (S$^3$M) and a Region Feature Denoising Diffusion Model (RFDDM).
- Score: 38.57712277980073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Food detection is becoming a fundamental task in food computing that supports
various multimedia applications, including food recommendation and dietary
monitoring. To deal with real-world scenarios, food detection needs to localize
and recognize novel food objects that are not seen during training, demanding
Zero-Shot Detection (ZSD). However, the complexity of semantic attributes and
intra-class feature diversity poses challenges for ZSD methods in
distinguishing fine-grained food classes. To tackle this, we propose the
Semantic Separable Diffusion Synthesizer (SeeDS) framework for Zero-Shot Food
Detection (ZSFD). SeeDS consists of two modules: a Semantic Separable
Synthesizing Module (S$^3$M) and a Region Feature Denoising Diffusion Model
(RFDDM). The S$^3$M learns the disentangled semantic representation for complex
food attributes from ingredients and cuisines, and synthesizes discriminative
food features via enhanced semantic information. The RFDDM utilizes a novel
diffusion model to generate diversified region features and enhances ZSFD via
fine-grained synthesized features. Extensive experiments show the
state-of-the-art ZSFD performance of our proposed method on two food datasets,
ZSFooD and UECFOOD-256. Moreover, SeeDS also maintains effectiveness on general
ZSD datasets, PASCAL VOC and MS COCO. The code and dataset can be found at
https://github.com/LanceZPF/SeeDS.
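To make the two-stage design described in the abstract more concrete, the following is a minimal, hypothetical PyTorch sketch: a synthesizer that fuses separately encoded ingredient and cuisine attribute vectors into a class embedding (loosely mirroring S$^3$M), and a conditional denoising network trained with a standard DDPM objective on detector region features (loosely mirroring RFDDM). All module names, dimensions, and the noise schedule are illustrative assumptions, not the authors' implementation; see the repository above for the actual code.

```python
# Minimal, hypothetical sketch of a SeeDS-style pipeline. Module names,
# dimensions, and the noise schedule are illustrative assumptions only.
import torch
import torch.nn as nn


class SemanticSeparableSynthesizer(nn.Module):
    """Fuses separately encoded ingredient and cuisine attribute vectors
    into a class embedding that conditions region-feature synthesis
    (loosely mirroring the S^3M idea)."""

    def __init__(self, ingr_dim=300, cuis_dim=300, emb_dim=512):
        super().__init__()
        self.ingr_proj = nn.Sequential(nn.Linear(ingr_dim, emb_dim), nn.ReLU())
        self.cuis_proj = nn.Sequential(nn.Linear(cuis_dim, emb_dim), nn.ReLU())
        self.fuse = nn.Linear(2 * emb_dim, emb_dim)

    def forward(self, ingr_attr, cuis_attr):
        z = torch.cat([self.ingr_proj(ingr_attr), self.cuis_proj(cuis_attr)], dim=-1)
        return self.fuse(z)


class RegionFeatureDenoiser(nn.Module):
    """Predicts the noise added to a region feature at diffusion step t,
    conditioned on the fused semantic embedding (loosely mirroring RFDDM)."""

    def __init__(self, feat_dim=1024, emb_dim=512, steps=1000):
        super().__init__()
        self.t_embed = nn.Embedding(steps, emb_dim)
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 2 * emb_dim, 1024), nn.SiLU(),
            nn.Linear(1024, feat_dim))

    def forward(self, noisy_feat, t, cond):
        return self.net(torch.cat([noisy_feat, self.t_embed(t), cond], dim=-1))


def diffusion_training_step(denoiser, synthesizer, region_feat, ingr, cuis, alphas_cumprod):
    """One standard DDPM-style training step on seen-class region features."""
    b = region_feat.size(0)
    t = torch.randint(0, alphas_cumprod.size(0), (b,))
    a_bar = alphas_cumprod[t].unsqueeze(-1)
    noise = torch.randn_like(region_feat)
    noisy = a_bar.sqrt() * region_feat + (1 - a_bar).sqrt() * noise
    pred = denoiser(noisy, t, synthesizer(ingr, cuis))
    return nn.functional.mse_loss(pred, noise)


if __name__ == "__main__":
    synth, den = SemanticSeparableSynthesizer(), RegionFeatureDenoiser()
    betas = torch.linspace(1e-4, 2e-2, 1000)               # linear noise schedule
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    feats = torch.randn(8, 1024)                            # region features from a detector
    ingr, cuis = torch.randn(8, 300), torch.randn(8, 300)   # class attribute vectors
    loss = diffusion_training_step(den, synth, feats, ingr, cuis, alphas_cumprod)
    loss.backward()
    print(float(loss))
```

At test time, such a sketch would sample region features for unseen classes by running the learned denoiser in reverse from Gaussian noise, conditioned on the unseen-class embeddings, and use the synthesized features to train the unseen-class classifier of the detector.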
Related papers
- MetaFood3D: Large 3D Food Object Dataset with Nutrition Values [53.24500333363066]
This dataset consists of 637 meticulously labeled 3D food objects across 108 categories, featuring detailed nutrition information, weight, and food codes linked to a comprehensive nutrition database.
Experimental results demonstrate our dataset's significant potential for improving algorithm performance, highlight the challenging gap between video captures and 3D scanned data, and show the strength of the MetaFood3D dataset in high-quality data generation, simulation, and augmentation.
arXiv Detail & Related papers (2024-09-03T15:02:52Z)
- RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models [96.43285670458803]
Uni-Food is a unified food dataset that comprises over 100,000 images with various food labels.
Uni-Food is designed to provide a more holistic approach to food data analysis.
We introduce a novel Linear Rectification Mixture of Diverse Experts (RoDE) approach to address the inherent challenges of food-related multitasking.
arXiv Detail & Related papers (2024-07-17T16:49:34Z)
- Synthesizing Knowledge-enhanced Features for Real-world Zero-shot Food Detection [37.866458336327184]
Food detection requires Zero-Shot Detection (ZSD) of novel, unseen food objects to support real-world scenarios.
We first benchmark the task of Zero-Shot Food Detection (ZSFD) by introducing the FOWA dataset with rich attribute annotations.
We propose a novel framework ZSFDet to tackle fine-grained problems by exploiting the interaction between complex attributes.
arXiv Detail & Related papers (2024-02-14T15:32:35Z)
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared: one based on convolutional neural networks and the other on the Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks [14.523433519237607]
Foodborne illness is a serious but preventable public health problem.
There is a dearth of labeled datasets for developing effective outbreak detection models.
We present TWEET-FID, the first publicly available annotated dataset for foodborne illness incident detection tasks.
arXiv Detail & Related papers (2022-05-22T03:47:18Z)
- Cross-modal Retrieval and Synthesis (X-MRS): Closing the modality gap in shared subspace [21.33710150033949]
We propose a simple yet novel architecture for shared subspace learning, which is used to tackle the food image-to-recipe retrieval problem.
Experimental analysis on the public Recipe1M dataset shows that the subspace learned via the proposed method outperforms the current state of the art.
In order to demonstrate the representational power of the learned subspace, we propose a generative food image synthesis model conditioned on the embeddings of recipes.
arXiv Detail & Related papers (2020-12-02T17:27:00Z)
- ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network [50.7720194859196]
We introduce the dataset ISIA Food-500, with 500 categories taken from Wikipedia and 399,726 images.
This dataset surpasses existing popular benchmark datasets in category coverage and data volume.
We propose a stacked global-local attention network, which consists of two sub-networks for food recognition.
arXiv Detail & Related papers (2020-08-13T02:48:27Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities by aligning their output semantic probabilities (a rough sketch of this idea appears after this list).
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
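As a rough illustration of the joint-embedding idea with a semantic-consistency regularizer described in the SCAN entry above, here is a minimal, hypothetical sketch. The encoders, dimensions, loss weights, and the symmetric KL term are assumptions made for illustration, not the paper's implementation.

```python
# Minimal, hypothetical sketch of a SCAN-style joint embedding with a
# semantic-consistency regularizer. Encoders, dimensions, loss weights and
# the symmetric KL term are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointEmbedding(nn.Module):
    def __init__(self, img_dim=2048, rec_dim=1024, emb_dim=512, n_classes=1048):
        super().__init__()
        self.img_head = nn.Linear(img_dim, emb_dim)   # on top of a frozen image backbone
        self.rec_head = nn.Linear(rec_dim, emb_dim)   # on top of a recipe text encoder
        self.img_cls = nn.Linear(emb_dim, n_classes)  # per-modality semantic classifiers
        self.rec_cls = nn.Linear(emb_dim, n_classes)

    def forward(self, img_feat, rec_feat):
        v = F.normalize(self.img_head(img_feat), dim=-1)
        r = F.normalize(self.rec_head(rec_feat), dim=-1)
        return v, r, self.img_cls(v), self.rec_cls(r)


def scan_style_loss(v, r, img_logits, rec_logits, labels, margin=0.3, lam=0.05):
    """Triplet-style retrieval loss plus a term that keeps each modality
    predictive of the food class and aligns their output probabilities."""
    sim = v @ r.t()                                            # pairwise cosine similarities
    pos = sim.diag()
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    neg_i2r = sim.masked_fill(mask, -1.0).max(dim=1).values    # hardest negatives
    neg_r2i = sim.masked_fill(mask, -1.0).max(dim=0).values
    triplet = (F.relu(margin + neg_i2r - pos) + F.relu(margin + neg_r2i - pos)).mean()
    ce = F.cross_entropy(img_logits, labels) + F.cross_entropy(rec_logits, labels)
    p, q = F.log_softmax(img_logits, dim=-1), F.log_softmax(rec_logits, dim=-1)
    consistency = 0.5 * (F.kl_div(p, q.exp(), reduction="batchmean") +
                         F.kl_div(q, p.exp(), reduction="batchmean"))
    return triplet + ce + lam * consistency


if __name__ == "__main__":
    model = JointEmbedding()
    img, rec = torch.randn(16, 2048), torch.randn(16, 1024)
    labels = torch.randint(0, 1048, (16,))
    loss = scan_style_loss(*model(img, rec), labels)
    loss.backward()
    print(float(loss))
```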