Cross-lingual Adaptation for Recipe Retrieval with Mixup
- URL: http://arxiv.org/abs/2205.03891v1
- Date: Sun, 8 May 2022 15:04:39 GMT
- Title: Cross-lingual Adaptation for Recipe Retrieval with Mixup
- Authors: Bin Zhu, Chong-Wah Ngo, Jingjing Chen, Wing-Kwong Chan
- Abstract summary: Cross-modal recipe retrieval has attracted research attention in recent years, thanks to the availability of large-scale paired data for training.
This paper studies unsupervised domain adaptation for image-to-recipe retrieval, where recipes in source and target domains are in different languages.
A novel recipe mixup method is proposed to learn transferable embedding features between the two domains.
- Score: 56.79360103639741
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Cross-modal recipe retrieval has attracted research attention in recent
years, thanks to the availability of large-scale paired data for training.
Nevertheless, obtaining adequate recipe-image pairs covering the majority of
cuisines for supervised learning is difficult, if not impossible. By
transferring knowledge learnt from a data-rich cuisine to a data-scarce
cuisine, domain adaptation sheds light on this practical problem. However,
existing works assume that recipes in the source and target domains mostly
originate from the same cuisine and are written in the same language. This paper
studies unsupervised domain adaptation for image-to-recipe retrieval, where
recipes in source and target domains are in different languages. Moreover, only
recipes are available for training in the target domain. A novel recipe mixup
method is proposed to learn transferable embedding features between the two
domains. Specifically, recipe mixup produces mixed recipes to form an
intermediate domain by discretely exchanging the section(s) between source and
target recipes. To bridge the domain gap, recipe mixup loss is proposed to
enforce the intermediate domain to locate in the shortest geodesic path between
source and target domains in the recipe embedding space. Using the Recipe1M
dataset as the source domain (English) and the Vireo-FoodTransfer dataset as
the target domain (Chinese), empirical experiments verify the effectiveness of recipe
mixup for cross-lingual adaptation in the context of image-to-recipe retrieval.
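The two ingredients of the method, discrete section exchange and the geodesic-path constraint, can be made concrete with a small sketch. The Python fragment below is a hypothetical illustration, not the authors' code: the section names, the choice of how many sections to swap, and the use of straight-line (Euclidean) interpolation as a stand-in for the geodesic constraint are all assumptions.

```python
import random

import torch
import torch.nn.functional as F

# A recipe is treated as a set of discrete sections; the names below
# are illustrative, not the authors' schema.
SECTIONS = ["title", "ingredients", "instructions"]

def recipe_mixup(source: dict, target: dict) -> dict:
    """Form an intermediate-domain recipe by discretely swapping
    one or more sections of the source recipe with the target's."""
    mixed = dict(source)
    k = random.randint(1, len(SECTIONS) - 1)  # swap at least one section, keep at least one
    for section in random.sample(SECTIONS, k):
        mixed[section] = target[section]
    return mixed

def recipe_mixup_loss(z_src: torch.Tensor,
                      z_tgt: torch.Tensor,
                      z_mix: torch.Tensor,
                      lam: float) -> torch.Tensor:
    """Pull the mixed-recipe embedding toward the straight path between
    the source and target embeddings -- a Euclidean stand-in for the
    shortest-geodesic constraint described in the abstract."""
    anchor = lam * z_src + (1.0 - lam) * z_tgt
    return F.mse_loss(z_mix, anchor)
```

Under these assumptions, `lam` could be set to the fraction of sections retained from the source recipe, so that the anchor slides toward the target embedding as more sections are swapped.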
Related papers
- MALM: Mask Augmentation based Local Matching for Food-Recipe Retrieval [6.582204441933583]
We propose a mask-augmentation-based local matching network (MALM) for image-to-recipe retrieval.
Experimental results on Recipe1M dataset show our method can clearly outperform state-of-the-art (SOTA) methods.
arXiv Detail & Related papers (2023-05-18T22:25:50Z)
- Bidirectional Domain Mixup for Domain Adaptive Semantic Segmentation [73.3083304858763]
This paper systematically studies the impact of mixup on the domain adaptive semantic segmentation task.
Specifically, domain mixup is achieved in two steps, cut and paste, as sketched below.
We provide extensive ablation experiments to empirically verify our main components of the framework.
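A generic cut-and-paste mixup for segmentation can be sketched as follows; this is not the paper's exact procedure, and the rectangular region sampling is a simplifying assumption:

```python
import torch

def cut_paste_mixup(img_src: torch.Tensor, lbl_src: torch.Tensor,
                    img_tgt: torch.Tensor, lbl_tgt: torch.Tensor,
                    frac: float = 0.5):
    """Cut a random rectangle from the source image and its label map,
    then paste it onto the target pair. Images are (C, H, W); labels (H, W)."""
    _, H, W = img_src.shape
    h, w = int(H * frac), int(W * frac)
    top = int(torch.randint(0, H - h + 1, (1,)))
    left = int(torch.randint(0, W - w + 1, (1,)))
    img_mix, lbl_mix = img_tgt.clone(), lbl_tgt.clone()
    img_mix[:, top:top + h, left:left + w] = img_src[:, top:top + h, left:left + w]
    lbl_mix[top:top + h, left:left + w] = lbl_src[top:top + h, left:left + w]
    return img_mix, lbl_mix
```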
arXiv Detail & Related papers (2023-03-17T05:22:44Z)
- Predefined domain specific embeddings of food concepts and recipes: A case study on heterogeneous recipe datasets [0.0]
Recipe datasets are usually collected from social media websites where users post and publish recipes.
We collect six different recipe datasets, publicly available, in different formats, and some including data in different languages.
A procedure for bringing all of these datasets into the format needed by a machine learning (ML) pipeline for nutrient prediction is presented.
arXiv Detail & Related papers (2023-02-02T10:49:06Z)
- Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to combine the best of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, and each group is treated with tailored learning objectives.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
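The MMD term is a standard two-sample statistic; a minimal RBF-kernel version (omitting the paper's memory-bank machinery, with `sigma` an assumed bandwidth hyperparameter) might look like:

```python
import torch

def rbf_mmd2(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased MMD^2 estimate between samples x (n, d) and y (m, d)
    under a Gaussian RBF kernel with bandwidth sigma."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()
```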
arXiv Detail & Related papers (2022-11-12T09:21:49Z)
- Counterfactual Recipe Generation: Exploring Compositional Generalization in a Realistic Scenario [60.20197771545983]
We design the counterfactual recipe generation task, which asks models to modify a base recipe according to the change of an ingredient.
We collect a large-scale recipe dataset in Chinese for models to learn culinary knowledge.
Results show that existing models have difficulty modifying the ingredients while preserving the original text style, and often miss actions that need to be adjusted.
arXiv Detail & Related papers (2022-10-20T17:21:46Z)
- Domain Generalization with MixStyle [120.52367818581608]
Domain generalization aims to address this problem by learning, from a set of source domains, a model that generalizes to unseen domains.
Our method, termed MixStyle, is motivated by the observation that visual domain is closely related to image style.
MixStyle fits into mini-batch training perfectly and is extremely easy to implement.
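"Easy to implement" is accurate: the core operation of MixStyle mixes channel-wise feature statistics across a shuffled mini-batch. A minimal sketch in the spirit of the method, with hyperparameters assumed, is:

```python
import torch

def mixstyle(x: torch.Tensor, alpha: float = 0.1, eps: float = 1e-6) -> torch.Tensor:
    """Mix channel-wise feature statistics across a shuffled mini-batch.
    x has shape (B, C, H, W); intended for use during training only."""
    B = x.size(0)
    mu = x.mean(dim=(2, 3), keepdim=True)                 # (B, C, 1, 1)
    sig = (x.var(dim=(2, 3), keepdim=True) + eps).sqrt()  # (B, C, 1, 1)
    x_norm = (x - mu) / sig
    perm = torch.randperm(B)
    lam = torch.distributions.Beta(alpha, alpha).sample((B, 1, 1, 1))
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix
```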
arXiv Detail & Related papers (2021-04-05T16:58:09Z)
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [17.42688184238741]
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives.
We propose a simplified end-to-end model based on well established and high performing encoders for text and images.
Our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset.
arXiv Detail & Related papers (2021-03-24T10:17:09Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
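Aligning output semantic probabilities across modalities can be illustrated with a short sketch; the symmetric KL divergence below is an assumption for illustration, not necessarily the divergence SCAN uses:

```python
import torch
import torch.nn.functional as F

def semantic_consistency_loss(logits_img: torch.Tensor,
                              logits_rec: torch.Tensor) -> torch.Tensor:
    """Align the class distributions predicted from the image and recipe
    embeddings via a symmetric KL divergence."""
    log_p = F.log_softmax(logits_img, dim=-1)
    log_q = F.log_softmax(logits_rec, dim=-1)
    kl_pq = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    kl_qp = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    return 0.5 * (kl_pq + kl_qp)
```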
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.