Feature-Suppressed Contrast for Self-Supervised Food Pre-training
- URL: http://arxiv.org/abs/2308.03272v3
- Date: Sun, 8 Oct 2023 05:19:04 GMT
- Title: Feature-Suppressed Contrast for Self-Supervised Food Pre-training
- Authors: Xinda Liu, Yaohui Zhu, Linhu Liu, Jiang Tian, Lili Wang
- Abstract summary: We propose Feature Suppressed Contrast (FeaSC) to reduce mutual information between views.
FeaSC uses a response-aware scheme to localize salient features in an unsupervised manner.
As a plug-and-play module, the proposed method consistently improves BYOL and SimSiam by 1.70% to 6.69% in classification accuracy on four publicly available food recognition datasets.
- Score: 22.48308786497061
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most previous approaches for analyzing food images have relied on extensively
annotated datasets, resulting in significant human labeling expenses due to the
varied and intricate nature of such images. Inspired by the effectiveness of
contrastive self-supervised methods in utilizing unlabelled data, we explore
leveraging these techniques on unlabelled food images. In contrastive
self-supervised methods, two views are randomly generated from an image by data
augmentations. However, regarding food images, the two views tend to contain
similar informative contents, causing large mutual information, which impedes
the efficacy of contrastive self-supervised learning. To address this problem,
we propose Feature Suppressed Contrast (FeaSC) to reduce mutual information
between views. As the similar contents of the two views are salient or highly
responsive in the feature map, the proposed FeaSC uses a response-aware scheme
to localize salient features in an unsupervised manner. By suppressing some
salient features in one view while leaving another contrast view unchanged, the
mutual information between the two views is reduced, thereby enhancing the
effectiveness of contrastive learning for self-supervised food pre-training. As a
plug-and-play module, the proposed method consistently improves BYOL and SimSiam
by 1.70% to 6.69% in classification accuracy on four publicly available food
recognition datasets. Superior results have also been achieved
on downstream segmentation tasks, demonstrating the effectiveness of the
proposed method.
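To make the suppression step concrete, the following is a minimal, hypothetical PyTorch sketch: locate the most responsive spatial positions of a feature map and zero them out in one view only, leaving the other view untouched. The function name, the channel-sum response measure, and the suppress_ratio value are illustrative assumptions, not the paper's exact scheme.

    import torch

    def suppress_salient_features(feat: torch.Tensor, suppress_ratio: float = 0.1) -> torch.Tensor:
        # feat: (B, C, H, W) feature map from the encoder backbone.
        b, c, h, w = feat.shape
        # Response map: aggregate activation magnitude over channels -> (B, H*W).
        response = feat.abs().sum(dim=1).flatten(1)
        k = max(1, int(suppress_ratio * h * w))
        # Indices of the k most responsive spatial positions per image.
        _, top_idx = response.topk(k, dim=1)
        # Binary mask that zeroes those positions, broadcast over channels.
        mask = torch.ones(b, h * w, device=feat.device, dtype=feat.dtype)
        mask.scatter_(1, top_idx, 0.0)
        return feat * mask.view(b, 1, h, w)

    # Hypothetical plug-in point for a BYOL/SimSiam-style step:
    # suppress only one branch, leaving the other contrast view unchanged.
    # z1 = projector(pool(suppress_salient_features(backbone(view1))))
    # z2 = projector(pool(backbone(view2)))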
Related papers
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- Weakly-Supervised Text-driven Contrastive Learning for Facial Behavior Understanding [12.509298933267221]
We introduce a two-stage Contrastive Learning with Text-Embedded framework for facial behavior understanding.
The first stage is a weakly-supervised contrastive learning method that learns representations from positive-negative pairs constructed using coarse-grained activity information.
The second stage trains recognition of facial expressions or facial action units by maximizing the similarity between an image and its corresponding text label name (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-03-31T18:21:09Z)
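As a rough illustration of the second stage above, a CLIP-style objective that maximizes similarity between image embeddings and the embeddings of text label names could look like this sketch. The function name, temperature, and cross-entropy formulation are assumptions, not the paper's exact objective.

    import torch
    import torch.nn.functional as F

    def text_anchored_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                           labels: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
        # img_emb: (B, D) image embeddings; txt_emb: (K, D) embeddings of the
        # K label names; labels: (B,) index of the correct label per image.
        img_emb = F.normalize(img_emb, dim=-1)
        txt_emb = F.normalize(txt_emb, dim=-1)
        logits = img_emb @ txt_emb.t() / temperature  # (B, K) cosine similarities
        # Cross-entropy pulls each image toward its own label-name embedding.
        return F.cross_entropy(logits, labels)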
- Unsupervised Feature Clustering Improves Contrastive Representation Learning for Medical Image Segmentation [18.75543045234889]
Self-supervised instance discrimination is an effective contrastive pretext task to learn feature representations and address limited medical image annotations.
We propose a new self-supervised contrastive learning method that uses unsupervised feature clustering to better select positive and negative image samples (a minimal sketch follows this entry).
Our method outperforms state-of-the-art self-supervised contrastive techniques on these tasks.
arXiv Detail & Related papers (2022-11-15T22:54:29Z)
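The pair-selection idea above might be sketched as follows: cluster encoder features without labels, then draw positives from the same cluster and negatives from other clusters. K-means, the cluster count, and the sampling rule are assumptions for illustration; the paper's actual procedure may differ.

    import numpy as np
    from sklearn.cluster import KMeans

    def select_pairs(embeddings: np.ndarray, n_clusters: int = 10, seed: int = 0):
        # embeddings: (N, D) feature vectors, e.g. from a momentum encoder.
        rng = np.random.default_rng(seed)
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
        triples = []
        for i in range(len(embeddings)):
            same = np.flatnonzero(labels == labels[i])
            same = same[same != i]                    # exclude the anchor itself
            diff = np.flatnonzero(labels != labels[i])
            if same.size and diff.size:
                # (anchor, positive, negative) indices for a contrastive loss.
                triples.append((i, int(rng.choice(same)), int(rng.choice(diff))))
        return triples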
- Mix-up Self-Supervised Learning for Contrast-agnostic Applications [33.807005669824136]
We present the first mix-up self-supervised learning framework for contrast-agnostic applications.
We address the low variance across images via cross-domain mix-up and build the pretext task on image reconstruction and transparency prediction.
arXiv Detail & Related papers (2022-04-02T16:58:36Z)
- ExCon: Explanation-driven Supervised Contrastive Learning for Image Classification [12.109442912963969]
We propose to leverage saliency-based explanation methods to create content-preserving masked augmentations for contrastive learning.
Our explanation-driven supervised contrastive learning (ExCon) methodology serves the dual goals of encouraging nearby image embeddings to have similar content and similar explanations.
We demonstrate that ExCon outperforms vanilla supervised contrastive learning in classification performance, explanation quality, adversarial robustness, and calibration of the model's probabilistic predictions under distributional shift.
arXiv Detail & Related papers (2021-11-28T23:15:26Z)
- Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring [9.086207853136054]
We address the problem of learning self-supervised representations from unlabeled image collections.
We exploit readily available context data that encodes information such as the spatial and temporal relationships between the input images.
For the critical task of global biodiversity monitoring, this results in image features that can be adapted to challenging visual species classification tasks with limited human supervision.
arXiv Detail & Related papers (2021-08-14T01:12:41Z)
- Object-aware Contrastive Learning for Debiased Scene Representation [74.30741492814327]
We develop a novel object-aware contrastive learning framework that localizes objects in a self-supervised manner.
We also introduce two data augmentations based on ContraCAM, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning (a minimal background-mixup sketch follows this entry).
arXiv Detail & Related papers (2021-07-30T19:24:07Z)
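The background-mixup augmentation mentioned above might look like the sketch below: blend each object image with a background-only image so the model cannot lean on background cues. The Beta-distributed mixing coefficient and the assumption that background-only images are already available (the paper derives them from ContraCAM-detected object regions) are simplifications.

    import torch

    def background_mixup(images: torch.Tensor, backgrounds: torch.Tensor,
                         alpha: float = 0.4) -> torch.Tensor:
        # images, backgrounds: (B, C, H, W); backgrounds contain no salient object.
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        lam = max(lam, 1.0 - lam)  # keep the object image dominant in the mix
        return lam * images + (1.0 - lam) * backgrounds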
- Understanding Adversarial Examples from the Mutual Influence of Images and Perturbations [83.60161052867534]
We analyze adversarial examples by disentangling the clean images and adversarial perturbations, and analyze their influence on each other.
Our results suggest a new perspective towards the relationship between images and universal perturbations.
We are the first to achieve the challenging task of a targeted universal attack without utilizing original training data.
arXiv Detail & Related papers (2020-07-13T05:00:09Z)
- Unsupervised Landmark Learning from Unpaired Data [117.81440795184587]
Recent attempts for unsupervised landmark learning leverage synthesized image pairs that are similar in appearance but different in poses.
We propose a cross-image cycle consistency framework which applies the swapping-reconstruction strategy twice to obtain the final supervision.
Our proposed framework is shown to outperform strong baselines by a large margin.
arXiv Detail & Related papers (2020-06-29T13:57:20Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities by aligning their output semantic probabilities (a minimal sketch follows this entry).
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
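A rough sketch of a SCAN-style objective as summarized above: pull matching image/recipe embeddings together in the shared space and align the two modalities' semantic class probabilities. The hardest-negative triplet term, the margin, and the KL alignment direction are assumptions; the paper's exact losses may differ.

    import torch
    import torch.nn.functional as F

    def scan_style_loss(img_emb, rec_emb, img_logits, rec_logits, margin=0.3):
        # img_emb, rec_emb: (B, D) embeddings of paired images and recipes.
        # img_logits, rec_logits: (B, K) semantic class predictions per modality.
        img_emb = F.normalize(img_emb, dim=-1)
        rec_emb = F.normalize(rec_emb, dim=-1)
        sim = img_emb @ rec_emb.t()                   # (B, B) cosine similarities
        pos = sim.diag()                              # matching image-recipe pairs
        eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
        neg = sim.masked_fill(eye, float("-inf")).max(dim=1).values  # hardest negatives
        retrieval = F.relu(margin - pos + neg).mean() # triplet-style retrieval loss
        # Semantic consistency: align the two modalities' class probabilities.
        align = F.kl_div(F.log_softmax(img_logits, dim=-1),
                         F.softmax(rec_logits, dim=-1), reduction="batchmean")
        return retrieval + align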
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.