Neglected Free Lunch -- Learning Image Classifiers Using Annotation
Byproducts
- URL: http://arxiv.org/abs/2303.17595v3
- Date: Wed, 26 Jul 2023 11:06:32 GMT
- Title: Neglected Free Lunch -- Learning Image Classifiers Using Annotation
Byproducts
- Authors: Dongyoon Han, Junsuk Choe, Seonghyeok Chun, John Joon Young Chung,
Minsuk Chang, Sangdoo Yun, Jean Y. Song, Seong Joon Oh
- Abstract summary: Supervised learning of image classifiers distills human knowledge into a parametric model through pairs of images and corresponding labels (X,Y).
We argue that this simple and widely used representation of human knowledge neglects rich auxiliary information from the annotation procedure.
We refer to the new paradigm of training models with annotation byproducts as learning using annotation byproducts (LUAB).
- Score: 43.76258241948858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised learning of image classifiers distills human knowledge into a
parametric model through pairs of images and corresponding labels (X,Y). We
argue that this simple and widely used representation of human knowledge
neglects rich auxiliary information from the annotation procedure, such as the
time-series of mouse traces and clicks left after image selection. Our insight
is that such annotation byproducts Z provide approximate human attention that
weakly guides the model to focus on the foreground cues, reducing spurious
correlations and discouraging shortcut learning. To verify this, we create
ImageNet-AB and COCO-AB. They are ImageNet and COCO training sets enriched with
sample-wise annotation byproducts, collected by replicating the respective
original annotation tasks. We refer to the new paradigm of training models with
annotation byproducts as learning using annotation byproducts (LUAB). We show
that a simple multitask loss for regressing Z together with Y already improves
the generalisability and robustness of the learned models. Compared to the
original supervised learning, LUAB does not require extra annotation costs.
ImageNet-AB and COCO-AB are at https://github.com/naver-ai/NeglectedFreeLunch.
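The multitask recipe from the abstract, regressing byproducts Z alongside the usual label loss on Y, can be illustrated with a short sketch. The following is a minimal PyTorch illustration under stated assumptions: the ResNet-50 backbone, the byproduct dimensionality, and the loss weight lambda_z are placeholders, not the authors' exact configuration (their implementation lives in the repository linked above).

```python
import torch.nn as nn
import torch.nn.functional as F
import torchvision


class LUABClassifier(nn.Module):
    """Backbone with two heads: class logits for the label Y and a
    regression head for annotation byproducts Z (e.g. pooled mouse-trace
    statistics). The byproduct dimensionality (4) is a placeholder."""

    def __init__(self, num_classes: int, byproduct_dim: int = 4):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # expose pooled features
        self.backbone = backbone
        self.cls_head = nn.Linear(feat_dim, num_classes)
        self.byproduct_head = nn.Linear(feat_dim, byproduct_dim)

    def forward(self, x):
        feats = self.backbone(x)
        return self.cls_head(feats), self.byproduct_head(feats)


def luab_loss(logits, z_pred, y, z, lambda_z: float = 0.1):
    """Multitask loss: cross-entropy on labels Y plus an auxiliary
    regression term on byproducts Z. lambda_z = 0.1 is an assumed weight."""
    return F.cross_entropy(logits, y) + lambda_z * F.mse_loss(z_pred, z)


# Usage (shapes illustrative):
#   logits, z_pred = model(images)
#   loss = luab_loss(logits, z_pred, labels, byproducts)
```

Because the byproduct head shares the backbone, the auxiliary regression can only steer the shared features toward foreground cues; at test time the extra head can simply be discarded.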
Related papers
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the heavy data demands of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We leverage vision-language models as a foundation for building robust, user-intention-aware cropping algorithms.
We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance.
Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
arXiv Detail & Related papers (2022-11-21T14:27:07Z)
- Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition [83.93422034664184]
Unpaired image captioning (UIC) aims to describe images without using image-caption pairs during training.
Most existing studies use off-the-shelf algorithms to obtain the visual concepts.
We propose a novel approach to achieve cost-effective UIC using image-level labels.
arXiv Detail & Related papers (2022-03-07T08:02:23Z)
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method can represent images in a low-dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
- Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition [38.49419948988415]
Deep networks can learn to accurately recognize objects of a category by training on a large number of images.
Low-shot image recognition, a meta-learning challenge, arises when only a few annotated images are available for learning a recognition model for a category.
We propose the Cascaded Feature Matching Network (CFMN) to solve this problem.
Experiments on few-shot learning on two standard datasets, miniImageNet and Omniglot, have confirmed the effectiveness of our method.
arXiv Detail & Related papers (2021-01-13T11:37:28Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- Deep Active Learning for Joint Classification & Segmentation with Weak Annotator [22.271760669551817]
CNN visualization and interpretation methods, like class-activation maps (CAMs), are typically used to highlight the image regions linked to class predictions.
We propose an active learning framework, which progressively integrates pixel-level annotations during training.
Our results indicate that, with simple random sample selection, the proposed approach can significantly outperform state-of-the-art CAM and active learning (AL) methods.
arXiv Detail & Related papers (2020-10-10T03:25:54Z)
- Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning [108.999497144296]
Recently advanced unsupervised learning approaches use a Siamese-like framework to compare two "views" of the same image when learning representations.
This work introduces a notion of distance on the label space into unsupervised learning, making the model aware of the soft degree of similarity between positive or negative pairs.
Despite its conceptual simplicity, we show empirically that the resulting solution, Unsupervised image mixtures (Un-Mix), learns subtler, more robust, and more generalized representations from the transformed inputs and the corresponding new label space (a rough sketch of the image-mixture idea appears after this list).
arXiv Detail & Related papers (2020-03-11T17:59:04Z)
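As a rough illustration of the image-mixture idea in the Un-Mix entry above, here is a generic mixup-style blend of two input batches in which the mixing coefficient carries over to the target space. This is a conventional mixup sketch under standard assumptions (a Beta-distributed coefficient), not the authors' exact procedure:

```python
import torch


def mixup_views(x1: torch.Tensor, x2: torch.Tensor, alpha: float = 1.0):
    """Blend two image batches with a Beta-sampled coefficient lam.

    lam can then weight a Siamese-style similarity loss, providing the
    'soft degree of similarity' between pairs mentioned in the summary.
    The Beta(alpha, alpha) prior is a conventional mixup choice, assumed here.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed = lam * x1 + (1.0 - lam) * x2
    return mixed, lam
```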