Towards Context-Agnostic Learning Using Synthetic Data
- URL: http://arxiv.org/abs/2005.14707v3
- Date: Tue, 7 Dec 2021 01:52:01 GMT
- Title: Towards Context-Agnostic Learning Using Synthetic Data
- Authors: Charles Jin, Martin Rinard
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel setting for learning, where the input domain is the image
of a map defined on the product of two sets, one of which completely determines
the labels. We derive a new risk bound for this setting that decomposes into a
bias and an error term, and exhibits a surprisingly weak dependence on the true
labels. Inspired by these results, we present an algorithm aimed at minimizing
the bias term by exploiting the ability to sample from each set independently.
We apply our setting to visual classification tasks, where our approach enables
us to train classifiers on datasets that consist entirely of a single synthetic
example of each class. On several standard benchmarks for real-world image
classification, we achieve robust performance in the context-agnostic setting,
with good generalization to real world domains, whereas training directly on
real world data without our techniques yields classifiers that are brittle to
perturbations of the background.
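Concretely, the two sets can be read as foregrounds, which completely determine the labels, and backgrounds, which supply the context. Below is a minimal sketch of the independent-sampling idea under that reading; the mask-based compositing and the helper names (`compose`, `sample_batch`) are illustrative assumptions, not the paper's implementation.
```python
import numpy as np

rng = np.random.default_rng(0)

def compose(foreground, mask, background):
    """Map a (label-determining, context) pair into the input domain by
    pasting the foreground onto the background wherever the mask is set."""
    return np.where(mask, foreground, background)

def sample_batch(foregrounds, masks, backgrounds, batch_size):
    """Sample labels and contexts independently, then compose them."""
    ys = rng.integers(0, len(foregrounds), size=batch_size)
    bs = rng.integers(0, len(backgrounds), size=batch_size)
    xs = np.stack([compose(foregrounds[y], masks[y], backgrounds[b])
                   for y, b in zip(ys, bs)])
    return xs, ys

# Toy usage: a single synthetic exemplar per class, many random contexts.
foregrounds = rng.uniform(size=(10, 32, 32, 3))      # one image per class
masks = rng.uniform(size=(10, 32, 32, 1)) > 0.5      # foreground masks
backgrounds = rng.uniform(size=(100, 32, 32, 3))     # pool of contexts
x_batch, y_batch = sample_batch(foregrounds, masks, backgrounds, 64)
```
Because the labels depend only on the foreground, such a sampler can pair one exemplar per class with arbitrarily many contexts, which is what makes the single-synthetic-example-per-class datasets above possible.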
Related papers
- Dynamic Loss For Robust Learning [17.33444812274523]
This work presents a novel meta-learning-based dynamic loss that automatically adjusts the objective function as training progresses, to robustly learn a classifier from long-tailed noisy data.
Our method achieves state-of-the-art accuracy on multiple real-world and synthetic datasets with various types of data biases, including CIFAR-10/100, Animal-10N, ImageNet-LT, and WebVision.
arXiv Detail & Related papers (2022-11-22T01:48:25Z)
- Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to combine the strengths of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
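The memory-bank bookkeeping is specific to DaC, but the MMD term itself is standard. A minimal sketch of a kernel MMD loss between the two groups of features, using an illustrative RBF bandwidth and the biased estimator:
```python
import torch

def rbf_kernel(a, b, sigma=1.0):
    # Pairwise RBF kernel values between two batches of features.
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd_loss(source_like, target_specific, sigma=1.0):
    """Biased estimate of squared MMD between two feature distributions;
    minimizing it pulls the two groups of samples together."""
    k_ss = rbf_kernel(source_like, source_like, sigma).mean()
    k_tt = rbf_kernel(target_specific, target_specific, sigma).mean()
    k_st = rbf_kernel(source_like, target_specific, sigma).mean()
    return k_ss + k_tt - 2 * k_st
```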
arXiv Detail & Related papers (2022-11-12T09:21:49Z)
- Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised Person Re-Identification and Text Authorship Attribution [77.85461690214551]
Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
Recent self-supervised learning methods have proven effective when dealing with fully-unlabeled data in cases where the underlying classes have significant semantic differences.
We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
arXiv Detail & Related papers (2022-02-07T13:08:11Z)
- Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
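A generic rendering of the neighbor-consistency idea, assuming a KL-based term that pulls each prediction toward the average prediction of its feature-space neighbors (the paper's exact regularizer may differ):
```python
import torch
import torch.nn.functional as F

def neighbor_consistency_loss(features, logits, k=5):
    """Pull each example's prediction toward the mean prediction of its
    k nearest neighbors in feature space."""
    feats = F.normalize(features, dim=1)
    sims = feats @ feats.t()
    sims.fill_diagonal_(-float("inf"))               # exclude self-matches
    _, idx = sims.topk(k, dim=1)                     # k nearest neighbors
    log_probs = logits.log_softmax(dim=1)
    neighbor_probs = log_probs.exp()[idx].mean(dim=1)
    return F.kl_div(log_probs, neighbor_probs.detach(), reduction="batchmean")
```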
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
- Generalized Category Discovery [148.32255950504182]
We consider a highly general image recognition setting wherein, given labelled and unlabelled sets of images, the task is to categorize all images in the unlabelled set.
Here, the unlabelled images may come from labelled classes or from novel ones.
We first establish strong baselines by taking state-of-the-art algorithms from novel category discovery and adapting them for this task.
We then introduce a simple yet effective semi-supervised $k$-means method to cluster the unlabelled data into seen and unseen classes.
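A sketch of what such a semi-supervised k-means can look like: labelled points stay clamped to their class's cluster while extra centroids are free to absorb novel classes. The initialization details here are illustrative, not the paper's exact algorithm.
```python
import numpy as np

def semi_supervised_kmeans(X_lab, y_lab, X_unlab, n_seen, n_total, iters=50, seed=0):
    """Cluster labelled + unlabelled features into n_total clusters, the
    first n_seen of which are pinned to the labelled classes.
    Assumes every seen class has at least one labelled sample."""
    rng = np.random.default_rng(seed)
    X = np.vstack([X_lab, X_unlab])
    seen = [X_lab[y_lab == c].mean(axis=0) for c in range(n_seen)]
    novel = [X[rng.integers(len(X))] for _ in range(n_total - n_seen)]
    centroids = np.stack(seen + novel)
    for _ in range(iters):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        assign[: len(X_lab)] = y_lab                 # clamp labelled points
        for j in range(n_total):
            if (assign == j).any():
                centroids[j] = X[assign == j].mean(axis=0)
    return assign[len(X_lab):]                       # cluster ids for unlabelled
```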
arXiv Detail & Related papers (2022-01-07T18:58:35Z)
- Label-Descriptive Patterns and their Application to Characterizing Classification Errors [31.272875287136426]
State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless.
Characterizing these errors in easily interpretable terms not only gives insight into whether a model is prone to systematic errors, but also suggests how to act on and improve the model.
In this paper we propose a method that allows us to do so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data that is partitioned according to correctness of prediction.
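A much-simplified stand-in for such a pattern miner, which treats samples as sets of discrete feature tokens and keeps small conjunctions that are markedly more frequent among misclassified samples (the paper's pattern search is more sophisticated):
```python
from itertools import combinations

def mine_error_patterns(samples, correct, max_len=2, min_support=0.05):
    """Return small conjunctions of feature tokens that are over-represented
    among misclassified samples relative to correctly classified ones.
    Each sample is a set of discrete tokens, e.g. {"bg=grass", "blurry"}."""
    errors = [s for s, ok in zip(samples, correct) if not ok]
    rights = [s for s, ok in zip(samples, correct) if ok]
    freq = lambda group, p: sum(p <= s for s in group) / max(len(group), 1)
    tokens = sorted(set().union(*samples))
    patterns = []
    for r in range(1, max_len + 1):
        for combo in combinations(tokens, r):
            p = frozenset(combo)
            fe, fr = freq(errors, p), freq(rights, p)
            if fe >= min_support and fe > 2 * fr:    # far more common in errors
                patterns.append((p, fe, fr))
    return sorted(patterns, key=lambda t: t[1] - t[2], reverse=True)
```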
arXiv Detail & Related papers (2021-10-18T19:42:21Z)
- SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
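The corruption step is concrete enough to sketch: a random subset of each row's features is replaced with draws from the features' empirical marginals, approximated here by shuffling each column within the batch. Two such views would then feed a standard contrastive (InfoNCE) loss.
```python
import torch

def scarf_corrupt(x, corruption_rate=0.6):
    """Form a SCARF-style view of a tabular batch: replace a random subset
    of each row's features with samples from the empirical marginals."""
    mask = torch.rand_like(x) < corruption_rate
    perm = torch.argsort(torch.rand_like(x), dim=0)  # independent column shuffles
    marginal_samples = torch.gather(x, 0, perm)
    return torch.where(mask, marginal_samples, x)
```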
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
- Ensemble Learning with Manifold-Based Data Splitting for Noisy Label Correction [20.401661156102897]
Noisy labels in training data can significantly degrade a model's generalization performance.
We propose an ensemble learning method that corrects noisy labels by exploiting the local structure of the feature manifold.
Our experiments on real-world noisy-label datasets demonstrate the superiority of the proposed method over existing state-of-the-art approaches.
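A generic rendering of manifold-based correction, assuming a simple k-nearest-neighbor majority vote in feature space (the paper's ensemble and data-splitting machinery are omitted):
```python
import numpy as np

def correct_labels(features, labels, n_classes, k=10, threshold=0.8):
    """Relabel a sample when its k nearest feature-space neighbors vote
    strongly for a different class."""
    d = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)                      # ignore self-distances
    neighbors = np.argsort(d, axis=1)[:, :k]
    corrected = labels.copy()
    for i, nbrs in enumerate(neighbors):
        votes = np.bincount(labels[nbrs], minlength=n_classes)
        winner = votes.argmax()
        if winner != labels[i] and votes[winner] / k >= threshold:
            corrected[i] = winner
    return corrected
```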
arXiv Detail & Related papers (2021-03-13T07:24:58Z)
- Out-distribution aware Self-training in an Open World Setting [62.19882458285749]
We leverage unlabeled data in an open world setting to further improve prediction performance.
We introduce out-distribution aware self-training, which includes a careful sample selection strategy.
Our classifiers are by design out-distribution aware and can thus distinguish task-related inputs from unrelated ones.
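A minimal sketch of the selection step, assuming a pseudo-label is kept only when the prediction is confident and an out-of-distribution score is low; the thresholds and the OOD score itself are illustrative assumptions, not the paper's exact criteria.
```python
import torch

def select_pseudo_labels(logits, ood_scores, conf_thresh=0.95, ood_thresh=0.5):
    """Keep an unlabeled sample for self-training only if the model predicts
    it confidently AND an OOD detector deems it task-related."""
    probs = logits.softmax(dim=1)
    confidence, pseudo_labels = probs.max(dim=1)
    keep = (confidence >= conf_thresh) & (ood_scores <= ood_thresh)
    return pseudo_labels[keep], keep
```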
arXiv Detail & Related papers (2020-12-21T12:25:04Z)
- Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts [37.27708297562079]
We show how to minimize the number of labelers while ensuring the reliability of the resulting dataset.
Experiments with the MNIST and CIFAR-10 datasets demonstrate the favorable accuracy of our aggregation scheme.
arXiv Detail & Related papers (2020-05-21T18:07:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.