On Training Sketch Recognizers for New Domains
- URL: http://arxiv.org/abs/2104.08850v1
- Date: Sun, 18 Apr 2021 13:24:49 GMT
- Title: On Training Sketch Recognizers for New Domains
- Authors: Kemal Tugrul Yesilbek, T. Metin Sezgin
- Abstract summary: We show that the ecological validity of the data collection protocol and the ability to accommodate small datasets are significant factors impacting recognizer accuracy in realistic scenarios.
We demonstrate that in realistic scenarios where data is scarce and expensive, the standard measures for adapting deep learners to small datasets do not compete favorably with the alternatives.
- Score: 3.8149289266694466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sketch recognition algorithms are engineered and evaluated using publicly
available datasets contributed by the sketch recognition community over the
years. While existing datasets contain sketches of a limited set of generic
objects, each new domain inevitably requires collecting new data for training
domain-specific recognizers. This gives rise to two fundamental concerns:
First, will the data collection protocol yield ecologically valid data? Second,
will the amount of collected data suffice to train sufficiently accurate
classifiers? In this paper, we draw attention to these two concerns. We show
that the ecological validity of the data collection protocol and the ability to
accommodate small datasets are significant factors impacting recognizer
accuracy in realistic scenarios. More specifically, using sketch-based gaming
as a use case, we show that deep learning methods, as well as more traditional
methods, suffer significantly from dataset shift. Furthermore, we demonstrate
that in realistic scenarios where data is scarce and expensive, the standard
measures taken to adapt deep learners to small datasets do not compete
favorably with the alternatives. Although transfer learning and extensive
data augmentation help deep learners, they still perform significantly
worse than standard setups (e.g., SVMs and GBMs with
standard feature representations). We pose learning from small datasets as a
key problem for the deep sketch recognition field, one which has been ignored
in the bulk of the existing literature.
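To make the paper's "standard setup" concrete, here is a minimal, hedged sketch of the kind of baseline the abstract favors for small sketch datasets: hand-crafted geometric features fed to an SVM. The feature set and data format below are illustrative stand-ins, not the authors' actual pipeline.

    # A minimal "standard setup" baseline in the spirit of the abstract:
    # hand-crafted features plus an SVM. The features are illustrative only.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def sketch_features(points):
        """Simple geometric features for one sketch given as an (N, 2)
        array of (x, y) points. Illustrative, not the paper's features."""
        pts = np.asarray(points, dtype=float)
        deltas = np.diff(pts, axis=0)
        seg_len = np.linalg.norm(deltas, axis=1)
        angles = np.arctan2(deltas[:, 1], deltas[:, 0])
        bbox = pts.max(axis=0) - pts.min(axis=0)
        return np.array([
            seg_len.sum(),                    # total ink length
            seg_len.mean(),                   # mean segment length
            np.abs(np.diff(angles)).sum(),    # total absolute curvature
            bbox[0] / max(bbox[1], 1e-6),     # bounding-box aspect ratio
            len(pts),                         # number of points
        ])

    # Tiny synthetic stand-in: two classes of random-walk "sketches".
    rng = np.random.default_rng(0)
    X = [rng.normal(size=(rng.integers(20, 60), 2)).cumsum(axis=0)
         for _ in range(40)]
    y = np.array([0] * 20 + [1] * 20)
    feats = np.stack([sketch_features(s) for s in X])

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(feats, y)
    print("train acc:", clf.score(feats, y))

A GBM baseline would follow the same recipe, swapping the SVC for a gradient boosting classifier over the same feature vectors.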
Related papers
- Combining Public Human Activity Recognition Datasets to Mitigate Labeled Data Scarcity [1.274578243851308]
We propose a novel strategy to combine publicly available datasets with the goal of learning a generalized HAR model.
Our experimental evaluation, which includes experiments with different state-of-the-art neural network architectures, shows that combining public datasets can significantly reduce the number of labeled samples required.
arXiv Detail & Related papers (2023-06-23T18:51:22Z)
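As a rough illustration of the dataset-combination idea in the entry above, the sketch below harmonizes two toy HAR datasets into a shared label vocabulary before pooling them. The dataset contents and label maps are hypothetical placeholders.

    # Minimal sketch: map each dataset's raw labels into a shared
    # vocabulary, drop what does not map, then pool the samples.
    import numpy as np

    SHARED_LABELS = {"walking": 0, "sitting": 1, "standing": 2}

    def harmonize(X, y_raw, label_map):
        """Keep only samples whose raw label maps into the shared set."""
        keep, y = [], []
        for i, lbl in enumerate(y_raw):
            if label_map.get(lbl) in SHARED_LABELS:
                keep.append(i)
                y.append(SHARED_LABELS[label_map[lbl]])
        return X[keep], np.array(y)

    # Two toy "datasets" with different label conventions.
    Xa, ya = np.random.randn(100, 50), ["walk"] * 50 + ["sit"] * 50
    Xb, yb = np.random.randn(80, 50), ["Standing"] * 40 + ["Walking"] * 40
    map_a = {"walk": "walking", "sit": "sitting"}
    map_b = {"Standing": "standing", "Walking": "walking"}

    Xa_h, ya_h = harmonize(Xa, ya, map_a)
    Xb_h, yb_h = harmonize(Xb, yb, map_b)
    X = np.concatenate([Xa_h, Xb_h])      # one pooled training set
    y = np.concatenate([ya_h, yb_h])
    print(X.shape, np.bincount(y))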
- Localized Shortcut Removal [4.511561231517167]
High performance on held-out test data does not necessarily indicate that a model generalizes or learns anything meaningful.
This is often due to the existence of machine learning shortcuts - features in the data that are predictive but unrelated to the problem at hand.
We use an adversarially trained lens to detect and eliminate highly predictive but semantically unconnected clues in images.
arXiv Detail & Related papers (2022-11-24T13:05:33Z)
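One plausible reading of the adversarial lens above, sketched in PyTorch: a mask network learns to find a small region whose removal maximally hurts the classifier, while the classifier learns to cope without it. The architectures and losses here are schematic assumptions, not the paper's.

    # Schematic adversarial "lens": the lens proposes a sparse mask; the
    # classifier trains on masked inputs; the lens seeks the small region
    # whose removal hurts most (a candidate shortcut). Toy data, toy nets.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    classifier = nn.Sequential(
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
    lens = nn.Sequential(
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
        nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())  # mask in [0, 1]

    opt_c = torch.optim.Adam(classifier.parameters(), lr=1e-3)
    opt_l = torch.optim.Adam(lens.parameters(), lr=1e-3)

    x = torch.randn(16, 1, 32, 32)        # toy batch
    y = torch.randint(0, 2, (16,))

    for step in range(100):
        mask = lens(x)
        masked = x * (1 - mask)           # remove what the lens flags

        # Classifier: learn to predict despite the removal.
        loss_c = F.cross_entropy(classifier(masked.detach()), y)
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()

        # Lens: find a *small* region whose removal maximally hurts.
        loss_l = (-F.cross_entropy(classifier(masked), y)
                  + 0.1 * mask.mean())    # sparsity penalty on the mask
        opt_l.zero_grad(); loss_l.backward(); opt_l.step()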
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
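For context on the PAC analysis mentioned in the survey entry above, the textbook uniform-convergence bound for a finite hypothesis class (a standard result, not one of the survey's own bounds) states that with probability at least $1 - \delta$ over $m$ i.i.d. samples,

    \Pr\!\left[\; \sup_{h \in \mathcal{H}} \left| R(h) - \hat{R}_m(h) \right|
        \;\le\; \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2m}} \;\right]
        \;\ge\; 1 - \delta,

where $R(h)$ is the true risk and $\hat{R}_m(h)$ the empirical risk. Inverting for the sample size gives a label complexity of $m = O\big((\ln|\mathcal{H}| + \ln(1/\delta)) / \epsilon^2\big)$, i.e., only logarithmic in the size of the hypothesis class, which is why restricting the hypothesis space matters most when data is small.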
- Dominant Set-based Active Learning for Text Classification and its Application to Online Social Media [0.0]
We present a novel pool-based active learning method for training on a large unlabeled corpus with minimal annotation cost.
Our proposed method has no parameters to tune, making it dataset-independent.
Our method achieves higher performance than state-of-the-art active learning strategies.
arXiv Detail & Related papers (2022-01-28T19:19:03Z)
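A generic pool-based active learning loop with margin-based uncertainty sampling is sketched below as a point of reference; the paper's dominant-set method is parameter-free and selects representative samples instead, so treat this only as the surrounding framework.

    # Generic pool-based active learning loop (uncertainty sampling).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    labeled = list(range(10))             # tiny seed set
    pool = [i for i in range(len(X)) if i not in labeled]

    clf = LogisticRegression(max_iter=1000)
    for round_ in range(20):
        clf.fit(X[labeled], y[labeled])
        proba = np.sort(clf.predict_proba(X[pool]), axis=1)
        margins = proba[:, -1] - proba[:, -2]   # small margin = uncertain
        pick = pool[int(np.argmin(margins))]
        labeled.append(pick)              # "annotate" the queried point
        pool.remove(pick)

    clf.fit(X[labeled], y[labeled])
    print("labeled:", len(labeled), "accuracy:", clf.score(X, y))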
- Weakly Supervised Change Detection Using Guided Anisotropic Diffusion [97.43170678509478]
We propose original ideas that help us leverage weakly annotated datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
arXiv Detail & Related papers (2021-12-31T10:03:47Z)
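To ground the entry above, here is a simplified guided anisotropic diffusion in the Perona-Malik style, where the edge-stopping term is computed from a separate guide image so smoothing respects the guide's boundaries; the authors' GAD for semantic segmentation is more involved.

    # Perona-Malik-style diffusion whose conduction coefficients come
    # from a guide image rather than the image being smoothed.
    import numpy as np

    def guided_anisotropic_diffusion(img, guide, n_iter=50, kappa=0.1, step=0.2):
        u = img.astype(float).copy()
        g = guide.astype(float)
        for _ in range(n_iter):
            # Differences of the evolving image (4-neighbourhood).
            dn = np.roll(u, -1, axis=0) - u
            ds = np.roll(u, 1, axis=0) - u
            de = np.roll(u, -1, axis=1) - u
            dw = np.roll(u, 1, axis=1) - u
            # Conduction coefficients from the *guide* image's gradients:
            # diffusion is suppressed across the guide's edges.
            cn = np.exp(-((np.roll(g, -1, axis=0) - g) / kappa) ** 2)
            cs = np.exp(-((np.roll(g, 1, axis=0) - g) / kappa) ** 2)
            ce = np.exp(-((np.roll(g, -1, axis=1) - g) / kappa) ** 2)
            cw = np.exp(-((np.roll(g, 1, axis=1) - g) / kappa) ** 2)
            u += step * (cn * dn + cs * ds + ce * de + cw * dw)
        return u

    noisy = np.random.rand(64, 64)
    guide = np.zeros((64, 64)); guide[:, 32:] = 1.0  # boundary to respect
    smoothed = guided_anisotropic_diffusion(noisy, guide)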
- CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicted segmentation maps.
We evaluate the effectiveness of our framework on diverse problems, showing that CvS achieves much higher classification accuracy than previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z)
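The core reduction in CvS, reading a classification label off a predicted segmentation map, can be sketched in a few lines; the majority-vote rule below is an assumption for illustration, not necessarily the paper's exact aggregation.

    # Classification via segmentation, in its simplest form: take the
    # most frequent non-background class in the predicted class map.
    import numpy as np

    def label_from_segmentation(seg_map, background=0):
        """seg_map: (H, W) integer class map from a segmentation model."""
        fg = seg_map[seg_map != background]
        if fg.size == 0:
            return background
        return int(np.bincount(fg).argmax())

    seg = np.zeros((32, 32), dtype=int)
    seg[8:24, 8:24] = 3                   # model segmented a class-3 object
    print(label_from_segmentation(seg))   # -> 3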
- DomainMix: Learning Generalizable Person Re-Identification Without Human Annotations [89.78473564527688]
This paper shows how to use a labeled synthetic dataset and an unlabeled real-world dataset to train a universal model.
In this way, human annotations are no longer required, and the approach scales to large and diverse real-world datasets.
Experimental results show that the proposed annotation-free method is roughly comparable to its counterpart trained with full human annotations.
arXiv Detail & Related papers (2020-11-24T08:15:53Z)
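A heavily hedged sketch of the synthetic-plus-unlabeled-real recipe above: fit on labeled synthetic identities, turn real-data clusters into pseudo-identities, and train on the union. This is a generic stand-in for common unsupervised re-ID practice, not the DomainMix algorithm itself.

    # Labeled synthetic + unlabeled real: clusters of the real features
    # become new pseudo-identities appended to the synthetic label space.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_syn = rng.normal(size=(200, 16))            # labeled synthetic features
    y_syn = rng.integers(0, 4, 200)               # 4 synthetic identities
    X_real = rng.normal(loc=0.5, size=(300, 16))  # unlabeled real features

    pseudo = (KMeans(n_clusters=4, n_init=10, random_state=0)
              .fit_predict(X_real) + int(y_syn.max()) + 1)  # ids 4..7
    X = np.concatenate([X_syn, X_real])
    y = np.concatenate([y_syn, pseudo])           # no human annotations used
    model = LogisticRegression(max_iter=1000).fit(X, y)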
- Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
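As a generic stand-in for the adversarial transfer idea above (not the paper's actual framework), the sketch below aligns features of a large unlabeled source set with a small labeled target set via a discriminator, while classifying the target data.

    # Generic adversarial feature alignment between unlabeled source data
    # and a small labeled target set. Toy nets and data throughout.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    feat = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
    clf = nn.Linear(16, 5)                 # labeled-target classifier
    disc = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

    opt_fc = torch.optim.Adam(list(feat.parameters()) + list(clf.parameters()), lr=1e-3)
    opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

    x_tgt = torch.randn(32, 32); y_tgt = torch.randint(0, 5, (32,))
    x_src = torch.randn(256, 32)           # large unlabeled source pool

    for step in range(200):
        f_t, f_s = feat(x_tgt), feat(x_src)
        # Discriminator: tell source features from target features.
        d_loss = (F.binary_cross_entropy_with_logits(disc(f_s.detach()), torch.ones(256, 1))
                  + F.binary_cross_entropy_with_logits(disc(f_t.detach()), torch.zeros(32, 1)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # Extractor + classifier: classify target, fool the discriminator.
        g_loss = (F.cross_entropy(clf(f_t), y_tgt)
                  + F.binary_cross_entropy_with_logits(disc(f_s), torch.zeros(256, 1)))
        opt_fc.zero_grad(); g_loss.backward(); opt_fc.step()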
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To keep training on the enlarged dataset tractable, we propose to apply a dataset distillation strategy that compresses the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
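To make "compressing a dataset into several informative class-wise images" concrete, the crude sketch below represents each class by a few cluster centroids; real dataset distillation optimizes the synthetic images directly, so this is only the idea in its simplest form.

    # Crudest class-wise compression: a few centroid "images" per class
    # replace the full training set. A stand-in, not true distillation.
    import numpy as np
    from sklearn.cluster import KMeans

    def distill(X, y, per_class=3):
        """X: (N, D) flattened images, y: (N,) labels -> tiny synthetic set."""
        Xs, ys = [], []
        for c in np.unique(y):
            centers = KMeans(n_clusters=per_class, n_init=10,
                             random_state=0).fit(X[y == c]).cluster_centers_
            Xs.append(centers)
            ys.extend([c] * per_class)
        return np.concatenate(Xs), np.array(ys)

    X = np.random.rand(1000, 28 * 28)     # stand-in for face images
    y = np.random.randint(0, 7, 1000)     # 7 expression classes
    X_small, y_small = distill(X, y)      # 21 images replace 1000
    print(X_small.shape)                  # (21, 784)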
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.