Teaching AI to Teach: Leveraging Limited Human Salience Data Into
Unlimited Saliency-Based Training
- URL: http://arxiv.org/abs/2306.05527v2
- Date: Thu, 9 Nov 2023 18:15:05 GMT
- Title: Teaching AI to Teach: Leveraging Limited Human Salience Data Into
Unlimited Saliency-Based Training
- Authors: Colton R. Crum, Aidan Boyd, Kevin Bowyer, Adam Czajka
- Abstract summary: We use "teacher" models (trained on a small amount of human-annotated data) to annotate additional data by means of teacher models' saliency maps.
Then, "student" models are trained using the larger amount of annotated training data.
We compare the accuracy achieved by our teacher-student training paradigm with (1) training using all available human salience annotations, and (2) using all available training data without human salience annotations.
- Score: 6.038173052593495
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine learning models have shown increased accuracy in classification tasks
when the training process incorporates human perceptual information. However, a
challenge in training human-guided models is the cost associated with
collecting image annotations for human salience. Collecting annotation data for
all images in a large training set can be prohibitively expensive. In this
work, we utilize "teacher" models (trained on a small amount of human-annotated
data) to annotate additional data by means of teacher models' saliency maps.
Then, "student" models are trained using the larger amount of annotated
training data. This approach makes it possible to supplement a limited number
of human-supplied annotations with an arbitrarily large number of
model-generated image annotations. We compare the accuracy achieved by our
teacher-student training paradigm with (1) training using all available human
salience annotations, and (2) using all available training data without human
salience annotations. We use synthetic face detection and fake iris detection
as example challenging problems, and report results across four model
architectures (DenseNet, ResNet, Xception, and Inception), and two saliency
estimation methods (CAM and RISE). Results show that our teacher-student
training paradigm results in models that significantly exceed the performance
of both baselines, demonstrating that our approach can usefully leverage a
small amount of human annotations to generate salience maps for an arbitrary
amount of additional training data.
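
A minimal PyTorch sketch of the teacher-student saliency paradigm described in the abstract is given below. It is not the authors' released implementation: the ResNet backbone, the CAM extraction, the MSE saliency-agreement term, and the weight `alpha` are illustrative assumptions chosen for brevity (the paper also reports RISE-based saliency and other architectures).

```python
# Sketch of the teacher-student saliency pipeline (assumptions noted above).
import torch
import torch.nn.functional as F
from torchvision import models

def resnet_features(model, x):
    """Run a torchvision ResNet up to its last conv block (before pooling)."""
    x = model.conv1(x); x = model.bn1(x); x = model.relu(x); x = model.maxpool(x)
    x = model.layer1(x); x = model.layer2(x); x = model.layer3(x); x = model.layer4(x)
    return x  # (B, C, H, W)

@torch.no_grad()
def teacher_cam(teacher, images):
    """Generate CAM saliency maps for additional (non-annotated) images."""
    feats = resnet_features(teacher, images)              # (B, C, H, W)
    logits = teacher.fc(feats.mean(dim=(2, 3)))           # global-avg-pool head
    preds = logits.argmax(dim=1)                          # predicted class per image
    w = teacher.fc.weight[preds]                          # (B, C) classifier weights
    cam = torch.einsum('bc,bchw->bhw', w, feats).relu()   # class activation map
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)

def student_step(student, images, labels, teacher_maps, alpha=0.5):
    """Classification loss plus agreement with the teacher-generated saliency."""
    feats = resnet_features(student, images)
    logits = student.fc(feats.mean(dim=(2, 3)))
    cls_loss = F.cross_entropy(logits, labels)
    w = student.fc.weight[labels]                          # student CAM for true class
    cam = torch.einsum('bc,bchw->bhw', w, feats).relu()
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)
    cam = F.interpolate(cam.unsqueeze(1), size=teacher_maps.shape[-2:],
                        mode='bilinear', align_corners=False).squeeze(1)
    return cls_loss + alpha * F.mse_loss(cam, teacher_maps)

# Usage: the teacher is assumed to be already trained on the small
# human-annotated subset; the student sees the larger, teacher-annotated set.
teacher = models.resnet18(num_classes=2)
student = models.resnet18(num_classes=2)
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 2, (4,))
maps = teacher_cam(teacher.eval(), images)   # model-generated saliency annotations
loss = student_step(student.train(), images, labels, maps)
loss.backward()
```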
Related papers
- EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training [79.96741042766524]
We reformulate the training curriculum as a soft-selection function.
We show that gradually exposing the contents of natural images during training can be readily achieved by adjusting the intensity of data augmentation.
The resulting method, EfficientTrain++, is simple, general, yet surprisingly effective.
arXiv Detail & Related papers (2024-05-14T17:00:43Z) - Grains of Saliency: Optimizing Saliency-based Training of Biometric Attack Detection Models [4.215251065887862]
Human visual saliency can be integrated into model training through attention mechanisms, augmented training samples, or through human perception-related components of loss functions.
Despite their successes, a vital, but seemingly neglected, aspect of any saliency-based training is the level of salience granularity.
In this paper, we explore several levels of salience granularity and demonstrate that improved generalization of presentation attack detection (PAD) and synthetic face detection models can be achieved by using simple yet effective saliency post-processing techniques.
arXiv Detail & Related papers (2024-05-01T17:27:11Z) - Self-Training and Multi-Task Learning for Limited Data: Evaluation Study
on Object Detection [4.9914667450658925]
Experimental results show improved performance when a weak teacher is applied to unseen data to train a multi-task student.
Despite the limited setup, we believe the experimental results show the potential of multi-task knowledge distillation and self-training.
arXiv Detail & Related papers (2023-09-12T14:50:14Z) - Learning Transferable Pedestrian Representation from Multimodal
Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z) - Few-Cost Salient Object Detection with Adversarial-Paced Learning [95.0220555274653]
This paper proposes to learn an effective salient object detection model from manual annotations on only a few training images.
We name this task few-cost salient object detection and propose an adversarial-paced learning (APL) framework to facilitate this few-cost learning scenario.
arXiv Detail & Related papers (2021-04-05T14:15:49Z) - Training Data Leakage Analysis in Language Models [6.843491191969066]
We introduce a methodology for identifying user content in the training data that could be leaked under a strong and realistic threat model.
We propose two metrics to quantify user-level data leakage by measuring a model's ability to produce unique sentence fragments within training data.
arXiv Detail & Related papers (2021-01-14T00:57:32Z) - Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z) - Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences
for Urban Scene Segmentation [57.68890534164427]
In this work, we ask if we may leverage semi-supervised learning in unlabeled video sequences and extra images to improve the performance on urban scene segmentation.
We simply predict pseudo-labels for the unlabeled data and train subsequent models with both human-annotated and pseudo-labeled data.
Our Naive-Student model, trained with such simple yet effective iterative semi-supervised learning, attains state-of-the-art results at all three Cityscapes benchmarks.
arXiv Detail & Related papers (2020-05-20T18:00:05Z) - Improving Semantic Segmentation via Self-Training [75.07114899941095]
We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm.
We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data.
Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performances on Cityscapes, CamVid and KITTI datasets.
arXiv Detail & Related papers (2020-04-30T17:09:17Z) - Learning from Imperfect Annotations [15.306536555936692]
Many machine learning systems today are trained on large amounts of human-annotated data.
We propose a new end-to-end framework that enables us to merge the aggregation step with model training.
We show accuracy gains of up to 25% over the current state-of-the-art approaches for aggregating annotations.
arXiv Detail & Related papers (2020-04-07T15:21:08Z)