MENTOR: Human Perception-Guided Pretraining for Increased Generalization
- URL: http://arxiv.org/abs/2310.19545v2
- Date: Mon, 12 Feb 2024 17:04:46 GMT
- Title: MENTOR: Human Perception-Guided Pretraining for Increased Generalization
- Authors: Colton R. Crum, Adam Czajka
- Abstract summary: We introduce MENTOR (huMan pErceptioN-guided preTraining fOr increased geneRalization)
We train an autoencoder to learn human saliency maps given an input image, without class labels.
We remove the decoder part, add a classification layer on top of the encoder, and fine-tune this new model conventionally.
- Score: 5.596752018167751
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Incorporating human perception into training of convolutional neural networks
(CNN) has boosted generalization capabilities of such models in open-set
recognition tasks. One of the active research questions is where (in the model
architecture) and how to efficiently incorporate always-limited human
perceptual data into training strategies of models. In this paper, we introduce
MENTOR (huMan pErceptioN-guided preTraining fOr increased geneRalization),
which addresses this question through two unique rounds of training the CNNs
tasked with open-set anomaly detection. First, we train an autoencoder to learn
human saliency maps given an input image, without class labels. The autoencoder
is thus tasked with discovering domain-specific salient features which mimic
human perception. Second, we remove the decoder part, add a classification
layer on top of the encoder, and fine-tune this new model conventionally. We
show that MENTOR's benefits are twofold: (a) significant accuracy boost in
anomaly detection tasks (in this paper demonstrated for detection of unknown
iris presentation attacks, synthetically-generated faces, and anomalies in
chest X-ray images), compared to models utilizing conventional transfer
learning (e.g., sourcing the weights from ImageNet-pretrained models) as well
as to models trained with the state-of-the-art approach incorporating human
perception guidance into loss functions, and (b) an increase in the efficiency
of model training, requiring fewer epochs to converge compared to
state-of-the-art training methods.
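The two-round procedure described in the abstract can be sketched in PyTorch. This is a minimal illustration, not the paper's implementation: the encoder/decoder shapes, optimizer settings, and the two-class head are all assumptions.

```python
# Hypothetical sketch of MENTOR's two-stage training; architecture details are assumed.
import torch
import torch.nn as nn

class SaliencyAutoencoder(nn.Module):
    """Stage 1: predict human saliency maps from input images (no class labels)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Stage 1: regress human saliency maps (stand-in random tensors used here).
ae = SaliencyAutoencoder()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
images = torch.rand(4, 3, 64, 64)     # batch of input images
saliency = torch.rand(4, 1, 64, 64)   # human-annotated saliency maps
loss = nn.functional.mse_loss(ae(images), saliency)
opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: drop the decoder, add a classification head, fine-tune conventionally.
classifier = nn.Sequential(
    ae.encoder,                        # saliency-pretrained weights
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),                  # e.g., bona fide vs. anomaly
)
labels = torch.randint(0, 2, (4,))
logits = classifier(images)
cls_loss = nn.functional.cross_entropy(logits, labels)
```

The key idea is that the encoder weights carried into stage 2 already encode where humans look, so the classifier starts from perception-aligned features rather than from generic ImageNet statistics.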
Related papers
- Training Better Deep Learning Models Using Human Saliency [11.295653130022156]
This work explores how human judgement about salient regions of an image can be introduced into deep convolutional neural network (DCNN) training.
We propose a new component of the loss function that ConveYs Brain Oversight to Raise Generalization (CYBORG) and penalizes the model for using non-salient regions.
arXiv Detail & Related papers (2024-10-21T16:52:44Z)
- Defect Classification in Additive Manufacturing Using CNN-Based Vision Processing [76.72662577101988]
This paper examines two scenarios: first, using convolutional neural networks (CNNs) to accurately classify defects in an image dataset from AM and second, applying active learning techniques to the developed classification model.
This enables a human-in-the-loop mechanism that reduces the amount of data required to train the model and to generate training data.
arXiv Detail & Related papers (2023-07-14T14:36:58Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Reconciliation of Pre-trained Models and Prototypical Neural Networks in Few-shot Named Entity Recognition [35.34238362639678]
We propose a one-line-code normalization method to reconcile such a mismatch with empirical and theoretical grounds.
Our work also provides an analytical viewpoint for addressing the general problems in few-shot named entity recognition.
arXiv Detail & Related papers (2022-11-07T02:33:45Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- CYBORG: Blending Human Saliency Into the Loss Improves Deep Learning [5.092711491848192]
This paper proposes a first-ever training strategy to ConveY Brain Oversight to Raise Generalization (CYBORG).
The new training approach incorporates human-annotated saliency maps into a CYBORG loss function that guides the model towards learning features from image regions that humans find salient when solving a given visual task.
Results on the task of synthetic face detection show that the CYBORG loss leads to significant improvement in performance on unseen samples consisting of face images generated from six Generative Adversarial Networks (GANs) across multiple classification network architectures.
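A loss of this general shape, blending a classification term with a penalty for diverging from human saliency, can be sketched as follows. The function name, the MSE penalty, and the `alpha` weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a CYBORG-style loss; the exact penalty and weighting are assumed.
import torch
import torch.nn.functional as F

def cyborg_style_loss(logits, labels, model_saliency, human_saliency, alpha=0.5):
    """Blend cross-entropy with a penalty for ignoring human-salient regions.

    model_saliency: model-derived attention map (e.g., a class activation map), shape (B, H, W)
    human_saliency: human-annotated saliency map, same shape
    """
    ce = F.cross_entropy(logits, labels)
    # Normalize both maps to sum to 1 so the penalty compares spatial distributions.
    m = model_saliency.flatten(1)
    h = human_saliency.flatten(1)
    m = m / (m.sum(dim=1, keepdim=True) + 1e-8)
    h = h / (h.sum(dim=1, keepdim=True) + 1e-8)
    saliency_penalty = F.mse_loss(m, h)
    return alpha * ce + (1 - alpha) * saliency_penalty

# Stand-in tensors to exercise the loss.
logits = torch.randn(4, 2)
labels = torch.randint(0, 2, (4,))
model_map = torch.rand(4, 7, 7)
human_map = torch.rand(4, 7, 7)
loss = cyborg_style_loss(logits, labels, model_map, human_map)
```

Contrast this with MENTOR above: CYBORG injects human saliency into the loss during supervised training, while MENTOR uses it as a label-free pretraining target.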
arXiv Detail & Related papers (2021-12-01T18:04:15Z)
- Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification [53.735029033681435]
Transfer learning is a powerful methodology for adapting pre-trained deep neural networks on image recognition tasks to new domains.
In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models.
arXiv Detail & Related papers (2020-07-11T22:48:42Z)
- Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle we call Feature Purification: one cause of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training of a neural network.
We support this principle with experiments on the CIFAR-10 dataset, and with a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.