CYBORG: Blending Human Saliency Into the Loss Improves Deep Learning
- URL: http://arxiv.org/abs/2112.00686v1
- Date: Wed, 1 Dec 2021 18:04:15 GMT
- Title: CYBORG: Blending Human Saliency Into the Loss Improves Deep Learning
- Authors: Aidan Boyd, Patrick Tinsley, Kevin Bowyer, Adam Czajka
- Abstract summary: This paper proposes a first-ever training strategy to ConveY Brain Oversight to Raise Generalization (CYBORG).
The new training approach incorporates human-annotated saliency maps into a CYBORG loss function that guides the model toward learning features from image regions that humans find salient when solving a given visual task.
Results on the task of synthetic face detection show that the CYBORG loss leads to significant improvement in performance on unseen samples consisting of face images generated from six Generative Adversarial Networks (GANs) across multiple classification network architectures.
- Score: 5.092711491848192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Can deep learning models achieve greater generalization if their training is
guided by reference to human perceptual abilities? And how can we implement
this in a practical manner? This paper proposes a first-ever training strategy
to ConveY Brain Oversight to Raise Generalization (CYBORG). This new training
approach incorporates human-annotated saliency maps into a CYBORG loss function
that guides the model towards learning features from image regions that humans
find salient when solving a given visual task. The Class Activation Mapping
(CAM) mechanism is used to probe the model's current saliency in each training
batch, juxtapose model saliency with human saliency, and penalize the model for
large differences. Results on the task of synthetic face detection show that
the CYBORG loss leads to significant improvement in performance on unseen
samples consisting of face images generated from six Generative Adversarial
Networks (GANs) across multiple classification network architectures. We also
show that scaling to even seven times as much training data with the standard
loss cannot beat the accuracy of the CYBORG loss. As a side effect, we observed that the
addition of explicit region annotation to the task of synthetic face detection
increased human classification performance. This work opens a new area of
research on how to incorporate human visual saliency into loss functions. All
data, code and pre-trained models used in this work are offered with this
paper.
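As a rough illustration of the mechanism described above, here is a minimal PyTorch sketch of a CYBORG-style loss: a Class Activation Map is computed from the last convolutional features, compared against the human annotation, and blended with cross-entropy. The function and argument names (`cyborg_loss`, `human_saliency`, `alpha`) are illustrative assumptions, not the paper's formulation; the authors' released code is the authoritative reference.

```python
import torch
import torch.nn.functional as F

def cyborg_loss(features, logits, labels, fc_weight, human_saliency, alpha=0.5):
    """CYBORG-style loss sketch: cross-entropy plus a penalty on the distance
    between model saliency (a Class Activation Map) and human saliency.

    features:       (B, C, H, W) last-conv-layer activations
    logits:         (B, num_classes) classifier outputs
    labels:         (B,) ground-truth class indices
    fc_weight:      (num_classes, C) final linear layer weights
    human_saliency: (B, h, w) human annotation maps
    alpha:          blending weight (illustrative hyperparameter)
    """
    # CAM for each sample's ground-truth class: feature channels weighted
    # by that class's classifier weights.
    cam = torch.einsum('bc,bchw->bhw', fc_weight[labels], features)

    def norm01(m):
        # Min-max normalize each map to [0, 1] so the two are comparable.
        m = m - m.amin(dim=(1, 2), keepdim=True)
        return m / (m.amax(dim=(1, 2), keepdim=True) + 1e-8)

    # Resize the human map to the CAM resolution before comparing.
    human = F.interpolate(human_saliency.unsqueeze(1).float(),
                          size=cam.shape[-2:], mode='bilinear',
                          align_corners=False).squeeze(1)
    saliency_term = F.mse_loss(norm01(cam), norm01(human))  # penalize disagreement
    ce_term = F.cross_entropy(logits, labels)
    return alpha * saliency_term + (1.0 - alpha) * ce_term
```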
Related papers
- Training Better Deep Learning Models Using Human Saliency [11.295653130022156]
This work explores how human judgement about salient regions of an image can be introduced into deep convolutional neural network (DCNN) training.
We propose a new component of the loss function that ConveYs Brain Oversight to Raise Generalization (CYBORG) and penalizes the model for using non-salient regions.
arXiv Detail & Related papers (2024-10-21T16:52:44Z)
- Reduced Jeffries-Matusita distance: A Novel Loss Function to Improve Generalization Performance of Deep Classification Models [0.0]
We introduce a distance called Reduced Jeffries-Matusita as a loss function for training deep classification models to reduce overfitting.
The results show that the new distance measure stabilizes the training process significantly, enhances generalization, and improves model accuracy and F1-score (see the sketch after this entry).
arXiv Detail & Related papers (2024-03-13T10:51:38Z)
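For intuition, the following is a minimal PyTorch sketch of the classical Jeffries-Matusita distance used as a classification loss between the predicted softmax distribution and the one-hot target. It does not reproduce the paper's "Reduced" variant; it only illustrates the underlying distance.

```python
import torch
import torch.nn.functional as F

def jm_distance_loss(logits, labels, eps=1e-8):
    """Classical Jeffries-Matusita distance as a loss (not the 'reduced'
    variant): JM(p, q) = sqrt(2 * (1 - BC(p, q))), where BC is the
    Bhattacharyya coefficient between the two distributions."""
    p = F.softmax(logits, dim=1)                          # predicted distribution
    q = F.one_hot(labels, num_classes=p.size(1)).float()  # one-hot target
    bc = torch.sqrt(p * q + eps).sum(dim=1)               # Bhattacharyya coefficient
    jm = torch.sqrt(torch.clamp(2.0 * (1.0 - bc), min=eps))
    return jm.mean()
```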
- Appearance Debiased Gaze Estimation via Stochastic Subject-Wise Adversarial Learning [33.55397868171977]
Appearance-based gaze estimation has been attracting attention in computer vision, and remarkable improvements have been achieved using various deep learning techniques.
We propose a novel framework: subject-wise gaZE learning (SAZE), which trains a network to generalize the appearance of subjects.
Our experimental results verify the robustness of the method in that it yields state-of-the-art performance, achieving mean angular errors of 3.89 and 4.42 degrees on the MPIIGaze and EyeDiap datasets, respectively.
arXiv Detail & Related papers (2024-01-25T00:23:21Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- MENTOR: Human Perception-Guided Pretraining for Increased Generalization [5.596752018167751]
We introduce MENTOR (huMan pErceptioN-guided preTraining fOr increased geneRalization).
We train an autoencoder to learn human saliency maps given an input image, without class labels.
We then remove the decoder, add a classification layer on top of the encoder, and fine-tune this new model conventionally (see the sketch after this entry).
arXiv Detail & Related papers (2023-10-30T13:50:44Z)
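A minimal sketch of the two-stage MENTOR-style pipeline described above, assuming a toy convolutional autoencoder; the layer sizes and the two-class head are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

# Stage 1: train an autoencoder to predict human saliency maps from images
# (no class labels involved).
encoder = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
)

def saliency_pretrain_step(images, human_saliency, opt):
    """One pretraining step: regress the human saliency map from the image."""
    pred = decoder(encoder(images))
    loss = nn.functional.mse_loss(pred, human_saliency)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss

# Stage 2: discard the decoder, attach a classification head to the encoder,
# and fine-tune conventionally with class labels.
classifier = nn.Sequential(
    encoder,
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 2),  # e.g. real vs. synthetic (illustrative head)
)
```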
- Reducing Training Demands for 3D Gait Recognition with Deep Koopman Operator Constraints [8.382355998881879]
We introduce a new Linear Dynamical Systems (LDS) module and loss based on Koopman operator theory, which provides an unsupervised motion regularization for the periodic nature of gait (see the sketch after this entry).
We also show that our 3D modeling approach outperforms other 3D gait approaches in overcoming viewpoint variation under normal, bag-carrying, and clothing-change conditions.
arXiv Detail & Related papers (2023-08-14T21:39:33Z)
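As a hedged illustration of a Koopman-style linear-dynamics regularizer, the sketch below fits a linear operator K to consecutive latent states and penalizes the one-step prediction residual; the shape convention and the pseudo-inverse fit are assumptions, not the paper's exact LDS module.

```python
import torch

def koopman_lds_loss(z):
    """Linear-dynamics penalty in the spirit of a Koopman/LDS regularizer.

    z: (T, d) latent gait states for one sequence (illustrative convention).
    """
    Z0, Z1 = z[:-1], z[1:]            # states at times t and t+1
    # Least-squares fit of a linear operator K with Z1 ~= Z0 @ K.
    K = torch.linalg.pinv(Z0) @ Z1    # (d, d)
    # Penalize the one-step linear prediction residual.
    return ((Z0 @ K - Z1) ** 2).mean()
```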
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance while using far fewer trainable parameters, with high speed in both training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models (see the sketch after this entry).
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
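A minimal sketch of a scheduled grow-and-prune step, assuming magnitude pruning and a round-robin schedule in which one layer at a time grows back to dense; both choices are illustrative assumptions rather than the paper's exact schedule.

```python
import torch

@torch.no_grad()
def magnitude_mask(weight, sparsity):
    """Binary mask keeping the largest-magnitude fraction of weights."""
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    threshold = weight.abs().flatten().topk(k).values.min()
    return (weight.abs() >= threshold).float()

@torch.no_grad()
def gap_phase_masks(weights, phase, sparsity=0.8):
    """One scheduled grow-and-prune phase: the layer whose turn it is grows
    back to dense while the others stay magnitude-pruned; the caller applies
    the masks (weight * mask) and trains for some steps before the next phase.
    """
    masks = []
    for i, w in enumerate(weights):
        if i == phase % len(weights):
            masks.append(torch.ones_like(w))           # grow this layer dense
        else:
            masks.append(magnitude_mask(w, sparsity))  # keep the rest pruned
    return masks
```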
- Multi-Agent Semi-Siamese Training for Long-tail and Shallow Face Learning [54.13876727413492]
In many real-world scenarios of face recognition, the depth of the training dataset is shallow, which means only two face images are available for each ID.
With a non-uniform increase in samples, this issue turns into a more general case, a.k.a. long-tail face learning.
Building on Semi-Siamese Training (SST), we introduce an advanced solution named Multi-Agent Semi-Siamese Training (MASST).
MASST includes a probe network and multiple gallery agents; the former encodes the probe features, and the latter constitutes a stack of
arXiv Detail & Related papers (2021-05-10T04:57:32Z)
- The FaceChannel: A Fast & Furious Deep Neural Network for Facial Expression Recognition [71.24825724518847]
Current state-of-the-art models for automatic Facial Expression Recognition (FER) are based on very deep neural networks that are effective but rather expensive to train.
We formalize the FaceChannel, a light-weight neural network with far fewer parameters than common deep neural networks.
We demonstrate that our model achieves performance comparable to, if not better than, the current state-of-the-art in FER.
arXiv Detail & Related papers (2020-09-15T09:25:37Z)