PICASO: Permutation-Invariant Cascaded Attentional Set Operator
- URL: http://arxiv.org/abs/2107.08305v1
- Date: Sat, 17 Jul 2021 19:21:30 GMT
- Title: PICASO: Permutation-Invariant Cascaded Attentional Set Operator
- Authors: Samira Zare, Hien Van Nguyen
- Abstract summary: We propose a permutation-invariant cascaded attentional set operator (PICASO) for set-input deep networks.
The proposed operator is a stand-alone module that can be adapted and extended to serve different machine learning tasks.
We demonstrate the utilities of PICASO in four diverse scenarios: (i) clustering, (ii) image classification under novel viewpoints, (iii) image anomaly detection, and (iv) state prediction.
- Score: 6.845913709297514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Set-input deep networks have recently drawn much interest in computer vision
and machine learning. This is in part due to the increasing number of important
tasks such as meta-learning, clustering, and anomaly detection that are defined
on set inputs. These networks must take an arbitrary number of input samples
and produce outputs that are invariant to permutations of the input set. Several
algorithms have recently been developed to address this need. Our paper
analyzes these algorithms using both synthetic and real-world datasets, and
shows that they are not effective in dealing with common data variations such
as image translation or viewpoint change. To address this limitation, we
propose a permutation-invariant cascaded attentional set operator (PICASO). The
gist of PICASO is a cascade of multihead attention blocks with dynamic
templates. The proposed operator is a stand-alone module that can be adapted
and extended to serve different machine learning tasks. We demonstrate the
utilities of PICASO in four diverse scenarios: (i) clustering, (ii) image
classification under novel viewpoints, (iii) image anomaly detection, and (iv)
state prediction. PICASO increases SmallNORB image classification accuracy
under novel viewpoints by about 10 percentage points. For set anomaly detection on the CelebA
dataset, our model improves the areas under the ROC and PR curves by about
22% and 10%, respectively. For state prediction on the CLEVR dataset, it
improves the AP by about 40%.
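The abstract describes the architecture only at a high level: a cascade of multihead attention blocks whose templates are updated dynamically from block to block. The following is a minimal PyTorch sketch of that idea under stated assumptions, not the authors' implementation; the class name, block count, template count, and the residual/LayerNorm template-update rule are all illustrative choices.

import torch
import torch.nn as nn

class CascadedSetOperator(nn.Module):
    """Sketch of a PICASO-style permutation-invariant set operator (assumed design)."""
    def __init__(self, dim=128, num_heads=4, num_templates=8, num_blocks=3):
        super().__init__()
        # Learnable initial templates (queries), shared across the batch.
        self.templates = nn.Parameter(0.02 * torch.randn(1, num_templates, dim))
        self.blocks = nn.ModuleList(
            [nn.MultiheadAttention(dim, num_heads, batch_first=True)
             for _ in range(num_blocks)])
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(num_blocks)])

    def forward(self, x):
        # x: (batch, set_size, dim); set_size may vary between calls.
        queries = self.templates.expand(x.shape[0], -1, -1)
        for attn, norm in zip(self.blocks, self.norms):
            # Templates attend over the set; the attended output becomes the
            # (dynamic) template fed to the next block in the cascade.
            pooled, _ = attn(queries, x, x)
            queries = norm(queries + pooled)
        # The output depends on x only through attention-weighted sums over the
        # set dimension, so permuting the input elements leaves it unchanged.
        return queries  # (batch, num_templates, dim)

op = CascadedSetOperator()
s = torch.randn(2, 10, 128)
perm = s[:, torch.randperm(10), :]
print(torch.allclose(op(s), op(perm), atol=1e-5))  # True: permutation-invariant

With a single block and fixed templates this reduces to ordinary attention pooling over a set; the cascade is what lets the templates adapt to the particular input set before producing the final representation.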
Related papers
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z) - ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention.
Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count.
The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost; a minimal sketch of this clustered-attention idea appears after this list.
arXiv Detail & Related papers (2022-08-28T04:18:27Z) - Hierarchical Convolutional Neural Network with Feature Preservation and Autotuned Thresholding for Crack Detection [5.735035463793008]
Drone imagery is increasingly used in automated inspection for infrastructure surface defects.
This paper proposes a deep learning approach using hierarchical convolutional neural networks with feature preservation.
The proposed technique is then applied to identify cracks on the surfaces of roads, bridges, and pavements.
arXiv Detail & Related papers (2021-04-21T13:07:58Z) - Exploiting Invariance in Training Deep Neural Networks [4.169130102668252]
Inspired by two basic mechanisms in animal visual systems, we introduce a feature transform technique that imposes invariance properties in the training of deep neural networks.
The resulting algorithm requires less parameter tuning, trains well with an initial learning rate of 1.0, and generalizes easily to different tasks.
Tested on the ImageNet, MS COCO, and Cityscapes datasets, our proposed technique requires fewer iterations to train, surpasses all baselines by a large margin, works seamlessly with both small and large batch sizes, and applies to image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2021-03-30T19:18:31Z) - Semantic Change Detection with Asymmetric Siamese Networks [71.28665116793138]
Given two aerial images, semantic change detection aims to locate the land-cover variations and identify their change types with pixel-wise boundaries.
This problem is vital in many earth vision related tasks, such as precise urban planning and natural resource management.
We present an asymmetric siamese network (ASN) to locate and identify semantic changes through feature pairs obtained from modules of widely different structures.
arXiv Detail & Related papers (2020-10-12T13:26:30Z) - Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each evaluated task across four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z) - Learn to Predict Sets Using Feed-Forward Neural Networks [63.91494644881925]
This paper addresses the task of set prediction using deep feed-forward neural networks.
We present a novel approach for learning to predict sets with unknown permutation and cardinality.
We demonstrate the validity of our set formulations on relevant vision problems.
arXiv Detail & Related papers (2020-01-30T01:52:07Z)
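The ClusTR entry above describes reducing attention cost by clustering and aggregating key and value tokens. As a companion to the sketch after the main abstract, here is a minimal, generic illustration of that clustered-attention idea; the plain k-means routine, the cluster count, and the single-head formulation are illustrative assumptions, not ClusTR's actual algorithm.

import torch
import torch.nn.functional as F

def kmeans_tokens(tokens, num_clusters, iters=5):
    # tokens: (N, D) -> centroids: (C, D) via a few plain k-means steps.
    centroids = tokens[torch.randperm(tokens.shape[0])[:num_clusters]].clone()
    for _ in range(iters):
        assign = torch.cdist(tokens, centroids).argmin(dim=1)  # (N,)
        for c in range(num_clusters):
            members = tokens[assign == c]
            if members.numel() > 0:
                centroids[c] = members.mean(dim=0)
    return centroids

def clustered_attention(queries, tokens, num_clusters=16):
    # queries: (M, D), tokens: (N, D). Keys/values are the C cluster
    # centroids, so the attention cost scales with C instead of N.
    kv = kmeans_tokens(tokens, num_clusters)            # (C, D)
    scores = queries @ kv.t() / kv.shape[-1] ** 0.5     # (M, C)
    return F.softmax(scores, dim=-1) @ kv               # (M, D)

q, t = torch.randn(32, 64), torch.randn(1000, 64)
print(clustered_attention(q, t).shape)  # torch.Size([32, 64]), using only 16 key/value tokens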
This list is automatically generated from the titles and abstracts of the papers on this site.