Visual Recognition with Deep Nearest Centroids
- URL: http://arxiv.org/abs/2209.07383v1
- Date: Thu, 15 Sep 2022 15:47:31 GMT
- Title: Visual Recognition with Deep Nearest Centroids
- Authors: Wenguan Wang, Cheng Han, Tianfei Zhou, Dongfang Liu
- Abstract summary: We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition.
Compared with parametric counterparts, DNC performs better on image classification (CIFAR-10, ImageNet) and greatly boosts pixel recognition (ADE20K, Cityscapes).
- Score: 57.35144702563746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We devise deep nearest centroids (DNC), a conceptually elegant yet
surprisingly effective network for large-scale visual recognition, by
revisiting Nearest Centroids, one of the most classic and simple classifiers.
Current deep models learn the classifier in a fully parametric manner, ignoring
the latent data structure and lacking simplicity and explainability. DNC
instead conducts nonparametric, case-based reasoning; it utilizes sub-centroids
of training samples to describe class distributions and clearly explains the
classification as the proximity of test data and the class sub-centroids in the
feature space. Due to the distance-based nature, the network output
dimensionality is flexible, and all the learnable parameters are only for data
embedding. That means all the knowledge learnt for ImageNet classification can
be completely transferred for pixel recognition learning, under the
"pre-training and fine-tuning" paradigm. Apart from its nested simplicity and
intuitive decision-making mechanism, DNC can even possess ad-hoc explainability
when the sub-centroids are selected as actual training images that humans can
view and inspect. Compared with parametric counterparts, DNC performs better on
image classification (CIFAR-10, ImageNet) and greatly boosts pixel recognition
(ADE20K, Cityscapes), with improved transparency and fewer learnable
parameters, using various network architectures (ResNet, Swin) and segmentation
models (FCN, DeepLabV3, Swin). We feel this work brings fundamental insights
into related fields.
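The distance-based decision rule the abstract describes can be sketched as follows. This is a minimal illustration of nearest-(sub-)centroid classification, not the paper's implementation: the learned embedding network and the clustering step that produces the sub-centroids are omitted, and `dnc_classify` is a hypothetical helper name.

```python
from math import dist

def dnc_classify(features, sub_centroids):
    """Nearest-centroid decision rule: each test sample is assigned to
    the class of its closest sub-centroid in the embedding space.

    features:      list of D-dim embedded test samples
    sub_centroids: dict mapping class label -> list of D-dim
                   sub-centroids describing that class distribution
    """
    preds = []
    for x in features:
        # A class's score is the distance from x to its nearest
        # sub-centroid; the predicted label minimizes that score.
        best = min(
            (min(dist(x, c) for c in cents), label)
            for label, cents in sub_centroids.items()
        )
        preds.append(best[1])
    return preds
```

Because only distances are computed, the output "dimensionality" is just the number of classes for which sub-centroids are supplied, which is the property the abstract exploits for transfer to pixel recognition.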
Related papers
- Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
- Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
The existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data.
We introduce a new decoupling optimization framework that disentangles feature representation learning and classifier training in an alternating optimization manner, effectively shifting the biased decision boundary.
arXiv Detail & Related papers (2024-01-13T04:16:40Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task: reconstructing the surface from which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as a simplex equiangular tight frame (ETF) and fixed during training.
Our experimental results show that our method achieves similar performance on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- CAR: Class-aware Regularizations for Semantic Segmentation [20.947897583427192]
We propose a universal Class-Aware Regularization (CAR) approach to optimize the intra-class variance and inter-class distance during feature learning.
Our method can be easily applied to most existing segmentation models during training, including OCR and CPNet.
arXiv Detail & Related papers (2022-03-14T15:02:48Z)
- Multiscale Convolutional Transformer with Center Mask Pretraining for Hyperspectral Image Classification [14.33259265286265]
We propose a novel multi-scale convolutional embedding module for hyperspectral images (HSI) to realize effective extraction of spatial-spectral information.
Similar to the masked autoencoder, our pre-training method masks only the token corresponding to the central pixel in the encoder and feeds the remaining tokens into the decoder to reconstruct the spectral information of the central pixel.
arXiv Detail & Related papers (2022-03-09T14:42:26Z)
- Learning Deep Interleaved Networks with Asymmetric Co-Attention for Image Restoration [65.11022516031463]
We present a deep interleaved network (DIN) that learns how information at different states should be combined for high-quality (HQ) image reconstruction.
In this paper, we propose asymmetric co-attention (AsyCA) which is attached at each interleaved node to model the feature dependencies.
Our presented DIN can be trained end-to-end and applied to various image restoration tasks.
arXiv Detail & Related papers (2020-10-29T15:32:00Z)
- Eigen-CAM: Class Activation Map using Principal Components [1.2691047660244335]
This paper builds on previous ideas to cope with the increasing demand for interpretable, robust, and transparent models.
The proposed Eigen-CAM computes and visualizes the principal components of the learned features/representations from the convolutional layers.
arXiv Detail & Related papers (2020-08-01T17:14:13Z)
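The Eigen-CAM entry above lends itself to a short sketch: the saliency map is the projection of the convolutional activations onto their first principal component. The following is a simplified, pure-Python illustration under stated assumptions: power iteration on the Gram matrix stands in for the SVD used in practice, and `eigen_cam` is a hypothetical name.

```python
from math import sqrt

def eigen_cam(feature_map):
    """Project each spatial activation vector onto the first principal
    direction of the flattened feature map.

    feature_map: H x W grid of C-dim activation vectors
                 (list of rows, each row a list of C-dim lists).
    Returns an H x W saliency map.
    """
    # Flatten to an (H*W) list of C-dim activation vectors.
    rows = [vec for row in feature_map for vec in row]
    c = len(rows[0])
    # Power iteration on the C x C Gram matrix A^T A approximates the
    # first right singular vector of A (the principal direction).
    v = [1.0] * c
    for _ in range(100):
        av = [sum(r[j] * v[j] for j in range(c)) for r in rows]
        w = [sum(av[i] * rows[i][j] for i in range(len(rows)))
             for j in range(c)]
        norm = sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    # The saliency at each location is the activation's projection
    # onto the principal direction.
    return [[sum(vec[j] * v[j] for j in range(c)) for vec in row]
            for row in feature_map]
```

Note that, as with any eigenvector method, the sign of the principal direction is arbitrary; real implementations typically rescale or take the absolute value before visualization.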
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.