Localizing Interpretable Multi-scale informative Patches Derived from
Media Classification Task
- URL: http://arxiv.org/abs/2002.03737v2
- Date: Fri, 17 Apr 2020 08:14:52 GMT
- Title: Localizing Interpretable Multi-scale informative Patches Derived from
Media Classification Task
- Authors: Chuanguang Yang, Zhulin An, Xiaolong Hu, Hui Zhu, Yongjun Xu
- Abstract summary: We construct an interpretable AnchorNet equipped with our carefully designed RFs and linearly spatial aggregation.
We show that localized patches can indeed retain most of the semantics and evidence of the original inputs.
- Score: 12.447143226347922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep convolutional neural networks (CNNs) typically depend on wider receptive
fields (RFs) and more complex non-linearity to achieve state-of-the-art
performance, at the cost of increased difficulty in interpreting how relevant
patches contribute to the final prediction. In this paper, we construct an
interpretable AnchorNet equipped with our carefully designed RFs and linearly
spatial aggregation to provide patch-wise interpretability of the input media
while localizing multi-scale informative patches, supervised only by
media-level labels without any extra bounding box annotations. Visualization of
localized informative image and text patches shows the superior multi-scale
localization capability of AnchorNet. We further use localized patches for
downstream classification tasks across widely applied networks. Experimental
results demonstrate that replacing the original inputs with their patches for
classification yields a clear inference acceleration with only slight
performance degradation, which shows that localized patches can indeed retain
most of the semantics and evidence of the original inputs.
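The idea of patch-wise interpretability through linear spatial aggregation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each grid position produces class logits for one input patch, that the media-level logits are the plain average of those patch logits, and that the informative patches are simply the top-k positions for the predicted class. The function names and the toy grid are hypothetical.

```python
def aggregate_logits(patch_logits):
    """Average per-patch class logits over all grid positions.

    patch_logits is a nested list indexed as [row][col][class].
    Averaging is an assumed form of linear spatial aggregation.
    """
    cells = [cell for row in patch_logits for cell in row]
    num_classes = len(cells[0])
    return [sum(cell[c] for cell in cells) / len(cells) for c in range(num_classes)]


def top_k_patches(patch_logits, k):
    """Return the predicted class and the k grid positions that score highest for it."""
    media_logits = aggregate_logits(patch_logits)
    predicted = max(range(len(media_logits)), key=media_logits.__getitem__)
    # Because the aggregation is linear, each patch's logit for the predicted
    # class is (up to the 1 / num_patches factor) its contribution to the output.
    scored = [
        (cell[predicted], (i, j))
        for i, row in enumerate(patch_logits)
        for j, cell in enumerate(row)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return predicted, [position for _, position in scored[:k]]


# Toy 2x2 grid of patches with 2 classes (made-up numbers for illustration).
grid = [
    [[0.1, 0.0], [2.0, 0.3]],
    [[0.2, 0.1], [1.5, 0.4]],
]
predicted_class, patches = top_k_patches(grid, k=2)
print(predicted_class, patches)
```

In this toy grid the right-hand column dominates the score of the predicted class, so those two positions would be the ones cropped and passed to a downstream classifier.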
Related papers
- DETR Doesn't Need Multi-Scale or Locality Design [69.56292005230185]
This paper presents an improved DETR detector that maintains a "plain" nature.
It uses a single-scale feature map and global cross-attention calculations without specific locality constraints.
We show that two simple technologies are surprisingly effective within a plain design to compensate for the lack of multi-scale feature maps and locality constraints.
arXiv Detail & Related papers (2023-08-03T17:59:04Z)
- Localizing Semantic Patches for Accelerating Image Classification [12.250230630124758]
We first pinpoint task-aware regions over the input image by a lightweight patch proposal network called AnchorNet.
We then feed these localized semantic patches with much smaller spatial redundancy into a general classification network.
Our method outperforms SOTA dynamic inference methods with fewer inference costs.
arXiv Detail & Related papers (2022-06-07T15:01:54Z)
- Augmenting Convolutional networks with attention-based aggregation [55.97184767391253]
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
We plug this learned aggregation layer with a simplistic patch-based convolutional network parametrized by 2 parameters (width and depth).
It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption.
arXiv Detail & Related papers (2021-12-27T14:05:41Z)
- DSNet: A Dual-Stream Framework for Weakly-Supervised Gigapixel Pathology Image Analysis [78.78181964748144]
We present a novel weakly-supervised framework for classifying whole slide images (WSIs).
WSIs are commonly processed by patch-wise classification with patch-level labels.
With image-level labels only, patch-wise classification would be sub-optimal due to inconsistency between the patch appearance and image-level label.
arXiv Detail & Related papers (2021-09-13T09:10:43Z)
- Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers [7.025709586759655]
Recurrent neural network transducers (RNN-T) are a promising end-to-end speech recognition framework.
The Conformer can effectively model the local-global context information via its convolution and self-attention layers.
The domain mismatch problem for Conformer RNN-T has not been intensively investigated yet.
arXiv Detail & Related papers (2021-08-22T08:06:15Z)
- Fast and Accurate Normal Estimation for Point Cloud via Patch Stitching [12.559091712749279]
We present an effective normal estimation method adopting multi-patch stitching for an unstructured point cloud.
Our method achieves SOTA results with the advantage of lower computational costs and higher robustness to noise over most of the existing approaches.
arXiv Detail & Related papers (2021-03-30T04:30:35Z)
- Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification [2.963101656293054]
Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and parts information for image recognition.
We propose a novel context-aware attentional pooling (CAP) that effectively captures subtle changes via sub-pixel gradients.
We evaluate our approach using six state-of-the-art (SotA) backbone networks and eight benchmark datasets.
arXiv Detail & Related papers (2021-01-17T10:15:02Z)
- Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
arXiv Detail & Related papers (2020-11-01T19:24:27Z)
- Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection [85.53263670166304]
One-stage detectors basically formulate object detection as dense classification and localization.
The recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization.
This paper delves into the representations of the above three fundamental elements: quality estimation, classification and localization.
arXiv Detail & Related papers (2020-06-08T07:24:33Z)
- Learning to segment from misaligned and partial labels [0.0]
Many non-urban settings lack the ground-truth needed for accurate segmentation.
Open source infrastructure annotations like OpenStreetMaps (OSM) are representative of this issue.
We present a novel and generalizable two-stage framework that enables improved pixel-wise image segmentation given misaligned and missing annotations.
arXiv Detail & Related papers (2020-05-27T06:02:58Z)
- Embedding Propagation: Smoother Manifold for Few-Shot Classification [131.81692677836202]
We propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification.
We empirically show that embedding propagation yields a smoother embedding manifold.
We show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios by up to 16 percentage points.
arXiv Detail & Related papers (2020-03-09T13:51:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.