Multi-Granularity Canonical Appearance Pooling for Remote Sensing Scene Classification
- URL: http://arxiv.org/abs/2004.04491v1
- Date: Thu, 9 Apr 2020 11:24:00 GMT
- Title: Multi-Granularity Canonical Appearance Pooling for Remote Sensing Scene Classification
- Authors: S. Wang, Y. Guan, L. Shao
- Abstract summary: We propose a novel Multi-Granularity Canonical Appearance Pooling (MG-CAP) to automatically capture the latent ontological structure of remote sensing datasets.
For each specific granularity, we discover the canonical appearance from a set of pre-defined transformations and learn the corresponding CNN features through a maxout-based Siamese style architecture.
We provide a stable solution for training the eigenvalue-decomposition function (EIG) on a GPU and derive the corresponding back-propagation using matrix calculus.
- Score: 0.34376560669160383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognising remote sensing scene images remains challenging due to large
visual-semantic discrepancies. These mainly arise due to the lack of detailed
annotations that can be employed to align pixel-level representations with
high-level semantic labels. As the tagging process is labour-intensive and
subjective, we propose a novel Multi-Granularity Canonical Appearance
Pooling (MG-CAP) to automatically capture the latent ontological structure of
remote sensing datasets. We design a granular framework that allows
progressively cropping the input image to learn multi-grained features. For
each specific granularity, we discover the canonical appearance from a set of
pre-defined transformations and learn the corresponding CNN features through a
maxout-based Siamese style architecture. Then, we replace the standard CNN
features with Gaussian covariance matrices and adopt appropriate matrix
normalisations to improve the discriminative power of the features. In
addition, we provide a stable solution for training the
eigenvalue-decomposition function (EIG) on a GPU and derive the corresponding
back-propagation using matrix calculus. Extensive experiments show that our
framework achieves promising results on public remote sensing scene datasets.
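To make the maxout-based Siamese step concrete, here is a minimal PyTorch sketch. The rotation set, crop ratios, and backbone are illustrative assumptions rather than the authors' exact configuration: one weight-shared CNN is evaluated on each pre-defined transformation of a crop, and the element-wise maximum response is kept as the canonical-appearance feature for that granularity.

```python
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF


class MaxoutSiameseSketch(nn.Module):
    """Weight-shared (Siamese) feature extraction over a set of pre-defined
    transformations, reduced by element-wise maxout. The rotation set below
    is an assumed stand-in for the paper's transformation set."""

    def __init__(self, backbone: nn.Module, angles=(0.0, 90.0, 180.0, 270.0)):
        super().__init__()
        self.backbone = backbone  # one CNN shared across all branches
        self.angles = angles      # pre-defined transformations (rotations)

    def forward(self, x):
        # x: (B, 3, H, W); run the shared backbone on every transformed copy
        feats = [self.backbone(TF.rotate(x, angle)) for angle in self.angles]
        # keep the strongest response per element: the canonical appearance
        return torch.stack(feats, dim=0).max(dim=0).values


def multi_granularity_crops(x, ratios=(1.0, 0.8, 0.6)):
    """Progressively centre-crop the input to expose finer-grained content.
    The crop ratios here are placeholders, not the paper's schedule."""
    _, _, h, w = x.shape
    crops = []
    for r in ratios:
        ch, cw = int(h * r), int(w * r)
        top, left = (h - ch) // 2, (w - cw) // 2
        crops.append(TF.resized_crop(x, top, left, ch, cw, [h, w]))
    return crops
```

Each crop would be passed through the same maxout module, yielding one canonical feature per granularity.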
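The second-order step can be sketched the same way: Gaussian covariance pooling of the CNN feature maps, followed by matrix power normalisation computed through an eigendecomposition. The ridge term and eigenvalue clamp below are assumed stabilisers standing in for the paper's stable GPU EIG solution; torch.linalg.eigh is differentiable, so here the matrix-calculus backward pass comes from autograd rather than a hand-derived gradient.

```python
import torch


def covariance_pooling(f, eps=1e-5):
    """f: (B, C, H, W) feature maps -> (B, C, C) covariance matrices.
    The small ridge keeps the matrices positive definite, a common
    safeguard before eigendecomposition on GPU (an assumption here)."""
    b, c, h, w = f.shape
    f = f.flatten(2)                             # (B, C, N), N = H*W
    fc = f - f.mean(dim=2, keepdim=True)         # centre each channel
    cov = fc @ fc.transpose(1, 2) / (h * w - 1)  # sample covariance
    return cov + eps * torch.eye(c, device=f.device)


def matrix_power_normalise(cov, alpha=0.5):
    """Matrix power normalisation Sigma^alpha via EIG: rebuild the matrix
    from eigenvectors and element-wise powered eigenvalues."""
    evals, evecs = torch.linalg.eigh(cov)        # symmetric EIG, batched
    evals = evals.clamp_min(1e-10).pow(alpha)    # guard near-zero values
    return evecs @ torch.diag_embed(evals) @ evecs.transpose(1, 2)
```

In this reading, the normalised matrices from each granularity would be flattened and fused before the classifier; the fusion scheme (e.g. concatenation) is an assumption, not something the abstract specifies.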
Related papers
- A Spitting Image: Modular Superpixel Tokenization in Vision Transformers [0.0]
Vision Transformer (ViT) architectures traditionally employ a grid-based approach to tokenization independent of the semantic content of an image.
We propose a modular superpixel tokenization strategy which decouples tokenization and feature extraction.
arXiv Detail & Related papers (2024-08-14T17:28:58Z)
- CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding [38.53988682814626]
We propose a context-enhanced masked image modeling method (CtxMIM) for remote sensing image understanding.
CtxMIM formulates original image patches as a reconstructive template and employs a Siamese framework to operate on two sets of image patches.
With this simple and elegant design, CtxMIM encourages the pre-training model to learn object-level or pixel-level features on a large-scale dataset.
arXiv Detail & Related papers (2023-09-28T18:04:43Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance of several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Bayesian Deep Learning for Affordance Segmentation in Images [3.15834651147911]
We present a novel Bayesian deep network to detect affordances in images.
We quantify the distribution of the aleatoric and epistemic variance at the spatial level.
Our results outperform those of state-of-the-art deterministic networks.
arXiv Detail & Related papers (2023-03-02T00:01:13Z)
- Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) for exploring dependencies of object-to-object, object-to-patch and patch-to-patch.
arXiv Detail & Related papers (2022-06-02T08:34:25Z)
- Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervision manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We achieve state-of-the-art performance on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise, high-resolution representations throughout the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- A Flexible Framework for Designing Trainable Priors with Adaptive Smoothing and Game Encoding [57.1077544780653]
We introduce a general framework for designing and training neural network layers whose forward passes can be interpreted as solving non-smooth convex optimization problems.
We focus on convex games, solved by local agents represented by the nodes of a graph and interacting through regularization functions.
This approach is appealing for solving imaging problems, as it allows the use of classical image priors within deep models that are trainable end to end.
arXiv Detail & Related papers (2020-06-26T08:34:54Z)
- Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information.
We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially precise, high-resolution representations throughout the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
- Contextual Encoder-Decoder Network for Visual Saliency Prediction [42.047816176307066]
We propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task.
We combine the resulting representations with global scene information for accurately predicting visual saliency.
Compared to state-of-the-art approaches, the network is based on a lightweight image classification backbone.
arXiv Detail & Related papers (2019-02-18T16:15:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.