RGB-D SLAM Using Attention Guided Frame Association
- URL: http://arxiv.org/abs/2201.12047v1
- Date: Fri, 28 Jan 2022 11:23:29 GMT
- Title: RGB-D SLAM Using Attention Guided Frame Association
- Authors: Ali Caglayan, Nevrez Imamoglu, Oguzhan Guclu, Ali Osman Serhatoglu, Weimin Wang, Ahmet Burak Can, Ryosuke Nakamura
- Abstract summary: We propose the use of task-specific network attention for RGB-D indoor SLAM.
We integrate layer-wise object attention information (layer gradients) with CNN layer representations to improve frame association performance.
Experiments show promising initial results with improved performance.
- Score: 11.484398586420067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models, an emerging topic, have shown great progress in
various fields. In particular, visualization tools such as class activation
mapping methods have provided visual explanations of the reasoning of convolutional
neural networks (CNNs). By using the gradients of the network layers, it is
possible to demonstrate where the networks pay attention during a specific
image recognition task. Moreover, these gradients can be integrated with CNN
features for localizing more generalized task-dependent attentive (salient)
objects in scenes. Despite this progress, there has been little explicit use of
this gradient (network attention) information in combination with CNN
representations for object semantics. Such integration could be very useful for visual tasks
such as simultaneous localization and mapping (SLAM) where CNN representations
of spatially attentive object locations may lead to improved performance.
Therefore, in this work, we propose the use of task-specific network attention
for RGB-D indoor SLAM. To do so, we integrate layer-wise object attention
information (layer gradients) with CNN layer representations to improve frame
association performance in a state-of-the-art RGB-D indoor SLAM method.
Experiments show promising initial results with improved performance.
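The abstract does not say how the layer gradients are combined with the CNN layer representations or how frames are then associated. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it assumes a Grad-CAM-style weighting (spatially averaged gradients as channel weights, ReLU of the weighted sum), attention-weighted pooling of the layer into a unit-length descriptor, and cosine-similarity matching against keyframe descriptors with a threshold. The function names, the 0.8 threshold, and the random stand-in arrays are hypothetical; a real system would take activations and gradients from a trained network.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM-style attention map for one CNN layer (assumed weighting).

    activations, gradients: arrays of shape (C, H, W); `gradients` holds the
    derivative of a task score with respect to `activations`.
    Returns an (H, W) map scaled to [0, 1].
    """
    # One weight per channel: the spatially averaged gradient.
    weights = gradients.mean(axis=(1, 2))
    # Channel-weighted sum of activations, keeping positive evidence (ReLU).
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def attention_descriptor(activations: np.ndarray, attention: np.ndarray,
                         eps: float = 1e-8) -> np.ndarray:
    """Attention-weighted spatial pooling of a (C, H, W) layer into a unit vector."""
    weights = attention / (attention.sum() + eps)                # (H, W), sums to ~1
    desc = (activations * weights[None, :, :]).sum(axis=(1, 2))  # (C,)
    return desc / (np.linalg.norm(desc) + eps)


def associate(query: np.ndarray, keyframes: list, threshold: float = 0.8):
    """Index of the keyframe with highest cosine similarity, or None if below threshold."""
    sims = [float(query @ k) for k in keyframes]  # descriptors are unit length
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None


# Usage with random stand-ins for one layer's activations and gradients.
rng = np.random.default_rng(0)
acts0 = rng.standard_normal((64, 8, 8))
grads = rng.standard_normal((64, 8, 8))
acts1 = acts0 + 0.01 * rng.standard_normal((64, 8, 8))  # nearly the same view
acts2 = rng.standard_normal((64, 8, 8))                 # unrelated view

d0, d1, d2 = (attention_descriptor(a, grad_cam(a, grads)) for a in (acts0, acts1, acts2))
print(associate(d0, [d2, d1]))  # prints 1: the near-duplicate view is matched
```

Pooling with the attention map, rather than averaging uniformly, lets the spatially attended (salient object) locations dominate the frame descriptor, which is the intuition the abstract gives for improved frame association.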
Related papers
- Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection.
The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.
Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
Date: 2024-08-27T12:53:25Z
- DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experimental results achieve state-of-the-art performance on both synthetic data and real-world data tracking.
Date: 2023-11-30T21:34:44Z
- Point-SLAM: Dense Neural Point Cloud-based SLAM [61.96492935210654]
We propose a dense neural simultaneous localization and mapping (SLAM) approach for monocular RGBD input.
We demonstrate that both tracking and mapping can be performed with the same point-based neural scene representation.
Date: 2023-04-09T16:48:26Z
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, requiring neither a generator network nor modifications to the original model.
Date: 2021-01-29T07:46:39Z
- Video-based Facial Expression Recognition using Graph Convolutional Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely-used datasets, CK+, Oulu-CASIA and MMI, and also one challenging wild dataset AFEW8.0.
Date: 2020-10-26T07:31:51Z
- How Convolutional Neural Network Architecture Biases Learned Opponency and Colour Tuning [1.0742675209112622]
Recent work suggests that changing Convolutional Neural Network (CNN) architecture by introducing a bottleneck in the second layer can yield changes in learned function.
Fully understanding this relationship requires a way of quantitatively comparing trained networks.
We propose an approach to obtaining spatial and colour tuning curves for convolutional neurons.
Date: 2020-10-06T11:33:48Z
- Decoding CNN based Object Classifier Using Visualization [6.666597301197889]
We visualize what types of features are extracted in different convolutional layers of a CNN.
Visualizing activation heat maps helps us understand how a CNN classifies and localizes different objects in an image.
Date: 2020-07-15T05:01:27Z
- Embedded Encoder-Decoder in Convolutional Networks Towards Explainable AI [0.0]
This paper proposes a new explainable convolutional neural network (XCNN) which represents important and driving visual features of stimuli.
The experimental results on the CIFAR-10, Tiny ImageNet, and MNIST datasets showed the success of our algorithm (XCNN) to make CNNs explainable.
Date: 2020-06-19T15:49:39Z
- When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition [10.796613905980609]
We propose a novel framework that extracts discriminative feature representations from multi-modal RGB-D images for object and scene recognition tasks.
To cope with the high dimensionality of CNN activations, a random weighted pooling scheme has been proposed.
Experiments verify that a fully randomized structure in the RNN stage successfully encodes CNN activations into solid, discriminative features.
Date: 2020-04-26T10:58:27Z
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
Date: 2020-03-03T07:27:44Z
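The Curriculum By Smoothing summary describes low-pass filtering CNN feature maps, with the amount of smoothing relaxed as training progresses. A minimal NumPy sketch of that idea, assuming a separable Gaussian filter applied to a (C, H, W) feature map and a geometric decay of its standard deviation per epoch; the summary does not specify the kernel, its size, or the schedule, so `radius`, `sigma0`, `decay`, and all function names here are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma: float, radius: int = 2) -> np.ndarray:
    """Normalised 1-D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def smooth_feature_map(fmap: np.ndarray, sigma: float, radius: int = 2) -> np.ndarray:
    """Low-pass filter a (C, H, W) feature map with a separable Gaussian.

    Assumes H and W are at least 2*radius + 1. Borders are zero-padded
    (np.convolve "same" mode). sigma <= 0 disables smoothing.
    """
    if sigma <= 0:
        return fmap
    k = gaussian_kernel(sigma, radius)
    # Convolve along width (axis 2), then along height (axis 1).
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 2, fmap)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 1, out)


def sigma_schedule(epoch: int, sigma0: float = 1.0, decay: float = 0.9) -> float:
    """Hypothetical curriculum: shrink sigma geometrically each epoch."""
    return sigma0 * decay ** epoch


# Usage: smoothing weakens as sigma shrinks, so more detail passes through.
fmap = np.random.default_rng(0).standard_normal((4, 16, 16))
for epoch in range(3):
    s = sigma_schedule(epoch)
    print(epoch, round(s, 3), round(float(smooth_feature_map(fmap, s).std()), 3))
```

A separable filter keeps the cost at two 1-D passes per channel; in a training loop the same operation would sit after each convolution, with `sigma_schedule` queried once per epoch.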