A Multisensory Learning Architecture for Rotation-invariant Object
Recognition
- URL: http://arxiv.org/abs/2009.06292v1
- Date: Mon, 14 Sep 2020 09:39:48 GMT
- Title: A Multisensory Learning Architecture for Rotation-invariant Object
Recognition
- Authors: Murat Kirtay and Guido Schillaci and Verena V. Hafner
- Abstract summary: This study presents a multisensory machine learning architecture for object recognition by employing a novel dataset that was constructed with the iCub robot.
The proposed architecture combines convolutional neural networks to form representations (i.e., features) for grayscale color images and a multi-layer perceptron algorithm to process depth data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study presents a multisensory machine learning architecture for object
recognition by employing a novel dataset that was constructed with the iCub
robot, which is equipped with three cameras and a depth sensor. The proposed
architecture combines convolutional neural networks to form representations
(i.e., features) for grayscale color images and a multi-layer perceptron
algorithm to process depth data. To this end, we aimed to learn joint
representations of different modalities (e.g., color and depth) and employ them
for recognizing objects. We evaluate the performance of the proposed
architecture by benchmarking the results obtained with the models trained
separately with the input of different sensors and a state-of-the-art data
fusion technique, namely decision-level fusion. The results show that our
architecture improves recognition accuracy compared both with models that
use input from a single modality and with the decision-level multimodal fusion method.
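The two fusion strategies compared in the abstract can be sketched as follows. This is a minimal illustration only: the encoders, feature dimensions, and weights below are hypothetical placeholders (random linear maps), not the paper's trained CNN and MLP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the learned encoders: in the paper a CNN
# embeds grayscale camera images and an MLP embeds depth data; here each
# encoder is a fixed random linear map followed by a nonlinearity.
W_IMG_ENC = rng.standard_normal((64, 1024)) * 0.01
W_DEP_ENC = rng.standard_normal((32, 256)) * 0.01

def encode_image(x):
    return np.tanh(W_IMG_ENC @ x)   # 64-dim image feature

def encode_depth(d):
    return np.tanh(W_DEP_ENC @ d)   # 32-dim depth feature

def joint_fusion_logits(x, d, W_cls):
    # Feature-level (joint) fusion: concatenate the modality features,
    # then apply a single shared classifier head.
    z = np.concatenate([encode_image(x), encode_depth(d)])
    return W_cls @ z                # one logit per object class

def decision_fusion_logits(x, d, W_img, W_dep):
    # Decision-level fusion baseline: classify each modality
    # separately, then average the per-class scores.
    return 0.5 * (W_img @ encode_image(x) + W_dep @ encode_depth(d))

n_classes = 10
x = rng.standard_normal(1024)       # flattened grayscale image (placeholder)
d = rng.standard_normal(256)        # flattened depth map (placeholder)
W_cls = rng.standard_normal((n_classes, 96)) * 0.01
W_img = rng.standard_normal((n_classes, 64)) * 0.01
W_dep = rng.standard_normal((n_classes, 32)) * 0.01

print(joint_fusion_logits(x, d, W_cls).shape)            # (10,)
print(decision_fusion_logits(x, d, W_img, W_dep).shape)  # (10,)
```

The key design difference is where the modalities meet: joint fusion lets the classifier see cross-modal feature interactions, while decision-level fusion only combines per-modality scores at the end.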
Related papers
- EM-DARTS: Hierarchical Differentiable Architecture Search for Eye Movement Recognition [54.99121380536659]
Eye movement biometrics have received increasing attention thanks to their highly secure identification.
Deep learning (DL) models have recently been successfully applied to eye movement recognition.
However, the DL architecture is still determined by human prior knowledge.
We propose EM-DARTS, a hierarchical differentiable architecture search algorithm to automatically design the DL architecture for eye movement recognition.
arXiv Detail & Related papers (2024-09-22T13:11:08Z)
- Multi-Objective Neural Architecture Search for In-Memory Computing [0.5892638927736115]
We employ neural architecture search (NAS) to enhance the efficiency of deploying diverse machine learning (ML) tasks on in-memory computing architectures.
Our evaluation of this NAS approach for IMC architecture deployment spans three distinct image classification datasets.
arXiv Detail & Related papers (2024-06-10T19:17:09Z)
- The Impact of Different Backbone Architecture on Autonomous Vehicle Dataset [120.08736654413637]
The quality of the features extracted by the backbone architecture can have a significant impact on the overall detection performance.
Our study evaluates three well-known autonomous vehicle datasets, namely KITTI, NuScenes, and BDD, to compare the performance of different backbone architectures on object detection tasks.
arXiv Detail & Related papers (2023-09-15T17:32:15Z)
- Scene Change Detection Using Multiscale Cascade Residual Convolutional Neural Networks [0.0]
Scene change detection is an image processing problem related to partitioning pixels of a digital image into foreground and background regions.
In this work, we propose a novel Multiscale Residual Processing Module, with a Convolutional Neural Network that integrates a Residual Processing Module.
Experiments conducted on two different datasets support the proposed approach, which achieves an average overall effectiveness of $\boldsymbol{0.9622}$ and $\boldsymbol{0.9664}$ on the Change Detection 2014 and PetrobrasROUTES datasets, respectively.
arXiv Detail & Related papers (2022-12-20T16:48:51Z)
- Super-Resolution and Image Re-projection for Iris Recognition [67.42500312968455]
Convolutional Neural Networks (CNNs) using different deep learning approaches attempt to recover realistic texture and fine grained details from low resolution images.
In this work we explore the viability of these approaches for iris Super-Resolution (SR) in an iris recognition environment.
Results show that CNNs and image re-projection can improve the results, especially the accuracy of recognition systems.
arXiv Detail & Related papers (2022-10-20T09:46:23Z)
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
- Deep Texture-Aware Features for Camouflaged Object Detection [69.84122372541506]
This paper formulates texture-aware refinement modules to learn the texture-aware features in a deep convolutional neural network.
We evaluate our network on the benchmark dataset for camouflaged object detection both qualitatively and quantitatively.
arXiv Detail & Related papers (2021-02-05T04:38:32Z)
- Auto-MVCNN: Neural Architecture Search for Multi-view 3D Shape Recognition [16.13826056628379]
In 3D shape recognition, multi-view-based methods leverage the human perspective to analyze 3D shapes and have achieved significant outcomes.
We propose a neural architecture search method named Auto-MVCNN which is particularly designed for optimizing architecture in multi-view 3D shape recognition.
arXiv Detail & Related papers (2020-12-10T07:40:28Z)
- Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori.
We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
arXiv Detail & Related papers (2020-11-30T20:46:48Z)
- HDD-Net: Hybrid Detector Descriptor with Mutual Interactive Learning [24.13425816781179]
Local feature extraction remains an active research area due to the advances in fields such as SLAM, 3D reconstructions, or AR applications.
We propose a method that treats both extractions independently and focuses on their interaction in the learning process.
We show improvements over the state of the art in terms of image matching on HPatches and 3D reconstruction quality, while remaining on par on camera localisation tasks.
arXiv Detail & Related papers (2020-05-12T13:55:04Z)
- Contextual Encoder-Decoder Network for Visual Saliency Prediction [42.047816176307066]
We propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task.
We combine the resulting representations with global scene information for accurately predicting visual saliency.
Compared to state of the art approaches, the network is based on a lightweight image classification backbone.
arXiv Detail & Related papers (2019-02-18T16:15:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.