StairNet: Visual Recognition of Stairs for Human-Robot Locomotion
- URL: http://arxiv.org/abs/2310.20666v1
- Date: Tue, 31 Oct 2023 17:30:57 GMT
- Title: StairNet: Visual Recognition of Stairs for Human-Robot Locomotion
- Authors: Andrew Garrett Kurbis, Dmytro Kuzmenko, Bogdan Ivanyuk-Skulskiy, Alex
Mihailidis, Brokoslaw Laschowski
- Abstract summary: StairNet is an initiative to support the development of new deep learning models for visual sensing and recognition of stairs.
We present an overview of the development of our large-scale dataset with over 515,000 manually labeled images.
We show that StairNet can be an effective platform to develop and study new visual perception systems for human-robot locomotion.
- Score: 2.3811618212533663
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human-robot walking with prosthetic legs and exoskeletons, especially over
complex terrains such as stairs, remains a significant challenge. Egocentric
vision has the unique potential to detect the walking environment prior to
physical interactions, which can improve transitions to and from stairs. This
motivated us to create the StairNet initiative to support the development of
new deep learning models for visual sensing and recognition of stairs, with an
emphasis on lightweight and efficient neural networks for onboard real-time
inference. In this study, we present an overview of the development of our
large-scale dataset with over 515,000 manually labeled images, as well as our
development of different deep learning models (e.g., 2D and 3D CNN, hybrid CNN
and LSTM, and ViT networks) and training methods (e.g., supervised learning
with temporal data and semi-supervised learning with unlabeled images) using
our new dataset. We consistently achieved high classification accuracy (i.e.,
up to 98.8%) with different designs, offering trade-offs between model accuracy
and size. When deployed on mobile devices with GPU and NPU accelerators, our
deep learning models achieved inference times as fast as 2.8 ms. We also deployed
our models on custom-designed CPU-powered smart glasses. However, limitations
in the embedded hardware yielded slower inference times of 1.5 seconds,
presenting a trade-off between human-centered design and performance. Overall,
we showed that StairNet can be an effective platform to develop and study new
visual perception systems for human-robot locomotion with applications in
exoskeleton and prosthetic leg control.
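As an illustration of the kind of lightweight model the abstract describes, the sketch below builds a small image classifier from a MobileNetV3-Small backbone in PyTorch. The backbone choice and the class names are assumptions for illustration only; they are not taken from the StairNet code or label set.

```python
# Minimal sketch of a lightweight stair-recognition classifier (PyTorch).
# Assumptions: a MobileNetV3-Small backbone and four illustrative classes;
# the actual StairNet models and labels may differ.
import torch
import torch.nn as nn
from torchvision import models

CLASSES = ["level_ground", "incline_stairs", "decline_stairs", "transition"]  # assumed labels

def build_stair_classifier(num_classes: int = len(CLASSES)) -> nn.Module:
    # Start from ImageNet-pretrained weights for faster convergence.
    backbone = models.mobilenet_v3_small(weights=models.MobileNet_V3_Small_Weights.DEFAULT)
    # Replace the final classification layer with a stair-class head.
    in_features = backbone.classifier[3].in_features
    backbone.classifier[3] = nn.Linear(in_features, num_classes)
    return backbone

if __name__ == "__main__":
    model = build_stair_classifier().eval()
    frame = torch.randn(1, 3, 224, 224)  # one RGB frame from an egocentric camera
    with torch.no_grad():
        logits = model(frame)
    print(CLASSES[logits.argmax(dim=1).item()])
```

A model of this size can then be exported (e.g., to TensorFlow Lite or Core ML) for on-device inference with GPU or NPU delegates, which is the deployment path the abstract alludes to.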
Related papers
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
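As a toy illustration of the idea above, the sketch below refines an object pose by gradient descent so that observed surface points (from vision or touch) sit on the zero level set of a learned signed-distance field. It is not the NeuralFeels pose-graph formulation; all shapes, sizes, and names are made up.

```python
# Toy sketch: refine an object pose against a learned signed-distance field (SDF)
# so that observed contact/vision points lie on the surface. Illustration only;
# not the NeuralFeels pose-graph optimization.
import torch
import torch.nn as nn

class TinySDF(nn.Module):
    """Small MLP mapping 3D points to signed distances (stand-in for a neural field)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, x):
        return self.net(x).squeeze(-1)

def axis_angle_to_matrix(w: torch.Tensor) -> torch.Tensor:
    """Rodrigues' formula for a 3-vector axis-angle rotation (keeps gradients)."""
    theta = w.norm() + 1e-8
    k = w / theta
    zero = torch.zeros((), dtype=w.dtype)
    K = torch.stack([torch.stack([zero, -k[2], k[1]]),
                     torch.stack([k[2], zero, -k[0]]),
                     torch.stack([-k[1], k[0], zero])])
    return torch.eye(3, dtype=w.dtype) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def refine_pose(sdf, points, steps=200, lr=1e-2):
    """Gradient descent on (rotation, translation) so that sdf(R p + t) -> 0."""
    w = torch.zeros(3, requires_grad=True)   # axis-angle rotation
    t = torch.zeros(3, requires_grad=True)   # translation
    opt = torch.optim.Adam([w, t], lr=lr)
    for _ in range(steps):
        R = axis_angle_to_matrix(w)
        loss = sdf(points @ R.T + t).pow(2).mean()  # signed distance of transformed points
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach(), t.detach()

if __name__ == "__main__":
    sdf = TinySDF()                          # in practice the field would be trained online
    observed = torch.randn(128, 3) * 0.05    # fake surface points from touch/vision
    w, t = refine_pose(sdf, observed)
    print("axis-angle:", w, "translation:", t)
```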
- Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory [64.11870454160614]
We propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM).
ADA-CM has two operating modes. In the first, it can be applied in a training-free paradigm without learning any new parameters.
Our proposed method achieves results competitive with the state of the art on the HICO-DET and V-COCO datasets with much less training time.
arXiv Detail & Related papers (2023-09-07T13:10:06Z)
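The training-free mode described above is reminiscent of a prototype ("concept") memory: class features are stored once and queries are matched by similarity, with no weight updates. The sketch below is a generic nearest-prototype classifier over pre-computed features, not the ADA-CM implementation; the feature size and labels are placeholders.

```python
# Generic nearest-prototype ("concept memory") classifier over pre-computed
# features. Illustrates a training-free mode in spirit; not the ADA-CM code.
import torch
import torch.nn.functional as F

class ConceptMemory:
    def __init__(self):
        self.keys = []     # one prototype feature per concept
        self.labels = []

    def add_concept(self, label: str, features: torch.Tensor) -> None:
        # Store the mean feature of the support examples as the class prototype.
        self.keys.append(F.normalize(features.mean(dim=0), dim=0))
        self.labels.append(label)

    def classify(self, query: torch.Tensor) -> str:
        # Cosine similarity between the query feature and every stored prototype.
        sims = torch.stack(self.keys) @ F.normalize(query, dim=0)
        return self.labels[int(sims.argmax())]

if __name__ == "__main__":
    torch.manual_seed(0)
    memory = ConceptMemory()
    # Hypothetical 512-d features from any frozen visual backbone.
    memory.add_concept("ride bicycle", torch.randn(8, 512))
    memory.add_concept("hold cup", torch.randn(8, 512))
    print(memory.classify(torch.randn(512)))
```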
- StairNetV3: Depth-aware Stair Modeling using Deep Learning [6.145334325463317]
Vision-based stair perception can help autonomous mobile robots deal with the challenge of climbing stairs.
Current monocular vision methods have difficulty modeling stairs accurately without depth information.
This paper proposes a depth-aware stair modeling method for monocular vision.
arXiv Detail & Related papers (2023-08-13T08:11:40Z)
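As a toy illustration of why depth helps stair modeling, the snippet below finds step edges in a single vertical depth profile by thresholding depth discontinuities. The actual StairNetV3 method is a learned, depth-aware model over full images; the threshold and synthetic data here are arbitrary.

```python
# Toy illustration: locate stair step edges in one vertical depth profile by
# thresholding depth discontinuities. Not the StairNetV3 method.
import numpy as np

def step_edges(depth_column: np.ndarray, jump_threshold: float = 0.10) -> np.ndarray:
    """Return indices where consecutive depth readings jump by more than the threshold (meters)."""
    jumps = np.abs(np.diff(depth_column))
    return np.where(jumps > jump_threshold)[0]

if __name__ == "__main__":
    # Synthetic column of depths (meters) looking down a flight of stairs:
    # each tread appears as a plateau, each riser as a sudden jump.
    column = np.concatenate([np.full(20, d) for d in (1.0, 1.3, 1.6, 1.9)])
    column += np.random.default_rng(0).normal(0, 0.005, column.size)  # sensor noise
    print("edge rows:", step_edges(column))
```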
- Baby Physical Safety Monitoring in Smart Home Using Action Recognition System [0.0]
We present a novel framework that combines transfer learning techniques with a Conv2D LSTM layer to extract features from an I3D model pre-trained on the Kinetics dataset.
We developed a benchmark dataset and an automated model that uses LSTM convolution with I3D (ConvLSTM-I3D) for recognizing and predicting baby activities in a smart baby room.
arXiv Detail & Related papers (2022-10-22T19:00:14Z)
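The ConvLSTM-over-pretrained-features idea above can be sketched with a Keras ConvLSTM2D head on top of feature maps from a frozen backbone (Keras is used here because it ships a built-in ConvLSTM2D layer). The feature shape, layer sizes, and number of classes are placeholders, not the paper's configuration.

```python
# Sketch: a ConvLSTM2D head over a sequence of feature maps from a frozen,
# pre-trained video backbone (the paper uses I3D trained on Kinetics).
# Shapes and sizes here are placeholders for illustration.
import tensorflow as tf

NUM_CLASSES = 5                     # assumed number of baby-activity classes
SEQ_LEN, H, W, C = 16, 7, 7, 832    # assumed feature-map shape from the backbone

inputs = tf.keras.Input(shape=(SEQ_LEN, H, W, C))
x = tf.keras.layers.ConvLSTM2D(filters=256, kernel_size=3, padding="same")(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```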
- Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
arXiv Detail & Related papers (2022-06-28T18:42:27Z)
- ProFormer: Learning Data-efficient Representations of Body Movement with Prototype-based Feature Augmentation and Visual Transformers [31.908276711898548]
Methods for data-efficient recognition from body poses increasingly leverage skeleton sequences structured as image-like arrays.
We look at this paradigm from the perspective of transformer networks, for the first time exploring visual transformers as data-efficient encoders of skeleton movement.
In our pipeline, body pose sequences cast as image-like representations are converted into patch embeddings and then passed to a visual transformer backbone optimized with deep metric learning.
arXiv Detail & Related papers (2022-02-23T11:11:54Z)
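The pipeline above (skeleton sequence, cast as an image-like array, cut into patch embeddings, and encoded by a transformer) can be sketched as follows. Patch size, embedding width, and class count are placeholders, and the deep-metric-learning objective used by ProFormer is omitted.

```python
# Sketch: treat a skeleton sequence (T frames x J joints x 3 coords) as an
# image-like array, cut it into temporal patches, embed them, and encode with
# a small transformer. Sizes are placeholders; the metric-learning loss is omitted.
import torch
import torch.nn as nn

T, J, C = 64, 25, 3                 # frames, joints, coordinates
PATCH_T, EMBED, CLASSES = 8, 128, 60

class SkeletonPatchTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        patch_dim = PATCH_T * J * C                   # flatten a temporal patch of poses
        self.embed = nn.Linear(patch_dim, EMBED)
        layer = nn.TransformerEncoderLayer(d_model=EMBED, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(EMBED, CLASSES)

    def forward(self, x):                             # x: (batch, T, J, C)
        b = x.shape[0]
        patches = x.reshape(b, T // PATCH_T, PATCH_T * J * C)
        tokens = self.embed(patches)                  # (batch, num_patches, EMBED)
        encoded = self.encoder(tokens).mean(dim=1)    # average-pool the tokens
        return self.head(encoded)

if __name__ == "__main__":
    model = SkeletonPatchTransformer()
    logits = model(torch.randn(2, T, J, C))
    print(logits.shape)  # torch.Size([2, 60])
```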
- A Review of Deep Learning Techniques for Markerless Human Motion on Synthetic Datasets [0.0]
Estimating human posture has recently gained increasing attention in the computer vision community.
We present a model that can predict the skeleton of an animation based solely on 2D images.
The implementation uses DeepLabCut on its own dataset to perform many of the necessary steps.
arXiv Detail & Related papers (2022-01-07T15:42:50Z)
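Since the review above builds on DeepLabCut, a typical DeepLabCut workflow looks roughly like the outline below. Function names follow the DeepLabCut documentation, but the project name, paths, and exact steps are placeholders and may differ from what the review used.

```python
# Rough outline of a standard DeepLabCut workflow (paths and names are placeholders;
# consult the DeepLabCut documentation for the exact, current API).
import deeplabcut

config = deeplabcut.create_new_project(
    "pose-project", "experimenter", ["/path/to/video.mp4"], copy_videos=True
)
deeplabcut.extract_frames(config)             # pick frames to annotate
deeplabcut.label_frames(config)               # manual keypoint labeling GUI
deeplabcut.create_training_dataset(config)
deeplabcut.train_network(config)
deeplabcut.evaluate_network(config)
deeplabcut.analyze_videos(config, ["/path/to/new_video.mp4"])  # predict keypoints
```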
- Learning Perceptual Locomotion on Uneven Terrains using Sparse Visual Observations [75.60524561611008]
This work aims to exploit the use of sparse visual observations to achieve perceptual locomotion over a range of commonly seen bumps, ramps, and stairs in human-centred environments.
We first formulate the selection of minimal visual input that can represent the uneven surfaces of interest, and propose a learning framework that integrates such exteroceptive and proprioceptive data.
We validate the learned policy in tasks that require omnidirectional walking over flat ground and forward locomotion over terrains with obstacles, showing a high success rate.
arXiv Detail & Related papers (2021-09-28T20:25:10Z)
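The idea of fusing a handful of exteroceptive samples with proprioception can be sketched as a single policy network whose input concatenates sparse terrain readings with the robot's joint state. The observation sizes and network shape below are placeholders, not the paper's architecture or training setup.

```python
# Sketch: a locomotion policy that concatenates sparse exteroceptive terrain
# samples with proprioceptive state. Dimensions are placeholders.
import torch
import torch.nn as nn

N_HEIGHT_SAMPLES = 8   # assumed: a few terrain height readings around the feet
PROPRIO_DIM = 36       # assumed: joint positions/velocities, base orientation, etc.
ACTION_DIM = 12        # assumed: target joint positions

class SparsePerceptivePolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_HEIGHT_SAMPLES + PROPRIO_DIM, 256), nn.ELU(),
            nn.Linear(256, 128), nn.ELU(),
            nn.Linear(128, ACTION_DIM),
        )

    def forward(self, heights, proprio):
        return self.net(torch.cat([heights, proprio], dim=-1))

if __name__ == "__main__":
    policy = SparsePerceptivePolicy()
    action = policy(torch.zeros(1, N_HEIGHT_SAMPLES), torch.zeros(1, PROPRIO_DIM))
    print(action.shape)  # torch.Size([1, 12])
```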
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance while using far fewer trainable parameters and achieving high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- Where is my hand? Deep hand segmentation for visual self-recognition in humanoid robots [129.46920552019247]
We propose the use of a Convolutional Neural Network (CNN) to segment the robot hand from an image in an egocentric view.
We fine-tuned the Mask-RCNN network for the specific task of segmenting the hand of the humanoid robot Vizzy.
arXiv Detail & Related papers (2021-02-09T10:34:32Z)
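Fine-tuning a Mask R-CNN for a single "hand" class can be done with torchvision's detection models as below. This mirrors the standard torchvision fine-tuning recipe rather than the exact setup used for the Vizzy robot.

```python
# Fine-tune torchvision's Mask R-CNN for a single "hand" class (plus background).
# Follows the standard torchvision fine-tuning recipe; the exact Vizzy setup may differ.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 2  # background + hand

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box classification head.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Replace the mask prediction head.
in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, NUM_CLASSES)

# The model can now be trained on (image, target) pairs where each target
# contains "boxes", "labels", and "masks" for the hand instances.
```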