StairNet: Visual Recognition of Stairs for Human-Robot Locomotion
- URL: http://arxiv.org/abs/2310.20666v1
- Date: Tue, 31 Oct 2023 17:30:57 GMT
- Title: StairNet: Visual Recognition of Stairs for Human-Robot Locomotion
- Authors: Andrew Garrett Kurbis, Dmytro Kuzmenko, Bogdan Ivanyuk-Skulskiy, Alex
  Mihailidis, Brokoslaw Laschowski
- Abstract summary: StairNet is an initiative to support the development of new deep learning models for visual sensing and recognition of stairs.
We present an overview of the development of our large-scale dataset with over 515,000 manually labeled images.
We show that StairNet can be an effective platform to develop and study new visual perception systems for human-robot locomotion.
- Score: 2.3811618212533663
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   Human-robot walking with prosthetic legs and exoskeletons, especially over
complex terrains such as stairs, remains a significant challenge. Egocentric
vision has the unique potential to detect the walking environment prior to
physical interactions, which can improve transitions to and from stairs. This
motivated us to create the StairNet initiative to support the development of
new deep learning models for visual sensing and recognition of stairs, with an
emphasis on lightweight and efficient neural networks for onboard real-time
inference. In this study, we present an overview of the development of our
large-scale dataset with over 515,000 manually labeled images, as well as our
development of different deep learning models (e.g., 2D and 3D CNN, hybrid CNN
and LSTM, and ViT networks) and training methods (e.g., supervised learning
with temporal data and semi-supervised learning with unlabeled images) using
our new dataset. We consistently achieved high classification accuracy (i.e.,
up to 98.8%) with different designs, offering trade-offs between model accuracy
and size. When deployed on mobile devices with GPU and NPU accelerators, our
deep learning models achieved inference speeds up to 2.8 ms. We also deployed
our models on custom-designed CPU-powered smart glasses. However, limitations
in the embedded hardware yielded slower inference speeds of 1.5 seconds,
presenting a trade-off between human-centered design and performance. Overall,
we showed that StairNet can be an effective platform to develop and study new
visual perception systems for human-robot locomotion with applications in
exoskeleton and prosthetic leg control.
 
      
Related papers
- Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos [66.62109400603394]
 We introduce Being-H0, a dexterous Vision-Language-Action model trained on large-scale human videos.
Our approach centers on physical instruction tuning, a novel training paradigm that combines large-scale VLA pretraining from human videos, physical space alignment for 3D reasoning, and post-training adaptation for robotic tasks.
We empirically show the excellence of Being-H0 in hand motion generation and instruction following, and it also scales well with model and data sizes.
 arXiv  Detail & Related papers  (2025-07-21T13:19:09Z)
- Deep Learning for Human Locomotion Analysis in Lower-Limb Exoskeletons: A Comparative Study [1.3569491184708433]
 This paper presents an experimental comparison between eight deep neural network backbones to predict high-level locomotion parameters.
The LSTM achieved high terrain classification accuracy (0.94 ± 0.04) and precise ramp slope estimation (1.95 ± 0.58°), and the CNN-LSTM achieved precise stair height estimation (15.65 ± 7.40 mm).
The system operates with 2 ms inference time, supporting real-time applications.
 arXiv  Detail & Related papers  (2025-03-21T07:12:44Z)
- A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning [67.72413262980272]
 Pre-trained vision models (PVMs) are fundamental to modern robotics, yet their optimal configuration remains unclear.
We develop SlotMIM, a method that induces object-centric representations by introducing a semantic bottleneck.
Our approach achieves significant improvements over prior work in image recognition, scene understanding, and robot learning evaluations.
 arXiv  Detail & Related papers  (2025-03-10T06:18:31Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand
  manipulation [57.60490773016364]
 We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
 arXiv  Detail & Related papers  (2023-12-20T22:36:37Z)
- Efficient Adaptive Human-Object Interaction Detection with
  Concept-guided Memory [64.11870454160614]
 We propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM).
ADA-CM has two operating modes. The first mode makes it tunable without learning new parameters in a training-free paradigm.
Our proposed method achieves results competitive with the state of the art on the HICO-DET and V-COCO datasets with much less training time.
 arXiv  Detail & Related papers  (2023-09-07T13:10:06Z)
- StairNetV3: Depth-aware Stair Modeling using Deep Learning [6.145334325463317]
 Vision-based stair perception can help autonomous mobile robots deal with the challenge of climbing stairs.
Current monocular vision methods struggle to model stairs accurately without depth information.
This paper proposes a depth-aware stair modeling method for monocular vision.
 arXiv  Detail & Related papers  (2023-08-13T08:11:40Z)
- Baby Physical Safety Monitoring in Smart Home Using Action Recognition
  System [0.0]
 We present a novel framework combining transfer learning techniques with a Conv2D LSTM layer to extract features from the pre-trained I3D model on the Kinetics dataset.
We developed a benchmark dataset and an automated model that uses LSTM convolution with I3D (ConvLSTM-I3D) for recognizing and predicting baby activities in a smart baby room.
 arXiv  Detail & Related papers  (2022-10-22T19:00:14Z)
- Masked World Models for Visual Control [90.13638482124567]
 We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
 arXiv  Detail & Related papers  (2022-06-28T18:42:27Z)
- ProFormer: Learning Data-efficient Representations of Body Movement with
  Prototype-based Feature Augmentation and Visual Transformers [31.908276711898548]
 Methods for data-efficient recognition from body poses increasingly leverage skeleton sequences structured as image-like arrays.
We look at this paradigm from the perspective of transformer networks, for the first time exploring visual transformers as data-efficient encoders of skeleton movement.
In our pipeline, body pose sequences cast as image-like representations are converted into patch embeddings and then passed to a visual transformer backbone optimized with deep metric learning.
 arXiv  Detail & Related papers  (2022-02-23T11:11:54Z)
- A Review of Deep Learning Techniques for Markerless Human Motion on
  Synthetic Datasets [0.0]
 Estimating human posture has recently gained increasing attention in the computer vision community.
We present a model that can predict the skeleton of an animation based solely on 2D images.
The implementation process uses DeepLabCut on its own dataset to perform many necessary steps.
 arXiv  Detail & Related papers  (2022-01-07T15:42:50Z)
- Learning Perceptual Locomotion on Uneven Terrains using Sparse Visual
  Observations [75.60524561611008]
 This work aims to exploit the use of sparse visual observations to achieve perceptual locomotion over a range of commonly seen bumps, ramps, and stairs in human-centred environments.
We first formulate the selection of minimal visual input that can represent the uneven surfaces of interest, and propose a learning framework that integrates such exteroceptive and proprioceptive data.
We validate the learned policy in tasks that require omnidirectional walking over flat ground and forward locomotion over terrains with obstacles, showing a high success rate.
 arXiv  Detail & Related papers  (2021-09-28T20:25:10Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
 This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance with far fewer trainable parameters and runs fast in both training and inference.
 arXiv  Detail & Related papers  (2021-07-15T02:53:11Z)
- Where is my hand? Deep hand segmentation for visual self-recognition in
  humanoid robots [129.46920552019247]
 We propose the use of a Convolutional Neural Network (CNN) to segment the robot hand from an image in an egocentric view.
We fine-tuned the Mask-RCNN network for the specific task of segmenting the hand of the humanoid robot Vizzy.
 arXiv  Detail & Related papers  (2021-02-09T10:34:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     