ViT Cane: Visual Assistant for the Visually Impaired
- URL: http://arxiv.org/abs/2109.13857v1
- Date: Sun, 26 Sep 2021 02:30:30 GMT
- Title: ViT Cane: Visual Assistant for the Visually Impaired
- Authors: Bhavesh Kumar
- Abstract summary: This paper proposes ViT Cane, which leverages a Vision Transformer model to detect obstacles in real time.
Our entire system consists of a Pi Camera Module v2, a Raspberry Pi 4B with 8GB RAM, and 4 motors.
Using tactile feedback delivered through the 4 motors, the obstacle detection model is highly efficient in helping visually impaired users navigate unknown terrain.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Blind and visually challenged people face multiple issues with
navigating the world independently. Some of these challenges include finding
the shortest path to a destination and detecting obstacles from a distance. To
tackle these issues, this paper proposes ViT Cane, which leverages a Vision
Transformer model to detect obstacles in real time. Our entire system consists
of a Pi Camera Module v2, a Raspberry Pi 4B with 8GB RAM, and 4 motors. Using
tactile feedback delivered through the 4 motors, the obstacle detection model
is highly efficient in helping visually impaired users navigate unknown
terrain and is designed to be easily reproduced. The paper discusses the
utility of a Vision Transformer model in comparison to other CNN-based models
for this specific application. Through rigorous testing, the proposed obstacle
detection model has achieved higher performance on the Common Objects in
Context (COCO) dataset than its CNN counterpart. Comprehensive field tests
were conducted to verify the effectiveness of our system for holistic indoor
understanding and obstacle avoidance.
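To make the described hardware loop concrete, here is a minimal sketch of a capture-detect-vibrate cycle in Python. It is an illustration, not the authors' code: detect_obstacle_quadrants is a stub standing in for the paper's ViT detector, the GPIO pin numbers and quadrant-to-motor mapping are assumptions, and the Pi Camera is assumed to be exposed as a standard V4L2 video device readable by OpenCV.

```python
# Hypothetical sketch of the capture -> detect -> vibrate loop; the ViT
# detector is a stub and the pin assignments are illustrative only.
import time

import cv2
import RPi.GPIO as GPIO

MOTOR_PINS = [17, 18, 22, 23]  # one vibration motor per image quadrant (assumed mapping)

GPIO.setmode(GPIO.BCM)
for pin in MOTOR_PINS:
    GPIO.setup(pin, GPIO.OUT, initial=GPIO.LOW)

def detect_obstacle_quadrants(frame):
    """Stub for the paper's ViT obstacle detector: should return the set of
    image quadrants (0=top-left .. 3=bottom-right) that contain an obstacle."""
    return set()  # replace with real model inference

cap = cv2.VideoCapture(0)  # Pi Camera exposed as a V4L2 device
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            continue
        quadrants = detect_obstacle_quadrants(frame)
        # Drive the motor for each quadrant that currently contains an obstacle.
        for i, pin in enumerate(MOTOR_PINS):
            GPIO.output(pin, GPIO.HIGH if i in quadrants else GPIO.LOW)
        time.sleep(0.05)  # ~20 Hz feedback loop
finally:
    cap.release()
    GPIO.cleanup()
```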
Related papers
- Turn-by-Turn Indoor Navigation for the Visually Impaired [0.0]
Navigating indoor environments presents significant challenges for visually impaired individuals.
This paper introduces a novel system that provides turn-by-turn navigation inside buildings using only a smartphone equipped with a camera.
Preliminary evaluations demonstrate the system's effectiveness in accurately guiding users through complex indoor spaces.
arXiv Detail & Related papers (2024-10-25T20:16:38Z)
- DVPE: Divided View Position Embedding for Multi-View 3D Object Detection [7.791229698270439]
Current research faces challenges in balancing enlarged receptive fields against reduced interference when aggregating multi-view features.
This paper proposes a divided view method, in which features are modeled globally via a visibility cross-attention mechanism but interact only with partial features in a divided local virtual space (a toy sketch of such partition-restricted attention appears after this list).
Our framework, named DVPE, achieves state-of-the-art performance (57.2% mAP and 64.5% NDS) on the nuScenes test set.
arXiv Detail & Related papers (2024-07-24T02:44:41Z)
- ODTFormer: Efficient Obstacle Detection and Tracking with Stereo Cameras Based on Transformer [12.58804521609764]
ODTFormer is a Transformer-based model that addresses both the obstacle detection and the tracking problem.
We report accuracy comparable to state-of-the-art obstacle tracking models while requiring only a fraction of their computational cost.
arXiv Detail & Related papers (2024-03-21T17:59:55Z)
- FocalFormer3D: Focusing on Hard Instance for 3D Object Detection [97.56185033488168]
False negatives (FN) in 3D object detection can lead to potentially dangerous situations in autonomous driving.
In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner.
We instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects.
arXiv Detail & Related papers (2023-08-08T20:06:12Z)
- ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every Detection Box [81.45219802386444]
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects across video frames.
We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes (a toy sketch of this two-stage matching appears after this list).
In 3D scenarios, it is much easier for the tracker to predict object velocities in the world coordinate frame.
arXiv Detail & Related papers (2023-03-27T15:35:21Z)
- Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.
Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds.
We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z)
- 2nd Place Solution for Waymo Open Dataset Challenge - Real-time 2D Object Detection [26.086623067939605]
In this report, we introduce a real-time method to detect 2D objects from images.
We leverage TensorRT to optimize the inference time of our detection pipeline.
Our framework achieves a latency of 45.8 ms/frame on an Nvidia Tesla V100 GPU.
arXiv Detail & Related papers (2021-06-16T11:32:03Z)
- Finding a Needle in a Haystack: Tiny Flying Object Detection in 4K Videos using a Joint Detection-and-Tracking Approach [19.59528430884104]
We present a neural network model called the Recurrent Correlational Network, where detection and tracking are jointly performed.
In experiments with datasets containing images of scenes with small flying objects, such as birds and unmanned aerial vehicles, the proposed method yielded consistent improvements.
Our network performs as well as state-of-the-art generic object trackers when evaluated as a tracker on a bird image dataset.
arXiv Detail & Related papers (2021-05-18T03:22:03Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model [51.14840210957289]
Multi-object tracking is a fundamental vision problem that has been studied for a long time.
Despite the success of Tracking by Detection (TBD), this two-step method is too complicated to train in an end-to-end manner.
We propose a concise end-to-end model, TubeTK, which needs only one-step training by introducing the "bounding-tube" to indicate temporal-spatial locations of objects in a short video clip.
arXiv Detail & Related papers (2020-06-10T06:45:05Z)
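As noted in the DVPE entry above, the core idea of dividing views is that attention is computed only within a local partition of features. The toy NumPy sketch below illustrates partition-restricted (masked) attention in general; it is not DVPE's actual mechanism, and all names and shapes are hypothetical.

```python
# Toy masked attention: each query attends only to keys in its own local group.
# Illustrates restricting interaction to a partition; not DVPE's implementation.
import numpy as np

def grouped_attention(q, k, v, q_groups, k_groups):
    """q: (Nq, d); k, v: (Nk, d); *_groups: integer group id per row."""
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (Nq, Nk) dot-product scores
    blocked = q_groups[:, None] != k_groups[None, :]  # True where attention is blocked
    scores = np.where(blocked, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # (Nq, d)

# Example: 4 queries and 6 keys split into 2 spatial groups.
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
out = grouped_attention(q, k, v,
                        q_groups=np.array([0, 0, 1, 1]),
                        k_groups=np.array([0, 0, 0, 1, 1, 1]))
print(out.shape)  # (4, 8)
```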
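For the ByteTrackV2 entry above: associating every detection box typically means a two-stage matching in which tracks are matched to high-score detections first, and still-unmatched tracks get a second pass over the low-score boxes. The sketch below shows that hierarchy with an IoU cost and Hungarian assignment; the thresholds and box format are assumptions, not the authors' exact configuration.

```python
# Two-stage (high-score first, then low-score) IoU association in the spirit
# of BYTE; thresholds and box format (x1, y1, x2, y2) are assumed.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU between two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match(tracks, dets, iou_thresh):
    """Hungarian matching on IoU cost; returns (pairs, unmatched tracks, unmatched dets)."""
    if not tracks or not dets:
        return [], list(range(len(tracks))), list(range(len(dets)))
    cost = np.array([[1.0 - iou(t, d) for d in dets] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    pairs = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1.0 - iou_thresh]
    matched_r = {r for r, _ in pairs}
    matched_c = {c for _, c in pairs}
    return (pairs,
            [i for i in range(len(tracks)) if i not in matched_r],
            [j for j in range(len(dets)) if j not in matched_c])

def byte_associate(tracks, boxes, scores, high=0.6, iou_hi=0.3, iou_lo=0.5):
    high_dets = [b for b, s in zip(boxes, scores) if s >= high]
    low_dets = [b for b, s in zip(boxes, scores) if s < high]
    # Stage 1: match tracks against confident detections.
    pairs_hi, leftover, new_dets = match(tracks, high_dets, iou_hi)
    # Stage 2: leftover tracks get a second chance against low-score boxes.
    pairs_lo, _, _ = match([tracks[i] for i in leftover], low_dets, iou_lo)
    return pairs_hi, pairs_lo, new_dets  # unmatched high-score dets can seed new tracks
```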