Object Recognition System on a Tactile Device for Visually Impaired
- URL: http://arxiv.org/abs/2307.02211v1
- Date: Wed, 5 Jul 2023 11:37:17 GMT
- Title: Object Recognition System on a Tactile Device for Visually Impaired
- Authors: Souayah Abdelkader, Mokretar Kraroubi Abderrahmene, Slimane Larabi
- Abstract summary: The device will convert visual information into auditory feedback, enabling users to understand their environment in a way that suits their sensory needs.
When the device is touched at a specific position, it provides an audio signal that communicates the identification of the object present in the scene at that corresponding position to the visually impaired individual.
- Score: 1.2891210250935146
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: People with visual impairments face numerous challenges when interacting with
their environment. Our objective is to develop a device that facilitates
communication between individuals with visual impairments and their
surroundings. The device will convert visual information into auditory
feedback, enabling users to understand their environment in a way that suits
their sensory needs. Initially, an object detection model is selected from
existing machine learning models based on its accuracy and cost considerations,
including time and power consumption. The chosen model is then implemented on a
Raspberry Pi, which is connected to a specifically designed tactile device.
When the device is touched at a specific position, it provides an audio signal
that communicates the identification of the object present in the scene at that
corresponding position to the visually impaired individual. Conducted tests
have demonstrated the effectiveness of this device in scene understanding,
encompassing static or dynamic objects, as well as screen contents such as TVs,
computers, and mobile phones.
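The abstract describes a pipeline of running an object detector on a Raspberry Pi, mapping a touched position on the tactile device to the corresponding image region, and announcing the detected object via audio. The paper excerpt above does not include code, so the following is only a minimal sketch of that touch-to-audio mapping under stated assumptions: the tactile grid dimensions, the `Detection` structure, the helper names, and the use of the `espeak` command-line synthesizer are illustrative choices, not the authors' implementation.

```python
# Minimal sketch of a touch-to-audio mapping (assumed design, not the authors' code).
import subprocess
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    label: str   # class name predicted by the object detector
    x1: float    # bounding box in image pixel coordinates
    y1: float
    x2: float
    y2: float


def touch_to_pixel(row: int, col: int,
                   grid_rows: int, grid_cols: int,
                   img_w: int, img_h: int) -> Tuple[float, float]:
    """Map a cell of the tactile grid to the centre of the matching image region."""
    px = (col + 0.5) * img_w / grid_cols
    py = (row + 0.5) * img_h / grid_rows
    return px, py


def object_at(px: float, py: float, detections: List[Detection]) -> Optional[str]:
    """Return the label of the first detection whose bounding box contains the point."""
    for d in detections:
        if d.x1 <= px <= d.x2 and d.y1 <= py <= d.y2:
            return d.label
    return None


def announce(label: Optional[str]) -> None:
    """Speak the label through the audio output (assumes the espeak CLI is installed)."""
    text = label if label is not None else "nothing detected here"
    subprocess.run(["espeak", text], check=False)


if __name__ == "__main__":
    # Hypothetical detections for one camera frame and one touch event.
    frame_detections = [Detection("television", 100, 50, 400, 300)]
    px, py = touch_to_pixel(row=2, col=3, grid_rows=8, grid_cols=8,
                            img_w=640, img_h=480)
    announce(object_at(px, py, frame_detections))
```

The choice of an 8x8 tactile grid and a 640x480 camera frame is arbitrary here; only the structure (detect, map touch to pixel, look up the enclosing box, speak the label) reflects the behaviour described in the abstract.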
Related papers
- You Only Speak Once to See [24.889319740761827]
Grounding objects in images using visual cues is a well-established approach in computer vision.
We introduce YOSS, "You Only Speak Once to See," to leverage audio for grounding objects in visual scenes.
Experimental results indicate that audio guidance can be effectively applied to object grounding.
arXiv Detail & Related papers (2024-09-27T01:16:15Z) - Latent Object Characteristics Recognition with Visual to Haptic-Audio Cross-modal Transfer Learning [9.178588671620963]
This work aims to recognise latent, unobservable object characteristics.
Vision is commonly used for object recognition by robots, but it is ineffective for detecting hidden objects.
We propose a cross-modal transfer learning approach from vision to haptic-audio.
arXiv Detail & Related papers (2024-03-15T21:18:14Z) - Tactile-Filter: Interactive Tactile Perception for Part Mating [54.46221808805662]
Humans rely on touch and tactile sensing for many dexterous manipulation tasks.
Vision-based tactile sensors are widely used for various robotic perception and control tasks.
We present a method for interactive perception using vision-based tactile sensors for a part mating task.
arXiv Detail & Related papers (2023-03-10T16:27:37Z) - Touch and Go: Learning from Human-Collected Vision and Touch [16.139106833276]
We propose a dataset with paired visual and tactile data called Touch and Go.
Human data collectors probe objects in natural environments using tactile sensors.
Our dataset spans a large number of "in the wild" objects and scenes.
arXiv Detail & Related papers (2022-11-22T18:59:32Z) - Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions [81.88294320397826]
In the weakly supervised setting, a system does not know which human-object interactions are present in a video, nor the actual locations of the human and the object.
We introduce a dataset comprising over 6.5k videos with human-object interactions, curated from sentence captions.
We demonstrate improved performance over weakly supervised baselines adapted to our annotations on our video dataset.
arXiv Detail & Related papers (2021-10-07T15:30:18Z) - Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step toward dynamics modeling of hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z) - INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z) - Gaze-contingent decoding of human navigation intention on an autonomous wheelchair platform [6.646253877148766]
We have pioneered the Where-You-Look-Is Where-You-Go approach to controlling mobility platforms.
We present a new solution consisting of (1) deep computer vision to understand what object a user is looking at in their field of view.
Our decoding system ultimately determines whether the user wants to drive to, e.g., a door or is merely looking at it.
arXiv Detail & Related papers (2021-03-04T14:52:06Z) - Learning Intuitive Physics with Multimodal Generative Models [24.342994226226786]
This paper presents a perception framework that fuses visual and tactile feedback to make predictions about the expected motion of objects in dynamic scenes.
We use a novel See-Through-your-Skin (STS) sensor that provides high resolution multimodal sensing of contact surfaces.
We validate through simulated and real-world experiments in which the resting state of an object is predicted from given initial conditions.
arXiv Detail & Related papers (2021-01-12T12:55:53Z) - Self-Supervised Learning of Audio-Visual Objects from Video [108.77341357556668]
We introduce a model that uses attention to localize and group sound sources, and optical flow to aggregate information over time.
We demonstrate the effectiveness of the audio-visual object embeddings that our model learns by using them for four downstream speech-oriented tasks.
arXiv Detail & Related papers (2020-08-10T16:18:01Z) - COBE: Contextualized Object Embeddings from Narrated Instructional Video [52.73710465010274]
We propose a new framework for learning Contextualized OBject Embeddings from automatically-transcribed narrations of instructional videos.
We leverage the semantic and compositional structure of language by training a visual detector to predict a contextualized word embedding of the object and its associated narration.
Our experiments show that our detector learns to predict a rich variety of contextual object information, and that it is highly effective in the settings of few-shot and zero-shot learning.
arXiv Detail & Related papers (2020-07-14T19:04:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.