Artificial Eye for the Blind
- URL: http://arxiv.org/abs/2308.00801v1
- Date: Fri, 7 Jul 2023 10:00:50 GMT
- Title: Artificial Eye for the Blind
- Authors: Abhinav Benagi, Dhanyatha Narayan, Charith Rage, A Sushmitha
- Abstract summary: The main backbone of our Artificial Eye model is the Raspberry Pi 3, which is connected to the webcam.
We also run all our software models, i.e. object detection, Optical Character Recognition, Google Text-to-Speech conversion, and the Mycroft voice assistant model.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The main backbone of our Artificial Eye model is the Raspberry Pi 3, which is
connected to the webcam, ultrasonic proximity sensor, and speaker, and on which we run
all our software models, i.e. object detection, Optical Character Recognition,
Google Text-to-Speech conversion, and the Mycroft voice assistant model. At
first, the ultrasonic proximity sensor measures the distance between
itself and any obstacle in front of it. When the proximity sensor detects an
obstacle within its specified range, the blind person hears an
audio prompt about an obstacle in his way at a certain distance. At this time
the webcam captures an image in front of it, and the object detection model
and the Optical Character Recognition model begin to run on the Raspberry
Pi. The image captured is first sent through the Tesseract OCR module to
detect any text in the image and then through the object detection model to
detect the objects in front of the blind person. The detected text and objects
are conveyed to the blind person by converting the text to speech using the
gTTS module. Alongside this process there is an
active Mycroft voice assistant model which can be used to interact with the
blind person. The blind person can ask about the weather, daily news, any
information on the internet, etc.
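As a concrete illustration of this sensing-to-speech loop, here is a minimal Python sketch. It assumes a Raspberry Pi with an HC-SR04-style ultrasonic sensor, a USB webcam, and the RPi.GPIO, opencv-python, pytesseract, and gTTS packages; the GPIO pin numbers, the 100 cm alert range, and playback via mpg123 are illustrative assumptions, and the object-detection step is stubbed because the abstract does not name a specific detector.

```python
# Hedged sketch of the abstract's pipeline; pins, ranges, and helper choices
# are assumptions, not the authors' code.
import subprocess
import time

import cv2                    # webcam capture
import pytesseract            # Tesseract OCR bindings
import RPi.GPIO as GPIO       # ultrasonic sensor I/O (Pi only)
from gtts import gTTS         # Google Text-to-Speech

TRIG, ECHO = 23, 24           # assumed GPIO pins for an HC-SR04 sensor
ALERT_CM = 100                # assumed obstacle alert range

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def measure_distance_cm() -> float:
    """Fire a 10-microsecond trigger pulse and time the returning echo."""
    GPIO.output(TRIG, True)
    time.sleep(1e-5)
    GPIO.output(TRIG, False)
    start = end = time.time()
    while GPIO.input(ECHO) == 0:
        start = time.time()
    while GPIO.input(ECHO) == 1:
        end = time.time()
    return (end - start) * 34300 / 2      # speed of sound, there and back

def speak(text: str) -> None:
    """Synthesise speech with gTTS and play it; mpg123 is one common player."""
    gTTS(text=text, lang="en").save("/tmp/prompt.mp3")
    subprocess.run(["mpg123", "-q", "/tmp/prompt.mp3"])

def describe_scene() -> None:
    """Capture a frame, run OCR on it, then hand it to a detector (stubbed)."""
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return
    text = pytesseract.image_to_string(frame).strip()
    if text:
        speak(f"Text ahead reads: {text}")
    # A real deployment would run an object detector here and speak each
    # detected label; the abstract does not name the detector, so it is
    # left out of this sketch.

while True:
    distance = measure_distance_cm()
    if distance < ALERT_CM:
        speak(f"Obstacle ahead at about {int(distance)} centimetres")
        describe_scene()
    time.sleep(0.5)
```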
Related papers
- A Surveillance Based Interactive Robot [0.0]
We build a mobile surveillance robot that streams video in real time and responds to speech so a user can monitor and steer it from a phone or browser.
The system uses two Raspberry Pi 4 units: a front unit on a differential drive base with camera, mic, and speaker, and a central unit that serves the live feed and runs perception.
For voice interaction, we use Python libraries for speech recognition, multilingual translation, and text-to-speech.
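A minimal sketch of the kind of voice loop that entry describes, assuming the speech_recognition and gTTS packages; the reply wording is an illustrative assumption.

```python
# Minimal voice-interaction sketch; the spoken reply is invented for
# illustration and the translation step is omitted.
import speech_recognition as sr
from gtts import gTTS

recognizer = sr.Recognizer()
with sr.Microphone() as mic:
    recognizer.adjust_for_ambient_noise(mic)
    audio = recognizer.listen(mic)

try:
    command = recognizer.recognize_google(audio)  # Google Web Speech API
    gTTS(text=f"You said: {command}", lang="en").save("reply.mp3")
except sr.UnknownValueError:
    pass  # speech was unintelligible; a real robot would re-prompt here
```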
arXiv Detail & Related papers (2025-08-18T19:09:43Z)
- ChatAnything: Facetime Chat with LLM-Enhanced Personas [87.76804680223003]
We propose the mixture of voices (MoV) and the mixture of diffusers (MoD) for diverse voice and appearance generation.
For MoV, we utilize text-to-speech (TTS) algorithms with a variety of pre-defined tones.
For MoD, we combine recent popular text-to-image generation techniques and talking-head algorithms to streamline the process of generating talking objects.
arXiv Detail & Related papers (2023-11-12T08:29:41Z)
- Follow Anything: Open-set detection, tracking, and following in real-time [89.83421771766682]
We present a robotic system to detect, track, and follow any object in real-time.
Our approach, dubbed "follow anything" (FAn), is an open-vocabulary and multimodal model.
FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second.
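The open-vocabulary matching at the core of such a system can be approximated with CLIP-style text-image scoring. The sketch below uses OpenAI's clip package and assumes candidate region crops have already been extracted; the crop files and the query are placeholders, and FAn's actual pipeline differs.

```python
# CLIP-style scoring of candidate regions against a free-text query; not
# FAn's implementation, just the underlying open-vocabulary idea.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

query = clip.tokenize(["a red backpack"]).to(device)
crops = [preprocess(Image.open(f"crop{i}.png")).unsqueeze(0).to(device)
         for i in range(3)]                       # assumed region proposals

with torch.no_grad():
    text_feat = model.encode_text(query)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    scores = []
    for c in crops:
        img_feat = model.encode_image(c)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        scores.append((img_feat @ text_feat.T).item())  # cosine similarity

target = scores.index(max(scores))                # region to track and follow
```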
arXiv Detail & Related papers (2023-08-10T17:57:06Z)
- Object Recognition System on a Tactile Device for Visually Impaired [1.2891210250935146]
The device will convert visual information into auditory feedback, enabling users to understand their environment in a way that suits their sensory needs.
When the device is touched at a specific position, it provides an audio signal that communicates the identification of the object present in the scene at that corresponding position to the visually impaired individual.
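A minimal sketch of that position-to-audio mapping follows, assuming detections arrive as labelled bounding boxes and that gTTS handles the audio; both are assumptions about the interface, not the device's actual code.

```python
# Hypothetical mapping from a touch coordinate to the spoken label of the
# object detected at that position; the detection format is assumed.
from gtts import gTTS

# (label, (x_min, y_min, x_max, y_max)) pairs from an upstream detector
detections = [("chair", (40, 200, 180, 380)), ("door", (300, 0, 460, 420))]

def label_at(x: int, y: int) -> str | None:
    """Return the label of the first detection whose box contains (x, y)."""
    for label, (x0, y0, x1, y1) in detections:
        if x0 <= x <= x1 and y0 <= y <= y1:
            return label
    return None

touched = label_at(120, 300)        # e.g. a coordinate from the tactile grid
if touched is not None:
    gTTS(text=f"{touched} here", lang="en").save("touch.mp3")
```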
arXiv Detail & Related papers (2023-07-05T11:37:17Z)
- Contextual Object Detection with Multimodal Large Language Models [66.15566719178327]
We introduce a novel research problem of contextual object detection.
Three representative scenarios are investigated, including the language cloze test, visual captioning, and question answering.
We present ContextDET, a unified multimodal model that is capable of end-to-end differentiable modeling of visual-language contexts.
arXiv Detail & Related papers (2023-05-29T17:50:33Z)
- Open-Vocabulary Point-Cloud Object Detection without 3D Annotation [62.18197846270103]
The goal of open-vocabulary 3D point-cloud detection is to identify novel objects based on arbitrary textual descriptions.
We develop a point-cloud detector that can learn a general representation for localizing various objects.
We also propose a novel de-biased triplet cross-modal contrastive learning to connect the modalities of image, point-cloud and text.
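The triplet cross-modal objective can be pictured with a plain triplet margin loss over matched image, point-cloud, and text embeddings; this PyTorch sketch is a generic stand-in and does not reproduce the paper's de-biasing of negatives.

```python
# Generic triplet-style cross-modal contrastive loss; the paper's de-biased
# formulation is NOT reproduced here.
import torch
import torch.nn.functional as F

def cross_modal_triplet_loss(img_emb: torch.Tensor,
                             pc_emb: torch.Tensor,
                             txt_emb: torch.Tensor,
                             margin: float = 0.2) -> torch.Tensor:
    """Pull matched (image, point-cloud, text) rows together and push
    shuffled in-batch negatives apart. A real implementation would mask
    out shuffles that map a row onto itself."""
    perm = torch.randperm(txt_emb.size(0))
    neg_txt, neg_pc = txt_emb[perm], pc_emb[perm]
    loss = (F.triplet_margin_loss(img_emb, txt_emb, neg_txt, margin=margin)
            + F.triplet_margin_loss(pc_emb, txt_emb, neg_txt, margin=margin)
            + F.triplet_margin_loss(img_emb, pc_emb, neg_pc, margin=margin))
    return loss / 3
```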
arXiv Detail & Related papers (2023-04-03T08:22:02Z)
- Detecting Human-Object Contact in Images [75.35017308643471]
Humans constantly contact objects to move and perform tasks.
There exists no robust method to detect contact between the body and the scene from an image.
We build a new dataset of human-object contacts for images.
arXiv Detail & Related papers (2023-03-06T18:56:26Z)
- Detect Only What You Specify: Object Detection with Linguistic Target [0.0]
We propose the Language-Targeted Detector (LTD) for targeted detection, built on a recently proposed Transformer-based detector.
LTD is an encoder-decoder architecture, and our conditional decoder allows the model to reason about the encoded image with the textual input as the linguistic context.
arXiv Detail & Related papers (2022-11-18T07:28:47Z)
- SANIP: Shopping Assistant and Navigation for the visually impaired [0.0]
The proposed model consists of three Python models, i.e. Custom Object Detection, Text Detection, and Barcode Detection.
For detection of the handheld object, we created our own custom dataset comprising daily goods such as Parle-G, Tide, and Lays.
For the other two models, the retrieved text and barcode information is converted to speech and relayed to the blind person.
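That text-and-barcode-to-speech path can be sketched with off-the-shelf packages, assuming pyzbar for barcodes, pytesseract for text, and gTTS for speech; the phrasing and camera index are assumptions, not the SANIP implementation.

```python
# Hedged sketch of the text/barcode-to-speech path; package choices and the
# spoken phrasing are assumptions.
import cv2
import pytesseract
from gtts import gTTS
from pyzbar.pyzbar import decode

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()

if ok:
    phrases = []
    for barcode in decode(frame):                         # barcode detection
        phrases.append(f"Barcode reads {barcode.data.decode('utf-8')}")
    text = pytesseract.image_to_string(frame).strip()     # text detection
    if text:
        phrases.append(f"Label reads {text}")
    if phrases:
        gTTS(text=". ".join(phrases), lang="en").save("sanip.mp3")
```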
arXiv Detail & Related papers (2022-09-08T05:35:03Z)
- Open-Vocabulary DETR with Conditional Matching [86.1530128487077]
OV-DETR is an open-vocabulary detector based on DETR.
It can detect any object given its class name or an exemplar image.
It achieves non-trivial improvements over the current state of the art.
arXiv Detail & Related papers (2022-03-22T16:54:52Z)
- Deep Sensory Substitution: Noninvasively Enabling Biological Neural Networks to Receive Input from Artificial Neural Networks [5.478764356647437]
This work describes a novel technique for leveraging machine-learned feature embeddings to sonify visual information into a perceptual audio domain.
A generative adversarial network (GAN) is then used to find a distance-preserving map from the metric space of feature vectors into the metric space defined by a target audio dataset.
In human subject tests, users were able to accurately classify audio sonifications of faces.
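The distance-preserving part of that objective is simple to state; the PyTorch sketch below penalises the gap between pairwise distances before and after the map G, and omits the adversarial term that keeps G's outputs in the audio domain.

```python
# Distance-preservation term only; the GAN's adversarial loss is omitted,
# and G is any learnable map from feature vectors to audio embeddings.
import torch

def distance_preservation_loss(G, feats: torch.Tensor) -> torch.Tensor:
    """Compare all pairwise distances in the feature space with the
    corresponding distances after mapping into the audio space."""
    mapped = G(feats)                       # G: feature space -> audio space
    d_in = torch.cdist(feats, feats)        # pairwise distances, input side
    d_out = torch.cdist(mapped, mapped)     # pairwise distances, output side
    return (d_in - d_out).abs().mean()
```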
arXiv Detail & Related papers (2020-05-27T11:41:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.