Real-Time Pill Identification for the Visually Impaired Using Deep Learning
- URL: http://arxiv.org/abs/2405.05983v1
- Date: Wed, 8 May 2024 03:18:46 GMT
- Title: Real-Time Pill Identification for the Visually Impaired Using Deep Learning
- Authors: Bo Dang, Wenchao Zhao, Yufeng Li, Danqing Ma, Qixuan Yu, Elly Yijun Zhu
- Abstract summary: This paper explores the development and implementation of a deep learning-based mobile application designed to assist blind and visually impaired individuals in real-time pill identification.
The application aims to accurately recognize and differentiate between various pill types through real-time image processing on mobile devices.
- Score: 31.747327310138314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prevalence of mobile technology offers unique opportunities for addressing healthcare challenges, especially for individuals with visual impairments. This paper explores the development and implementation of a deep learning-based mobile application designed to assist blind and visually impaired individuals in real-time pill identification. Utilizing the YOLO framework, the application aims to accurately recognize and differentiate between various pill types through real-time image processing on mobile devices. The system incorporates Text-to-Speech (TTS) to provide immediate auditory feedback, enhancing usability and independence for visually impaired users. Our study evaluates the application's effectiveness in terms of detection accuracy and user experience, highlighting its potential to improve medication management and safety among the visually impaired community.
Keywords: Deep Learning; YOLO Framework; Mobile Application; Visual Impairment; Pill Identification; Healthcare
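The pipeline described in the abstract, real-time detection with a YOLO model followed by spoken feedback, can be sketched roughly as follows. The paper does not name its implementation libraries or model weights, so the `ultralytics` YOLO package, the `pyttsx3` TTS engine, the OpenCV camera loop, and the `pill_detector.pt` weights file below are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of the described pipeline: YOLO-based pill detection plus
# spoken feedback. Library choices and the weights file are assumptions.
import cv2                    # camera capture (stand-in for the phone camera)
import pyttsx3                # offline text-to-speech engine (assumed TTS backend)
from ultralytics import YOLO  # assumed YOLO implementation

model = YOLO("pill_detector.pt")  # hypothetical weights fine-tuned on pill images
tts = pyttsx3.init()

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)[0]                   # detect pills in the frame
    names = {model.names[int(c)] for c in results.boxes.cls}   # detected pill classes
    if names:
        tts.say("Detected: " + ", ".join(sorted(names)))       # immediate auditory feedback
        tts.runAndWait()
cap.release()
```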
Related papers
- Small Object Detection for Indoor Assistance to the Blind using YOLO NAS Small and Super Gradients [0.0]
This paper presents a novel approach for indoor assistance to the blind by addressing the challenge of small object detection.
We propose a technique based on the YOLO NAS Small architecture, a lightweight and efficient object detection model, optimized using the Super Gradients training framework (see the sketch after this list).
arXiv Detail & Related papers (2024-08-28T05:38:20Z)
- Understanding How Blind Users Handle Object Recognition Errors: Strategies and Challenges [10.565823004989817]
This paper presents a study of how blind users interact with object recognition systems to identify and avoid errors.
We conducted a user study involving 12 blind and low-vision participants.
We gained insights into users' experiences, challenges, and strategies for identifying errors in camera-based assistive technologies and object recognition systems.
arXiv Detail & Related papers (2024-08-06T17:09:56Z)
- Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development [12.258245804049114]
The mining of adverse drug events (ADEs) is pivotal in pharmacovigilance, enhancing patient safety.
Traditional ADE detection methods are reliable but slow, not easily adaptable to large-scale operations.
Previous ADE mining studies have focused on text-based methodologies, overlooking visual cues.
We present a MultiModal Adverse Drug Event (MMADE) detection dataset, merging ADE-related textual information with visual aids.
arXiv Detail & Related papers (2024-05-24T17:58:42Z)
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z)
- Improve accessibility for Low Vision and Blind people using Machine Learning and Computer Vision [0.0]
This project explores how machine learning and computer vision could be utilized to improve accessibility for people with visual impairments.
This project will concentrate on building a mobile application that helps blind people to orient in space by receiving audio and haptic feedback.
arXiv Detail & Related papers (2024-03-24T21:19:17Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- Empowering Visually Impaired Individuals: A Novel Use of Apple Live Photos and Android Motion Photos [3.66237529322911]
We advocate for the use of Apple Live Photos and Android Motion Photos technologies.
Our findings reveal that both Live Photos and Motion Photos outperform single-frame images in common visual assisting tasks.
arXiv Detail & Related papers (2023-09-14T20:46:35Z)
- Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge [68.90835997085557]
We propose a systematic and effective approach to enhance medical vision-and-language pre-training with structured medical knowledge from three perspectives.
First, we align the representations of the vision encoder and the language encoder through knowledge.
Second, we inject knowledge into the multi-modal fusion model so that it can reason with knowledge as a supplement to the input image and text.
Third, we guide the model to put emphasis on the most critical information in images and texts by designing knowledge-induced pretext tasks.
arXiv Detail & Related papers (2022-09-15T08:00:01Z)
- Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification [101.49122450005869]
We present a counterfactual attention learning method to learn more effective attention based on causal inference.
Specifically, we analyze the effect of the learned visual attention on network prediction.
We evaluate our method on a wide range of fine-grained recognition tasks.
arXiv Detail & Related papers (2021-08-19T14:53:40Z)
- Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data [74.60507696087966]
Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care.
One promising data source to help monitor human behavior is daily smartphone usage.
We study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors.
arXiv Detail & Related papers (2021-06-24T17:46:03Z)
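As referenced in the entry on small object detection for indoor assistance above, loading the YOLO NAS Small model can be sketched as follows. The open-source `super-gradients` package is the usual home of YOLO-NAS, but this snippet is an illustrative assumption rather than code from that paper; the image path and confidence threshold are hypothetical.

```python
# Minimal sketch (not from the paper): load pretrained YOLO-NAS Small with the
# `super-gradients` package and run single-image inference.
from super_gradients.training import models

model = models.get("yolo_nas_s", pretrained_weights="coco")  # YOLO NAS Small, COCO weights
prediction = model.predict("indoor_scene.jpg", conf=0.35)    # hypothetical test image
prediction.show()                                            # visualize detected objects
```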