Towards Context-aware Support for Color Vision Deficiency: An Approach Integrating LLM and AR
- URL: http://arxiv.org/abs/2407.04362v1
- Date: Fri, 5 Jul 2024 09:03:52 GMT
- Title: Towards Context-aware Support for Color Vision Deficiency: An Approach Integrating LLM and AR
- Authors: Shogo Morita, Yan Zhang, Takuto Yamauchi, Sinan Chen, Jialong Li, Kenji Tei
- Abstract summary: People with color vision deficiency often face challenges in distinguishing colors such as red and green.
Current support tools mainly focus on presentation-based aids, like the color vision modes found in iPhone accessibility settings.
This paper proposes an application that provides contextual and autonomous assistance.
- Score: 2.4560886170097573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: People with color vision deficiency often face challenges in distinguishing colors such as red and green, which can complicate daily tasks and require the use of assistive tools or environmental adjustments. Current support tools mainly focus on presentation-based aids, like the color vision modes found in iPhone accessibility settings. However, offering context-aware support, like indicating the doneness of meat, remains a challenge since task-specific solutions are not cost-effective for all possible scenarios. To address this, our paper proposes an application that provides contextual and autonomous assistance. This application is mainly composed of: (i) an augmented reality interface that efficiently captures context; and (ii) a multi-modal large language model-based reasoner that interprets the context and then reasons about the appropriate support content. Preliminary user experiments with two color vision deficient users across five different scenarios have demonstrated the effectiveness and universality of our application.
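Below is a minimal sketch of how the two components described in the abstract might be wired together: an AR front end captures a camera frame, and a multi-modal LLM is prompted to reason about the colors in context (e.g., the doneness of meat). The OpenAI Python SDK, the gpt-4o model, and the describe_colors helper are illustrative assumptions; the paper does not specify a particular model, API, or prompt.

```python
import base64
from openai import OpenAI  # assumed backend; the paper does not name a specific multimodal LLM or SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def describe_colors(image_path: str, user_query: str) -> str:
    """Send an AR-captured frame plus the user's question to a multimodal LLM
    and return context-aware color guidance (hypothetical helper, for illustration)."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice, not specified by the paper
        messages=[
            {
                "role": "system",
                "content": (
                    "You assist a user with red-green color vision deficiency. "
                    "Describe the task-relevant colors in the image and what they imply."
                ),
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": user_query},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            },
        ],
    )
    return response.choices[0].message.content


# Example: the AR interface saves the current camera frame and forwards the user's question.
# print(describe_colors("frame.jpg", "Is this steak cooked through?"))
```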
Related papers
- StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond [68.0107158115377]
We have crafted an efficient vision-language model, StrucTexTv3, tailored to tackle various intelligent tasks for text-rich images.
We enhance the perception and comprehension abilities of StrucTexTv3 through instruction learning.
Our method achieved SOTA results in text-rich image perception tasks, and significantly improved performance in comprehension tasks.
arXiv Detail & Related papers (2024-05-31T16:55:04Z)
- SoMeLVLM: A Large Vision Language Model for Social Media Processing [78.47310657638567]
We introduce a Large Vision Language Model for Social Media Processing (SoMeLVLM).
SoMeLVLM is a cognitive framework equipped with five key capabilities including knowledge & comprehension, application, analysis, evaluation, and creation.
Our experiments demonstrate that SoMeLVLM achieves state-of-the-art performance in multiple social media tasks.
arXiv Detail & Related papers (2024-02-20T14:02:45Z)
- Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts [83.03471704115786]
We introduce improved Prompt Diffusion (iPromptDiff) in this study.
iPromptDiff integrates an end-to-end trained vision encoder that converts visual context into an embedding vector.
We show that a diffusion-based vision foundation model, when equipped with this visual context-modulated text guidance and a standard ControlNet structure, exhibits versatility and robustness across a variety of training tasks.
arXiv Detail & Related papers (2023-12-03T14:15:52Z)
- Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding [55.65727739645824]
Chat-UniVi is a Unified Vision-language model capable of comprehending and engaging in conversations involving images and videos.
We employ a set of dynamic visual tokens to uniformly represent images and videos.
We leverage a multi-scale representation, enabling the model to perceive both high-level semantic concepts and low-level visual details.
arXiv Detail & Related papers (2023-11-14T10:11:36Z)
- Unifying Image Processing as Visual Prompting Question Answering [62.84955983910612]
Image processing is a fundamental task in computer vision, which aims at enhancing image quality and extracting essential features for subsequent vision applications.
Traditionally, task-specific models are developed for individual tasks and designing such models requires distinct expertise.
We propose a universal model for general image processing that covers image restoration, image enhancement, and image feature extraction tasks.
arXiv Detail & Related papers (2023-10-16T15:32:57Z)
- Supplementing Missing Visions via Dialog for Scene Graph Generations [14.714122626081064]
We investigate a computer vision task setting with incomplete visual input data.
We propose to supplement the missing visions via the natural language dialog interactions to better accomplish the task objective.
We demonstrate the feasibility of such a task setting with missing visual input and the effectiveness of our proposed dialog module as the supplementary information source.
arXiv Detail & Related papers (2022-04-23T21:46:17Z)
- Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments [54.405920619915655]
We introduce Mobile app Tasks with Iterative Feedback (MoTIF), a dataset with natural language commands for the greatest number of interactive environments to date.
MoTIF is the first to contain natural language requests for interactive environments that are not satisfiable.
We perform initial feasibility classification experiments and only reach an F1 score of 37.3, verifying the need for richer vision-language representations.
arXiv Detail & Related papers (2021-04-17T14:48:02Z)
- Personalizing image enhancement for critical visual tasks: improved legibility of papyri using color processing and visual illusions [0.0]
Methods: Novel enhancement algorithms based on color processing and visual illusions are compared to classic methods in a user experience experiment.
Users exhibited a broad behavioral spectrum, under the influence of factors such as personality and social conditioning, tasks and application domains, expertise level and image quality, and affordances of software, hardware, and interfaces.
arXiv Detail & Related papers (2021-03-11T23:48:17Z)
- VisualHints: A Visual-Lingual Environment for Multimodal Reinforcement Learning [14.553086325168803]
We present VisualHints, a novel environment for multimodal reinforcement learning (RL) involving text-based interactions along with visual hints (obtained from the environment).
We introduce an extension of the TextWorld cooking environment with the addition of visual clues interspersed throughout the environment.
The goal is to force an RL agent to use both text and visual features to predict natural language action commands for solving the final task of cooking a meal.
arXiv Detail & Related papers (2020-10-26T18:51:02Z)
- Real-time single image depth perception in the wild with handheld devices [45.26484111468387]
Two main issues limit depth estimation from handheld devices in the wild.
We show that both can be addressed by adopting appropriate network design and training strategies.
We report experimental results concerning real-time depth-aware augmented reality and image blurring with smartphones in the wild.
arXiv Detail & Related papers (2020-06-10T08:30:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.