Semantic and structural image segmentation for prosthetic vision
- URL: http://arxiv.org/abs/1809.09607v3
- Date: Tue, 28 Jan 2025 22:51:28 GMT
- Title: Semantic and structural image segmentation for prosthetic vision
- Authors: Melani Sanchez-Garcia, Ruben Martinez-Cantin, Jose J. Guerrero
- Abstract summary: The ability to recognize objects and understand scenes in real environments is severely restricted for prosthetic users.
We present a new approach to build a schematic representation of indoor environments for phosphene images.
The proposed method combines a variety of convolutional neural networks for extracting and conveying relevant information.
- Score: 2.048226951354646
- License:
- Abstract: Prosthetic vision is being applied to partially restore retinal stimulation in visually impaired people. However, the phosphene images produced by current implants have very limited information bandwidth due to their poor resolution and lack of color or contrast. As a result, the ability to recognize objects and understand scenes in real environments is severely restricted for prosthetic users. Computer vision can play a key role in overcoming these limitations by optimizing the visual information presented in simulated prosthetic vision. We present a new approach to building a schematic representation of indoor environments for phosphene images. The proposed method combines a variety of convolutional neural networks to extract and convey relevant information about the scene, such as the structural informative edges of the environment and the silhouettes of segmented objects. Experiments were conducted with normally sighted subjects using a Simulated Prosthetic Vision system. The results show good accuracy on object recognition and room identification tasks in indoor scenes with the proposed approach, compared to other image processing methods.
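As a rough, hedged illustration of the kind of pipeline the abstract describes (not the authors' code), the Python sketch below renders a binary map of structural edges and object silhouettes as a simulated phosphene image: the map is pooled onto a coarse phosphene grid and each sufficiently active cell is drawn as a Gaussian dot. The grid size, activation threshold, and all names are illustrative assumptions.

```python
import numpy as np

def render_phosphenes(mask, grid=(32, 32), dot_sigma=2.0, cell_px=10):
    """Render a binary edge/silhouette mask as a simulated phosphene image.

    mask      -- 2D binary array (structural edges + object silhouettes)
    grid      -- phosphene grid resolution; an assumption, not the paper's value
    dot_sigma -- Gaussian spread of each phosphene dot, in output pixels
    cell_px   -- output pixels per phosphene cell
    """
    h, w = mask.shape
    gh, gw = grid
    ys = np.linspace(0, h, gh + 1).astype(int)
    xs = np.linspace(0, w, gw + 1).astype(int)
    # A phosphene's brightness is the fraction of active mask pixels in its cell.
    level = np.zeros(grid)
    for i in range(gh):
        for j in range(gw):
            level[i, j] = mask[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
    # Draw each sufficiently active cell as a Gaussian dot on the output canvas.
    yy, xx = np.mgrid[0:cell_px, 0:cell_px] - (cell_px - 1) / 2.0
    dot = np.exp(-(xx ** 2 + yy ** 2) / (2 * dot_sigma ** 2))
    out = np.zeros((gh * cell_px, gw * cell_px))
    for i in range(gh):
        for j in range(gw):
            if level[i, j] > 0.1:  # activation threshold (assumption)
                out[i * cell_px:(i + 1) * cell_px,
                    j * cell_px:(j + 1) * cell_px] = level[i, j] * dot
    return out

# Example: a toy mask containing a single vertical structural edge.
toy = np.zeros((320, 320))
toy[:, 150:155] = 1
img = render_phosphenes(toy)  # 320x320 image of Gaussian phosphene dots
```

In the paper's pipeline, the mask itself would come from the combined CNN outputs (edge extraction plus object segmentation); here it is a stand-in.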
Related papers
- Influence of field of view in visual prostheses design: Analysis with a VR system [3.9998518782208783]
We evaluate the influence of field of view with respect to spatial resolution in visual prostheses.
Twenty-four normally sighted participants were asked to find and recognize usual objects.
Results show that the accuracy and response time decrease when the field of view is increased; a worked example of this trade-off follows the entry.
arXiv Detail & Related papers (2025-01-28T22:25:22Z)
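To make the field-of-view/resolution trade-off studied above concrete, this minimal sketch computes the angular spacing between neighboring phosphenes for a fixed-size grid; widening the field of view at a constant phosphene count spreads the dots apart, lowering effective spatial resolution. The grid side and candidate FOV values are illustrative assumptions, not the study's conditions.

```python
def phosphene_spacing_deg(fov_deg, grid_side):
    # Assumes phosphenes evenly spaced on a square grid spanning the full FOV.
    return fov_deg / (grid_side - 1)

for fov in (10, 20, 40):  # candidate fields of view in degrees (illustrative)
    s = phosphene_spacing_deg(fov, grid_side=32)  # 32 x 32 = 1024 phosphenes
    print(f"FOV {fov:>2} deg -> {s:.2f} deg between phosphenes")
# Doubling the FOV at a fixed phosphene count doubles the spacing,
# consistent with the reported accuracy drop as the field of view grows.
```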
- Knowledge-Guided Prompt Learning for Deepfake Facial Image Detection [54.26588902144298]
We propose a knowledge-guided prompt learning method for deepfake facial image detection.
Specifically, we retrieve forgery-related prompts from large language models as expert knowledge to guide the optimization of learnable prompts.
Our proposed approach notably outperforms state-of-the-art methods; a sketch of the prompt-guidance idea follows this entry.
arXiv Detail & Related papers (2025-01-01T02:18:18Z)
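The guidance idea in the deepfake-detection entry above can be sketched as a two-term objective: learnable prompt embeddings are trained on the downstream classification loss while being pulled toward fixed "expert" embeddings (standing in for forgery-related prompts retrieved from a large language model). This is a hedged toy version under assumed shapes and names, not the authors' method.

```python
import torch
import torch.nn.functional as F

dim, n_ctx = 512, 8
learnable_prompt = torch.randn(n_ctx, dim, requires_grad=True)
expert_prompt = torch.randn(n_ctx, dim)  # stand-in for LLM-derived prompt embeddings
opt = torch.optim.Adam([learnable_prompt], lr=1e-3)

def step(image_feat, label, lam=0.1):
    # image_feat: (dim,) feature from a frozen image encoder; label: 0 real, 1 fake.
    logit = image_feat @ learnable_prompt.mean(0)             # toy similarity score
    cls_loss = F.binary_cross_entropy_with_logits(logit, label)
    guide_loss = F.mse_loss(learnable_prompt, expert_prompt)  # knowledge guidance term
    loss = cls_loss + lam * guide_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

step(torch.randn(dim), torch.tensor(1.0))  # one toy optimization step
```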
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples; a sketch of the overlap measurement follows this entry.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
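For the eye-tracking entry above, one plausible reading of "dimensionality-reduction techniques to measure overlap" is the hedged sketch below: project real and synthetic feature vectors into a shared PCA space and score the per-axis histogram intersection of the two distributions. The feature dimensions, bin counts, and the intersection score itself are assumptions for illustration.

```python
import numpy as np

def pca_project(X, k=2):
    # Plain SVD-based PCA: project rows of X onto the top-k principal components.
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def overlap_score(real, synth, k=2, bins=20):
    # Crude overlap measure: per-axis histogram intersection in a shared PCA space.
    Z = pca_project(np.vstack([real, synth]), k)
    zr, zs = Z[:len(real)], Z[len(real):]
    score = 1.0
    for d in range(k):
        lo, hi = Z[:, d].min(), Z[:, d].max()
        hr, _ = np.histogram(zr[:, d], bins=bins, range=(lo, hi), density=True)
        hs, _ = np.histogram(zs[:, d], bins=bins, range=(lo, hi), density=True)
        score *= np.minimum(hr, hs).sum() * (hi - lo) / bins  # in [0, 1] per axis
    return score

real = np.random.randn(200, 64)          # stand-in for real eye-image features
synth = np.random.randn(200, 64) + 0.5   # shifted synthetic distribution
print(overlap_score(real, synth))        # closer to 1 means better sim/real overlap
```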
- Learning How To Robustly Estimate Camera Pose in Endoscopic Videos [5.073761189475753]
We propose a solution for stereo endoscopes that estimates depth and optical flow to minimize two geometric losses for camera pose estimation.
Most importantly, we introduce two learned adaptive per-pixel weight mappings that balance contributions according to the input image content.
We validate our approach on the publicly available SCARED dataset and introduce a new in-vivo dataset, StereoMIS; a sketch of the per-pixel weighting idea follows this entry.
arXiv Detail & Related papers (2023-04-17T07:05:01Z)
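The per-pixel weighting in the endoscopy entry above can be sketched as a loss in which two geometric residuals are modulated by weight maps a network would predict from image content, so unreliable regions (specular highlights, tissue deformation) contribute less. The residual names, the regularizer, and its coefficient are assumptions, not the authors' formulation.

```python
import numpy as np

def weighted_pose_loss(depth_res, flow_res, w_depth, w_flow, eps=1e-6):
    # depth_res, flow_res: per-pixel geometric residuals.
    # w_depth, w_flow: per-pixel weights in [0, 1], assumed to be network outputs.
    l_depth = (w_depth * depth_res ** 2).mean()
    l_flow = (w_flow * flow_res ** 2).mean()
    # Penalize all-zero weights to avoid the trivial solution (common trick;
    # an assumption here, not necessarily the paper's regularizer).
    reg = -(np.log(w_depth + eps).mean() + np.log(w_flow + eps).mean())
    return l_depth + l_flow + 0.01 * reg

h, w = 48, 64  # toy resolution
loss = weighted_pose_loss(np.random.rand(h, w), np.random.rand(h, w),
                          np.random.rand(h, w), np.random.rand(h, w))
```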
- Neural Radiance Transfer Fields for Relightable Novel-view Synthesis with Global Illumination [63.992213016011235]
We propose a method for scene relighting under novel views by learning a neural precomputed radiance transfer function.
Our method can be solely supervised on a set of real images of the scene under a single unknown lighting condition.
Results show that the recovered disentanglement of scene parameters improves significantly over the current state of the art.
arXiv Detail & Related papers (2022-07-27T16:07:48Z)
- Peripheral Vision Transformer [52.55309200601883]
We take a biologically inspired approach and explore modeling peripheral vision in deep neural networks for visual recognition.
We propose incorporating peripheral position encoding into the multi-head self-attention layers to let the network learn to partition the visual field into diverse peripheral regions given training data.
We evaluate the proposed network, dubbed PerViT, on the large-scale ImageNet dataset and systematically investigate the inner workings of the model for machine perception; a sketch of the position-bias idea follows this entry.
arXiv Detail & Related papers (2022-06-14T12:47:47Z)
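One way to read the peripheral position encoding above, sketched loosely below, is an additive attention bias that depends on the spatial distance between query and key tokens, so each head can learn to attend to a different "ring" of the visual field. The shapes, the distance-to-bias MLP, and all names are assumptions inspired by the entry, not its implementation.

```python
import torch

def peripheral_attention(q, k, v, pos, bias_mlp):
    # q, k, v: (heads, n, d); pos: (n, 2) token coordinates on the image grid;
    # bias_mlp maps a pairwise distance to one learned bias value per head.
    d = q.shape[-1]
    dist = torch.cdist(pos, pos).unsqueeze(-1)      # (n, n, 1) pairwise distances
    bias = bias_mlp(dist).permute(2, 0, 1)          # (heads, n, n) position bias
    attn = (q @ k.transpose(-2, -1)) / d ** 0.5 + bias
    return attn.softmax(-1) @ v

heads, n, d = 4, 16, 8
bias_mlp = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.ReLU(),
                               torch.nn.Linear(16, heads))
q = k = v = torch.randn(heads, n, d)
pos = torch.stack(torch.meshgrid(torch.arange(4.), torch.arange(4.),
                                 indexing="ij"), -1).reshape(n, 2)
out = peripheral_attention(q, k, v, pos, bias_mlp)  # (heads, n, d)
```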
- Assessing visual acuity in visual prostheses through a virtual-reality system [7.529227133770206]
Current visual implants still provide very low resolution and limited field of view, thus limiting visual acuity in implanted patients.
We take advantage of virtual-reality software paired with a portable head-mounted display to evaluate the performance of normally sighted participants under simulated prosthetic vision.
Our results showed that, of all conditions tested, a field of view of 20 degrees with a resolution of 1000 phosphenes proved best, yielding a visual acuity of 1.3 logMAR; a worked logMAR conversion follows this entry.
arXiv Detail & Related papers (2022-05-20T18:24:15Z)
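The acuity figure above unpacks with a one-line conversion: logMAR is the base-10 logarithm of the minimum angle of resolution (MAR) in arcminutes, so 1.3 logMAR corresponds to a MAR of about 20 arcminutes, roughly 20/400 on the Snellen scale. A minimal check:

```python
def logmar_to_mar_arcmin(logmar):
    # logMAR = log10(MAR in arcminutes), so invert with a power of ten.
    return 10 ** logmar

mar = logmar_to_mar_arcmin(1.3)
print(f"MAR = {mar:.1f} arcmin, ~Snellen 20/{round(20 * mar)}")
# -> MAR = 20.0 arcmin, ~Snellen 20/399 (about 20/400)
```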
- Compositional Scene Representation Learning via Reconstruction: A Survey [48.33349317481124]
Compositional scene representation learning is a task that enables such compositional perception abilities.
Deep neural networks have been proven to be advantageous in representation learning.
Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation.
arXiv Detail & Related papers (2022-02-15T02:14:05Z)
- Deep Learning-Based Scene Simplification for Bionic Vision [0.0]
We show that object segmentation may better support scene understanding than visual saliency or monocular depth estimation.
This work has the potential to drastically improve the utility of prosthetic vision for people blinded from retinal degenerative diseases.
arXiv Detail & Related papers (2021-01-30T19:35:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.