Blind Users Accessing Their Training Images in Teachable Object
Recognizers
- URL: http://arxiv.org/abs/2208.07968v1
- Date: Tue, 16 Aug 2022 21:59:48 GMT
- Title: Blind Users Accessing Their Training Images in Teachable Object
Recognizers
- Authors: Jonggi Hong, Jaina Gandhi, Ernest Essuah Mensah, Ebrima H Jarjue,
Kyungjun Lee, Hernisa Kacorri
- Abstract summary: MyCam is a mobile app that incorporates automatically estimated descriptors for non-visual access to the photos in the users' training sets.
We demonstrate that the real-time photo-level descriptors enabled blind users to reduce photos with cropped objects, and that participants could add more variations by iterating through and accessing the quality of their training sets.
- Score: 12.833745050235047
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Iteratively training and evaluating a machine learning model is an important
process for improving its performance. However, while teachable interfaces enable
blind users to train and test an object recognizer with photos taken in their
distinctive environment, accessibility of training iteration and evaluation
steps has received little attention. Iteration assumes visual inspection of the
training photos, which is inaccessible for blind users. We explore this
challenge through MyCam, a mobile app that incorporates automatically estimated
descriptors for non-visual access to the photos in the users' training sets. We
explore how blind participants (N=12) interact with MyCam and the descriptors
through an evaluation study in their homes. We demonstrate that the real-time
photo-level descriptors enabled blind users to reduce photos with cropped
objects, and that participants could add more variations by iterating through
and accessing the quality of their training sets. Also, participants found the
app simple to use, indicating that they could effectively train it and that the
descriptors were useful. However, subjective responses were not reflected in
the performance of their models, partially due to limited variation in the
training photos and cluttered backgrounds.
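To make the idea of a real-time photo-level descriptor concrete, below is a minimal sketch of how one such cue, a possibly cropped object, could be derived from a detector's bounding box. The function name, the `edge_margin` threshold, and the size heuristic are illustrative assumptions; the paper does not publish MyCam's implementation.

```python
# Illustrative sketch only: not MyCam's code. It assumes an off-the-shelf
# object detector supplies a bounding box (x_min, y_min, x_max, y_max) in
# pixel coordinates for the target object in each training photo.
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def photo_descriptors(box: Optional[Box], width: int, height: int,
                      edge_margin: int = 4) -> List[str]:
    """Return spoken-friendly descriptors for one training photo.

    `edge_margin` (pixels) is a made-up threshold for deciding that the
    object touches the frame border and is therefore likely cropped.
    """
    if box is None:
        return ["no object detected"]

    x_min, y_min, x_max, y_max = box
    notes: List[str] = []

    # Cropped-object check: the box touches (or nearly touches) an edge.
    if (x_min <= edge_margin or y_min <= edge_margin
            or x_max >= width - edge_margin or y_max >= height - edge_margin):
        notes.append("object may be cropped at the frame edge")

    # Rough size cue: a tiny box often means the object is far from the camera.
    area_ratio = ((x_max - x_min) * (y_max - y_min)) / float(width * height)
    if area_ratio < 0.05:
        notes.append("object appears small in the frame")

    return notes or ["object fully in frame"]


# Example: a 640x480 photo where the box runs off the right edge.
print(photo_descriptors((500.0, 120.0, 640.0, 360.0), 640, 480))
# -> ['object may be cropped at the frame edge']
```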
Related papers
- Empowering Visually Impaired Individuals: A Novel Use of Apple Live
Photos and Android Motion Photos [3.66237529322911]
We advocate for the use of Apple Live Photos and Android Motion Photos technologies.
Our findings reveal that both Live Photos and Motion Photos outperform single-frame images in common visual assisting tasks.
arXiv Detail & Related papers (2023-09-14T20:46:35Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- ImaginaryNet: Learning Object Detectors without Real Images and Annotations [66.30908705345973]
We propose a framework to synthesize images by combining a pretrained language model with a text-to-image model.
With the synthesized images and class labels, weakly supervised object detection can then be leveraged to accomplish Imaginary-Supervised Object Detection (ISOD).
Experiments show that ImaginaryNet can obtain about 70% of the performance of the weakly supervised counterpart with the same backbone trained on real data.
A hedged code sketch of this synthesis idea appears after the related-papers list below.
arXiv Detail & Related papers (2022-10-13T10:25:22Z)
- ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning [91.58711082348293]
Reinforcement learning from online user feedback on the system's performance presents a natural way to adapt assistive teleoperation interfaces to their users.
This approach tends to require a large amount of human-in-the-loop training data, especially when feedback is sparse.
We propose a hierarchical solution that learns efficiently from sparse user feedback.
arXiv Detail & Related papers (2022-02-05T02:01:19Z)
- Crop-Transform-Paste: Self-Supervised Learning for Visual Tracking [137.26381337333552]
In this work, we develop the Crop-Transform-Paste operation, which is able to synthesize sufficient training data.
Since the object state is known in all synthesized data, existing deep trackers can be trained in routine ways without human annotation.
arXiv Detail & Related papers (2021-06-21T07:40:34Z)
- Recognizing Actions in Videos from Unseen Viewpoints [80.6338404141284]
We show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in training data.
We introduce a new dataset for unseen view recognition and show our approach's ability to learn viewpoint-invariant representations.
arXiv Detail & Related papers (2021-03-30T17:17:54Z)
- Embodied Visual Active Learning for Semantic Segmentation [33.02424587900808]
We study the task of embodied visual active learning, where an agent is set to explore a 3d environment with the goal to acquire visual scene understanding.
We develop a battery of agents, both learnt and pre-specified, with different levels of knowledge of the environment.
We extensively evaluate the proposed models using the Matterport3D simulator and show that a fully learnt method outperforms comparable pre-specified counterparts.
arXiv Detail & Related papers (2020-12-17T11:02:34Z)
- Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
arXiv Detail & Related papers (2020-06-04T08:59:17Z)
- Learning to Visually Navigate in Photorealistic Environments Without any Supervision [37.22924101745505]
We introduce a novel approach for learning to navigate from image inputs without external supervision or reward.
Our approach consists of three stages: learning a good representation of first-person views, then learning to explore using memory, and finally learning to navigate by setting its own goals.
We show the benefits of our approach by training an agent to navigate challenging photo-realistic environments from the Gibson dataset with RGB inputs only.
arXiv Detail & Related papers (2020-04-10T08:59:32Z)
- Crowdsourcing the Perception of Machine Teaching [17.94519906313517]
Teachable interfaces can empower end-users to attune machine learning systems to their idiosyncratic characteristics and environment.
While facilitating control, their effectiveness can be hindered by the lack of expertise or misconceptions.
We investigate how users may conceptualize, experience, and reflect on their engagement in machine teaching by deploying a mobile teachable testbed on Amazon Mechanical Turk.
arXiv Detail & Related papers (2020-02-05T03:20:25Z)
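As referenced in the ImaginaryNet entry above, here is a minimal sketch of the general synthesis loop that its summary describes: a pretrained language model expands a class name into a scene prompt, a text-to-image model renders it, and every synthetic image carries a free image-level label for weakly supervised detection. The checkpoints (`gpt2`, `runwayml/stable-diffusion-v1-5`), the label set, and the use of the Hugging Face `transformers`/`diffusers` APIs are assumptions made for illustration; they are not the models or code used in the paper.

```python
# Illustrative sketch of the ImaginaryNet-style idea, not the authors' code.
# Assumes the Hugging Face `transformers` and `diffusers` packages and a
# CUDA GPU; the checkpoints below are stand-ins chosen for the example.
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

CLASSES = ["bicycle", "dog", "traffic light"]  # hypothetical label set

# 1. A pretrained language model turns each class name into a richer scene prompt.
prompt_writer = pipeline("text-generation", model="gpt2")

# 2. A pretrained text-to-image model renders that prompt.
renderer = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

dataset = []  # (PIL image, image-level class label) pairs
for label in CLASSES:
    seed_text = f"A photo of a {label}"
    prompt = prompt_writer(seed_text, max_new_tokens=20)[0]["generated_text"]
    image = renderer(prompt).images[0]
    # Only the image-level label is stored; bounding boxes would later be
    # recovered by a weakly supervised object detector, as the summary notes.
    dataset.append((image, label))
```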