ARPOV: Expanding Visualization of Object Detection in AR with Panoramic Mosaic Stitching
- URL: http://arxiv.org/abs/2410.01055v1
- Date: Tue, 1 Oct 2024 20:29:14 GMT
- Title: ARPOV: Expanding Visualization of Object Detection in AR with Panoramic Mosaic Stitching
- Authors: Erin McGowan, Ethan Brewer, Claudio Silva
- Abstract summary: ARPOV is an interactive visual analytics tool for analyzing object detection model outputs tailored to video captured by an AR headset.
The proposed tool leverages panorama stitching to expand the view of the environment while automatically filtering undesirable frames.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the uses of augmented reality (AR) become more complex and widely available, AR applications will increasingly incorporate intelligent features that require developers to understand the user's behavior and surrounding environment (e.g. an intelligent assistant). Such applications rely on video captured by an AR headset, which often contains disjointed camera movement with a limited field of view that cannot capture the full scope of what the user sees at any given time. Moreover, standard methods of visualizing object detection model outputs are limited to capturing objects within a single frame and timestep, and therefore fail to capture the temporal and spatial context that is often necessary for various domain applications. We propose ARPOV, an interactive visual analytics tool for analyzing object detection model outputs tailored to video captured by an AR headset that maximizes user understanding of model performance. The proposed tool leverages panorama stitching to expand the view of the environment while automatically filtering undesirable frames, and includes interactive features that facilitate object detection model debugging. ARPOV was designed as part of a collaboration between visualization researchers and machine learning and AR experts; we validate our design choices through interviews with 5 domain experts.
Related papers
- Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs
We introduce an advanced approach for fine-grained object visual key field detection.
First, we use the Segment Anything Model (SAM) to generate detailed spatial maps of objects in images.
Next, we use Vision Studio to extract semantic object descriptions.
Third, we employ GPT-4's common sense knowledge, bridging the gap between an object's semantics and its spatial map.
arXiv Detail & Related papers (2024-04-01T14:53:36Z)
- The Impact of Different Backbone Architecture on Autonomous Vehicle Dataset
The quality of the features extracted by the backbone architecture can have a significant impact on the overall detection performance.
Our study evaluates three well-known autonomous vehicle datasets, namely KITTI, NuScenes, and BDD, to compare the performance of different backbone architectures on object detection tasks.
arXiv Detail & Related papers (2023-09-15T17:32:15Z)
- Evaluation of Environmental Conditions on Object Detection using Oriented Bounding Boxes for AR Applications
Scene analysis and object recognition play a crucial role in augmented reality (AR).
A new approach is proposed that uses oriented bounding boxes with a deep detection and recognition network to improve performance and processing time.
Results indicate that the proposed approach tends to produce better Average Precision and greater accuracy for small objects in most of the tested conditions.
arXiv Detail & Related papers (2023-06-29T09:17:58Z)
- Adaptive Rotated Convolution for Rotated Object Detection
We present the Adaptive Rotated Convolution (ARC) module to handle the rotated object detection problem.
In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images.
The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
arXiv Detail & Related papers (2023-03-14T11:53:12Z)
- Teachable Reality: Prototyping Tangible Augmented Reality with Everyday Objects by Leveraging Interactive Machine Teaching
Teachable Reality is an augmented reality (AR) prototyping tool for creating interactive tangible AR applications with arbitrary everyday objects.
It identifies the user-defined tangible and gestural interactions using an on-demand computer vision model.
Our approach can lower the barrier to creating functional AR prototypes while also allowing flexible and general-purpose prototyping experiences.
arXiv Detail & Related papers (2023-02-21T23:03:49Z)
- Interactive Segmentation and Visualization for Tiny Objects in Multi-megapixel Images
We introduce an interactive image segmentation and visualization framework for identifying, inspecting, and editing tiny objects in large multi-megapixel high-range images.
We developed an interactive toolkit that unifies inference model, HDR image visualization, segmentation mask inspection and editing into a single graphical user interface.
Our interface features mouse-controlled, synchronized, dual-window visualization of the image and the segmentation mask, a critical feature for locating tiny objects in multi-megapixel images.
arXiv Detail & Related papers (2022-04-21T18:26:48Z)
- Multi-modal Transformers Excel at Class-agnostic Object Detection
We argue that existing methods lack a top-down supervision signal governed by human-understandable semantics.
We develop an efficient and flexible MViT architecture using multi-scale feature processing and deformable self-attention.
We show the significance of MViT proposals in a diverse range of applications.
arXiv Detail & Related papers (2021-11-22T18:59:29Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- ASOD60K: Audio-Induced Salient Object Detection in Panoramic Videos
We propose PV-SOD, a new task that aims to segment salient objects from panoramic videos.
In contrast to existing fixation-level or object-level saliency detection tasks, we focus on multi-modal salient object detection (SOD).
We collect the first large-scale dataset, named ASOD60K, which contains 4K-resolution video frames annotated with a six-level hierarchy.
arXiv Detail & Related papers (2021-07-24T15:14:20Z)
- Robust Object Detection via Instance-Level Temporal Cycle Confusion
We study the effectiveness of auxiliary self-supervised tasks to improve the out-of-distribution generalization of object detectors.
Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level temporal cycle confusion (CycConf).
For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision.
arXiv Detail & Related papers (2021-04-16T21:35:08Z)
- Object Detection in the Context of Mobile Augmented Reality
We propose a novel approach that combines geometric information from visual-inertial odometry (VIO) with semantic information from object detectors to improve the performance of object detection on mobile devices.
Our approach includes three components: (1) an image orientation correction method, (2) a scale-based filtering approach, and (3) an online semantic map.
The results show that our approach can improve the accuracy of generic object detectors by 12% on our dataset.
arXiv Detail & Related papers (2020-08-15T05:15:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.