Real-Time Onboard Object Detection for Augmented Reality: Enhancing Head-Mounted Display with YOLOv8
- URL: http://arxiv.org/abs/2306.03537v1
- Date: Tue, 6 Jun 2023 09:35:45 GMT
- Title: Real-Time Onboard Object Detection for Augmented Reality: Enhancing Head-Mounted Display with YOLOv8
- Authors: Mikołaj Łysakowski, Kamil Żywanowski, Adam Banaszczyk,
Michał R. Nowicki, Piotr Skrzypczyński, Sławomir K. Tadeja
- Abstract summary: This paper introduces a software architecture for real-time object detection using machine learning (ML) in an augmented reality (AR) environment.
We show the image processing pipeline for the YOLOv8 model and the techniques used to make it real-time on the resource-limited edge computing platform of the headset.
- Score: 2.1530718840070784
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a software architecture for real-time object detection
using machine learning (ML) in an augmented reality (AR) environment. Our
approach uses the recent state-of-the-art YOLOv8 network that runs onboard on
the Microsoft HoloLens 2 head-mounted display (HMD). The primary motivation
behind this research is to enable the application of advanced ML models for
enhanced perception and situational awareness with a wearable, hands-free AR
platform. We show the image processing pipeline for the YOLOv8 model and the
techniques used to make it real-time on the resource-limited edge computing
platform of the headset. The experimental results demonstrate that our solution
achieves real-time processing without needing to offload tasks to the cloud or
any other external servers while retaining satisfactory accuracy in terms of the
usual mAP metric and measured qualitative performance.
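The abstract refers to the YOLOv8 image processing pipeline without detailing it. As a hedged, generic illustration (not the authors' HoloLens 2 implementation), the sketch below shows a postprocessing step common to YOLOv8-style detectors: each candidate row holds a center-format box plus per-class scores (YOLOv8's anchor-free head has no separate objectness score), rows below a confidence threshold are dropped, and greedy per-class non-maximum suppression (NMS) removes near-duplicate boxes. The function names, the row layout, and the 0.25/0.45 thresholds are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Row = Tuple[float, float, float, float, Sequence[float]]  # (cx, cy, w, h, class_scores)


def cxcywh_to_xyxy(cx: float, cy: float, w: float, h: float) -> Box:
    """Convert a center-format box to corner format."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(
    candidates: Sequence[Row],
    conf_thres: float = 0.25,  # hypothetical default
    iou_thres: float = 0.45,   # hypothetical default
) -> List[Tuple[Box, int, float]]:
    """Turn raw candidate rows into final (box, class_id, score) detections."""
    # 1. Take the best-scoring class for each row and drop low-confidence rows.
    dets: List[Tuple[Box, int, float]] = []
    for cx, cy, w, h, scores in candidates:
        cls = max(range(len(scores)), key=lambda k: scores[k])
        if scores[cls] >= conf_thres:
            dets.append((cxcywh_to_xyxy(cx, cy, w, h), cls, scores[cls]))
    # 2. Greedy per-class NMS: visit detections from highest to lowest score and
    #    keep one only if it does not overlap an already-kept box of the same class.
    dets.sort(key=lambda d: d[2], reverse=True)
    kept: List[Tuple[Box, int, float]] = []
    for box, cls, score in dets:
        if all(k[1] != cls or iou(k[0], box) <= iou_thres for k in kept):
            kept.append((box, cls, score))
    return kept


# Usage example with made-up rows for a two-class model.
rows = [
    (100, 100, 50, 50, [0.9, 0.1]),   # strong class-0 box
    (102, 101, 50, 50, [0.8, 0.1]),   # near-duplicate of the first box
    (300, 300, 40, 40, [0.05, 0.7]),  # separate class-1 box
    (500, 500, 40, 40, [0.1, 0.2]),   # below the confidence threshold
]
result = postprocess(rows)
print(result)
```

Suppressing only within the same class keeps overlapping objects of different classes; a class-agnostic variant would simply drop the `k[1] != cls` check.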
Related papers
- What is YOLOv9: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector [0.0]
This study examines the YOLOv9 object detection model, focusing on its architectural innovations, training methodologies, and performance improvements.
Key advancements, such as the Generalized Efficient Layer Aggregation Network (GELAN) and Programmable Gradient Information (PGI), significantly enhance feature extraction and gradient flow.
This paper provides the first in-depth exploration of YOLOv9's internal features and their real-world applicability, establishing it as a state-of-the-art solution for real-time object detection.
(2024-09-12T07:46:58Z)
- Lightweight Object Detection: A Study Based on YOLOv7 Integrated with ShuffleNetv2 and Vision Transformer [0.0]
This study zeroes in on optimizing the YOLOv7 algorithm to boost its operational efficiency and speed on mobile platforms.
The experimental outcomes reveal that the refined YOLO model demonstrates exceptional performance.
(2024-03-04T05:29:32Z)
- YOLO-World: Real-Time Open-Vocabulary Object Detection [87.08732047660058]
We introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities.
Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency.
YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed.
(2024-01-30T18:59:38Z)
- DM-VTON: Distilled Mobile Real-time Virtual Try-On [16.35842298296878]
Distilled Mobile Real-time Virtual Try-On (DM-VTON) is a novel virtual try-on framework designed to achieve simplicity and efficiency.
We introduce an efficient Mobile Generative Module within the Student network, significantly reducing the runtime.
Experimental results show that the proposed method can achieve 40 frames per second on a single Nvidia Tesla T4 GPU.
(2023-08-26T07:46:27Z)
- YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection [80.11152626362109]
We provide an efficient and performant object detector, termed YOLO-MS.
We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets.
Our work can also be used as a plug-and-play module for other YOLO models.
(2023-08-10T10:12:27Z)
- Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
(2022-06-28T18:42:27Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
(2022-03-11T01:51:54Z)
- HMD-EgoPose: Head-Mounted Display-Based Egocentric Marker-Less Tool and Hand Pose Estimation for Augmented Surgical Guidance [0.0]
We present HMD-EgoPose, a single-shot learning-based approach to hand and object pose estimation.
We demonstrate state-of-the-art performance on a benchmark dataset for marker-less hand and surgical instrument pose tracking.
(2022-02-24T04:07:34Z)
- Analysis of voxel-based 3D object detection methods efficiency for real-time embedded systems [93.73198973454944]
Two popular voxel-based 3D object detection methods are studied in this paper.
Our experiments show that these methods mostly fail to detect distant small objects due to the sparsity of the input point clouds at large distances.
Our findings suggest that a considerable part of the computation in existing methods is spent on scene locations that do not contribute to successful detection.
(2021-05-21T12:40:59Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
(2021-04-23T17:59:28Z)
- A Markerless Deep Learning-based 6 Degrees of Freedom Pose Estimation for Mobile Robots using RGB Data [3.4806267677524896]
We propose a method to deploy state of the art neural networks for real time 3D object localization on augmented reality devices.
We focus on fast 2D detection approaches that extract the 3D pose of the object quickly and accurately using only 2D input.
For the 6D annotation of 2D images, we developed an annotation tool which is, to our knowledge, the first such tool to be available as open source.
(2020-01-16T09:13:31Z)