Related papers: A Real-Time Diminished Reality Approach to Privacy in MR Collaboration

URL: http://arxiv.org/abs/2509.10466v1
Date: Thu, 21 Aug 2025 04:01:56 GMT
Title: A Real-Time Diminished Reality Approach to Privacy in MR Collaboration
Authors: Christian Fane
Abstract summary: This thesis presents a real-time, inpainting-based DR system designed to enable privacy control in mixed reality meetings. The system allows a primary headset user to selectively remove personal or sensitive items from their environment. At 720p resolution, the pipeline sustains frame rates exceeding 20 fps, demonstrating the feasibility of real-time diminished reality for practical privacy-preserving MR applications.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diminished reality (DR) refers to the digital removal of real-world objects by compositing background content in their place. This thesis presents a real-time, inpainting-based DR system designed to enable privacy control in shared-space mixed reality (MR) meetings. The system allows a primary headset user to selectively remove personal or sensitive items from their environment, ensuring that those objects are no longer visible to other participants. Removal is achieved through semantic segmentation and precise object selection, followed by real-time inpainting from the viewpoint of a secondary observer, implemented using a mobile ZED 2i depth camera. The solution is designed to be portable and robust, requiring neither a fixed secondary viewpoint nor prior 3D scanning of the environment. The system utilises YOLOv11 for object detection and a modified Decoupled Spatial-Temporal Transformer (DSTT) model for high-quality video inpainting. At 720p resolution, the pipeline sustains frame rates exceeding 20 fps, demonstrating the feasibility of real-time diminished reality for practical privacy-preserving MR applications.

Related papers

UCM: Unifying Camera Control and Memory with Time-aware Positional Encoding Warping for World Models [54.564740558030245]
We present UCM, a novel framework that unifies long-term memory and precise camera control via a time-aware positional encoding warping mechanism. We also introduce a scalable data curation strategy utilizing point-cloud-based rendering to simulate scene revisiting.
arXiv Detail & Related papers (2026-02-26T12:54:46Z)
OnlineSplatter: Pose-Free Online 3D Reconstruction for Free-Moving Objects [58.38338242973447]
OnlineSplatter is a novel framework generating high-quality, object-centric 3D Gaussians directly from RGB frames. Our approach anchors reconstruction using the first frame and progressively refines the object representation through a dense Gaussian primitive field. Our core contribution is a dual-key memory module combining latent appearance-geometry keys with explicit directional keys.
arXiv Detail & Related papers (2025-10-23T14:37:25Z)
MObI: Multimodal Object Inpainting Using Diffusion Models [52.07640413626605]
This paper introduces MObI, a novel framework for Multimodal Object Inpainting. Using a single reference RGB image, MObI enables objects to be seamlessly inserted into existing multimodal scenes. Unlike traditional inpainting methods that rely solely on edit masks, our 3D bounding box conditioning gives objects accurate spatial positioning and realistic scaling.
arXiv Detail & Related papers (2025-01-06T17:43:26Z)
MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision [75.38953287579616]
We present a novel framework to exploit Multi-view Occlusion-aware supervision from hand-object videos for Hand-held Object reconstruction. We tackle two predominant challenges in this setting: hand-induced occlusion and the object's self-occlusion. Experiments on the HO3D and DexYCB datasets demonstrate that 2D-supervised MOHO outperforms 3D-supervised methods by a large margin.
arXiv Detail & Related papers (2023-10-18T03:57:06Z)
SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker. SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z)
Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events [63.984927609545856]
An Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict the per-pixel dynamic between arbitrary time intervals. We show that the proposed method achieves state-of-the-art results and performs remarkably well on event-based RS2GS inversion in real-world scenarios.
arXiv Detail & Related papers (2023-04-14T05:30:02Z)
BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo [6.5401888641091634]
Temporal multi-view stereo (MVS) technology is the natural knowledge for tackling the depth ambiguity in multi-view 3D object detection. By introducing a dynamic temporal stereo strategy, BEVStereo++ reduces the harm caused by introducing temporal stereo. BEVStereo++ achieves state-of-the-art (SOTA) results on both dataset and nuScenes.
arXiv Detail & Related papers (2023-04-09T08:04:26Z)
A Flexible-Frame-Rate Vision-Aided Inertial Object Tracking System for Mobile Devices [3.4836209951879957]
We propose a flexible-frame-rate object pose estimation and tracking system for mobile devices. Inertial measurement unit (IMU) pose propagation is performed on the client side for high speed tracking, and RGB image-based 3D pose estimation is performed on the server side. Our system supports flexible frame rates up to 120 FPS and guarantees high precision and real-time tracking on low-end devices.
arXiv Detail & Related papers (2022-10-22T15:26:50Z)
Realtime 3D Object Detection for Headsets [19.096803385184174]
We propose DeepMix, a mobility-aware, lightweight, and hybrid 3D object detection framework. DeepMix intelligently combines edge-assisted 2D object detection and novel, on-device 3D bounding box estimations. This leads to low end-to-end latency and significantly boosts detection accuracy in mobile scenarios.
arXiv Detail & Related papers (2022-01-15T05:50:18Z)
Occlusion-Aware Video Object Inpainting [72.38919601150175]
This paper presents occlusion-aware video object inpainting, which recovers both the complete shape and appearance for occluded objects in videos. Our technical contribution VOIN jointly performs video object shape completion and occluded texture generation. For more realistic results, VOIN is optimized using both T-PatchGAN and a new spatio-temporal attention-based multi-class discriminator.
arXiv Detail & Related papers (2021-08-15T15:46:57Z)
MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion [19.034317851914725]
We present a system which can estimate the accurate poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision. Our approach makes 3D object pose proposals from single RGB-D views, accumulates pose estimates and non-parametric occupancy information from multiple views as the camera moves. We verify the accuracy and robustness of our approach experimentally on 2 object datasets: YCB-Video, and our own challenging Cluttered YCB-Video.
arXiv Detail & Related papers (2020-04-09T02:29:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.