DeepLight: Robust & Unobtrusive Real-time Screen-Camera Communication for Real-World Displays
- URL: http://arxiv.org/abs/2105.05092v1
- Date: Tue, 11 May 2021 14:44:12 GMT
- Title: DeepLight: Robust & Unobtrusive Real-time Screen-Camera Communication for Real-World Displays
- Authors: Vu Tran, Gihan Jayatilaka, Ashwin Ashok, Archan Misra
- Abstract summary: DeepLight is a system that incorporates machine learning (ML) models in the decoding pipeline to achieve humanly-imperceptible, moderately high SCC rates.
DeepLight's key innovation is the design of a Deep Neural Network (DNN) based decoder that collectively decodes all the bits spatially encoded in a display frame.
- Score: 4.632704227272501
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paper introduces a novel, holistic approach for robust Screen-Camera
Communication (SCC), where video content on a screen is visually encoded in a
human-imperceptible fashion and decoded by a camera capturing images of such
screen content. We first show that state-of-the-art SCC techniques have two key
limitations for in-the-wild deployment: (a) the decoding accuracy drops rapidly
under even modest screen extraction errors from the captured images, and (b)
they generate perceptible flickers on common refresh rate screens even with
minimal modulation of pixel intensity. To overcome these challenges, we
introduce DeepLight, a system that incorporates machine learning (ML) models in
the decoding pipeline to achieve humanly-imperceptible, moderately high SCC
rates under diverse real-world conditions. DeepLight's key innovation is the
design of a Deep Neural Network (DNN) based decoder that collectively decodes
all the bits spatially encoded in a display frame, without attempting to
precisely isolate the pixels associated with each encoded bit. In addition,
DeepLight supports imperceptible encoding by selectively modulating the
intensity of only the Blue channel, and provides reasonably accurate screen
extraction (IoU values >= 83%) by using state-of-the-art object detection DNN
pipelines. We show that a fully functional DeepLight system is able to robustly
achieve high decoding accuracy (frame error rate < 0.2) and moderately-high
data goodput (>= 0.95 kbps) using a human-held smartphone camera, even over
larger screen-camera distances (approx. 2 m).
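To make the blue-channel encoding concrete, below is a minimal sketch of the kind of spatial grid modulation the abstract describes. The grid size, modulation depth delta, and complementary frame pairing are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

def encode_frame_pair(frame, bits, grid=(8, 8), delta=4):
    # frame: (H, W, 3) uint8 RGB array; bits: list of rows*cols 0/1 values.
    # Each bit maps to one grid cell, and only the blue channel (index 2)
    # is nudged by +/-delta. The second frame flips the sign, so the two
    # frames average back to the original content when shown in sequence.
    h, w, _ = frame.shape
    rows, cols = grid
    assert len(bits) == rows * cols
    f0 = frame.astype(np.int16)  # astype copies, so f0/f1 are independent
    f1 = frame.astype(np.int16)
    cell_h, cell_w = h // rows, w // cols
    for i, bit in enumerate(bits):
        r, c = divmod(i, cols)
        sign = 1 if bit else -1
        ys = slice(r * cell_h, (r + 1) * cell_h)
        xs = slice(c * cell_w, (c + 1) * cell_w)
        f0[ys, xs, 2] += sign * delta
        f1[ys, xs, 2] -= sign * delta
    to_u8 = lambda a: np.clip(a, 0, 255).astype(np.uint8)
    return to_u8(f0), to_u8(f1)
```

Complementary pairing of this kind is one common way SCC systems trade temporal averaging for imperceptibility; the paper's exact modulation scheme may differ.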
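On the receive side, the abstract's key point is that the decoder predicts all spatially encoded bits from the captured frame at once rather than isolating the pixels of each bit. A hypothetical PyTorch stand-in (layer sizes and architecture assumed, not DeepLight's actual network) might look like:

```python
import torch
import torch.nn as nn

class CollectiveDecoder(nn.Module):
    # Maps a rectified, blue-channel frame difference to logits for all
    # grid bits jointly, so no per-cell pixel isolation is required.
    def __init__(self, n_bits=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(128 * 4 * 4, n_bits)  # one logit per bit

    def forward(self, x):                 # x: (B, 1, H, W)
        z = self.features(x).flatten(1)   # (B, 128 * 4 * 4)
        return self.head(z)               # sigmoid(logit) > 0.5 -> bit = 1
```

Because such a network is trained end-to-end on captured frames, small screen-extraction errors shift the input rather than break a hard cell segmentation, which is a plausible source of the robustness the abstract claims.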
Related papers
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- SGE: Structured Light System Based on Gray Code with an Event Camera [9.701291540219026]
We introduce Gray code into event-based structured light systems for the first time (a standard binary-reflected Gray code sketch appears after this list).
We show that our approach achieves accuracy comparable to state-of-the-art scanning methods.
Our proposed approach offers a highly promising solution for ultra-fast, real-time, and high-precision dense depth estimation.
arXiv Detail & Related papers (2024-03-12T05:20:44Z)
- Spatiotemporally Consistent HDR Indoor Lighting Estimation [66.26786775252592]
We propose a physically-motivated deep learning framework to solve the indoor lighting estimation problem.
Given a single LDR image with a depth map, our method predicts spatially consistent lighting at any given image position.
Our framework achieves photorealistic lighting prediction with higher quality compared to state-of-the-art single-image or video-based methods.
arXiv Detail & Related papers (2023-05-07T20:36:29Z)
- A Novel Light Field Coding Scheme Based on Deep Belief Network & Weighted Binary Images for Additive Layered Displays [0.30458514384586394]
Stacking light attenuating layers is one approach to implement a light field display with a broader depth of field, wide viewing angles and high resolution.
This paper proposes a novel framework for light field representation and coding that utilizes Deep Belief Network (DBN) and weighted binary images.
arXiv Detail & Related papers (2022-10-04T08:18:06Z)
- Super-resolution image display using diffractive decoders [21.24387597787123]
High-resolution synthesis/projection of images over a large field-of-view (FOV) is hindered by the restricted space-bandwidth-product (SBP) of wavefront modulators.
We report a deep learning-enabled diffractive display design that is based on a jointly-trained pair of an electronic encoder and a diffractive optical decoder.
Our results indicate that this diffractive image display can achieve a super-resolution factor of 4, demonstrating a 16-fold increase in SBP.
arXiv Detail & Related papers (2022-06-15T03:42:36Z)
- A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
- Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues.
We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z)
- Dual-view Snapshot Compressive Imaging via Optical Flow Aided Recurrent Neural Network [14.796204921975733]
Dual-view snapshot compressive imaging (SCI) aims to capture videos from two field-of-views (FoVs) in a single snapshot.
It is challenging for existing model-based decoding algorithms to reconstruct each individual scene.
We propose an optical flow-aided recurrent neural network for dual video SCI systems, which provides high-quality decoding in seconds.
arXiv Detail & Related papers (2021-09-11T14:24:44Z)
- Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation [98.05643473345474]
We propose a novel decoder, termed the dynamic neural representational decoder (NRD).
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z)
- ODE-CNN: Omnidirectional Depth Extension Networks [43.40308168978984]
We propose a low-cost 3D sensing system that combines an omnidirectional camera with a calibrated projective depth camera.
To accurately recover the missing depths, we design an omnidirectional depth extension convolutional neural network.
ODE-CNN significantly outperforms other state-of-the-art (SoTA) methods, with a relative 33% reduction in depth error.
arXiv Detail & Related papers (2020-07-03T03:14:09Z)
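As referenced in the SGE entry above, a standard binary-reflected Gray code (assumed here; the summary does not give the paper's exact pattern sequence) changes exactly one bit between consecutive values, which limits decoding ambiguity at stripe boundaries in structured light:

```python
def gray_encode(n: int) -> int:
    # Binary-reflected Gray code: consecutive integers differ in one bit.
    return n ^ (n >> 1)

def gray_decode(g: int) -> int:
    # Invert by XOR-folding the higher bits back down.
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

assert all(gray_decode(gray_encode(i)) == i for i in range(1024))
```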
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.