Private Eye: On the Limits of Textual Screen Peeking via Eyeglass
Reflections in Video Conferencing
- URL: http://arxiv.org/abs/2205.03971v1
- Date: Sun, 8 May 2022 23:29:13 GMT
- Title: Private Eye: On the Limits of Textual Screen Peeking via Eyeglass
Reflections in Video Conferencing
- Authors: Yan Long, Chen Yan, Shivan Prasad, Wenyuan Xu, Kevin Fu
- Abstract summary: Video leaks participants' on-screen information because eyeglasses and other reflective objects unwittingly expose partial screen contents.
Using mathematical modeling and human subjects experiments, this research explores the extent to which emerging webcams might leak recognizable textual information.
Our work explores and characterizes the viable threat models based on optical attacks using multi-frame super resolution techniques on sequences of video frames.
- Score: 18.84055230013228
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Personal video conferencing has become the new norm after COVID-19 caused a
seismic shift from in-person meetings and phone calls to video conferencing for
daily communications and sensitive business. Video leaks participants'
on-screen information because eyeglasses and other reflective objects
unwittingly expose partial screen contents. Using mathematical modeling and
human subjects experiments, this research explores the extent to which emerging
webcams might leak recognizable textual information gleaned from eyeglass
reflections captured by webcams. The primary goal of our work is to measure,
compute, and predict the factors, limits, and thresholds of recognizability as
webcam technology evolves in the future. Our work explores and characterizes
the viable threat models based on optical attacks using multi-frame super
resolution techniques on sequences of video frames. Our experimental results
and models show it is possible to reconstruct and recognize on-screen text with
a height as small as 10 mm with a 720p webcam. We further apply this threat
model to web textual content with varying attacker capabilities to find
thresholds at which text becomes recognizable. Our user study with 20
participants suggests present-day 720p webcams are sufficient for adversaries
to reconstruct textual content on big-font websites. Our models further show
that the evolution toward 4K cameras will tip the threshold of text leakage to
reconstruction of most header texts on popular websites. Our research proposes
near-term mitigations, and justifies the importance of following the principle
of least privilege for long-term defense against this attack. For
privacy-sensitive scenarios, we further recommend developing technologies
that blur all objects by default and unblur only what is absolutely necessary
to facilitate natural-looking conversations.
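The core of the multi-frame super-resolution attack described above is registering a sequence of video frames and fusing them. The sketch below is an illustrative assumption, not the authors' implementation: it estimates integer-pixel shifts by phase correlation and averages the aligned frames with NumPy.

```python
import numpy as np

def estimate_shift(ref, frame):
    """Estimate (sy, sx) such that frame ~ np.roll(ref, (sy, sx)),
    using phase correlation (integer-pixel accuracy)."""
    F1 = np.fft.fft2(ref)
    F2 = np.fft.fft2(frame)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12      # normalized cross-power spectrum
    corr = np.fft.ifft2(cross).real     # sharp peak at the negated shift
    py, px = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    if py > h // 2:                     # wrap large indices to negative shifts
        py -= h
    if px > w // 2:
        px -= w
    return int(-py), int(-px)

def register_and_average(frames):
    """Align every frame to the first one and average: a minimal
    shift-and-add fusion step for multi-frame reconstruction."""
    ref = frames[0]
    acc = np.zeros_like(ref, dtype=float)
    for f in frames:
        sy, sx = estimate_shift(ref, f)
        acc += np.roll(f, (-sy, -sx), axis=(0, 1))
    return acc / len(frames)
```

Averaging registered frames suppresses per-frame sensor noise; practical pipelines add sub-pixel registration and deconvolution on top of this step.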
Related papers
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z) - CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects [61.323597069037056]
Current approaches for personalizing text-to-video generation struggle to handle multiple subjects.
We propose CustomVideo, a novel framework that can generate identity-preserving videos with the guidance of multiple subjects.
arXiv Detail & Related papers (2024-01-18T13:23:51Z) - Bidirectional Cross-Modal Knowledge Exploration for Video Recognition
with Pre-trained Vision-Language Models [149.1331903899298]
We propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge.
We present a Temporal Concept Spotting mechanism that uses the Text-to-Video expertise to capture temporal saliency in a parameter-free manner.
Our best model achieves a state-of-the-art accuracy of 88.6% on the challenging Kinetics-400 using the released CLIP model.
arXiv Detail & Related papers (2022-12-31T11:36:53Z) - Recovering Surveillance Video Using RF Cues [5.818870353966268]
We propose CSI2Video, a novel cross-modal method to recover fine-grained surveillance video in real-time.
Our solution generates realistic surveillance videos without any expensive wireless equipment and is ubiquitous, cheap, and real-time.
arXiv Detail & Related papers (2022-12-27T01:57:03Z) - Detection of Real-time DeepFakes in Video Conferencing with Active
Probing and Corneal Reflection [43.272069005626584]
We describe a new active forensic method to detect real-time DeepFakes.
We authenticate video calls by displaying a distinct pattern on the screen and using the corneal reflection extracted from the images of the call participant's face.
This pattern can be displayed by a call participant via a shared screen or integrated directly into the video-call client.
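The authentication step above can be pictured as correlating the probe pattern shown on screen with the brightness sequence measured in the corneal region across frames. The following is a hypothetical sketch under that assumption, not the paper's actual method; function names and the threshold are illustrative.

```python
import numpy as np

def pattern_match_score(displayed, observed):
    """Pearson correlation between the probe pattern shown on screen and
    the brightness sequence extracted from the corneal reflection."""
    d = np.asarray(displayed, dtype=float)
    o = np.asarray(observed, dtype=float)
    d = d - d.mean()
    o = o - o.mean()
    denom = np.linalg.norm(d) * np.linalg.norm(o)
    return float(d @ o / denom) if denom > 0 else 0.0

def is_live(displayed, observed, threshold=0.7):
    """Accept the participant as live if the corneal reflection tracks
    the probe pattern closely enough; a DeepFake face that does not
    render the screen's reflection should score near zero."""
    return pattern_match_score(displayed, observed) >= threshold
```

A real reflection follows the probe up to an affine brightness change (gain and offset), which Pearson correlation is invariant to by construction.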
arXiv Detail & Related papers (2022-10-21T23:31:17Z) - Real or Virtual: A Video Conferencing Background Manipulation-Detection
System [25.94894351460089]
We present a detection strategy to distinguish between real and virtual video conferencing user backgrounds.
We demonstrate the robustness of our detector against a range of adversarial attacks.
Our performance results show that we can distinguish a real from a virtual background with an accuracy of 99.80%.
arXiv Detail & Related papers (2022-04-25T08:14:11Z) - Real Time Action Recognition from Video Footage [0.5219568203653523]
Video surveillance cameras have added a new dimension to detect crime.
This research focuses on integrating state-of-the-art Deep Learning methods to build a robust pipeline for autonomous surveillance that detects violent activities.
arXiv Detail & Related papers (2021-12-13T07:27:41Z) - Video Generation from Text Employing Latent Path Construction for
Temporal Modeling [70.06508219998778]
Video generation is one of the most challenging tasks in Machine Learning and Computer Vision fields of study.
In this paper, we tackle the text to video generation problem, which is a conditional form of video generation.
We believe that video generation from natural language sentences will have an important impact on Artificial Intelligence.
arXiv Detail & Related papers (2021-07-29T06:28:20Z) - Egocentric Videoconferencing [86.88092499544706]
Videoconferencing conveys valuable non-verbal communication and facial expression cues, but usually requires a front-facing camera.
We propose a low-cost wearable egocentric camera setup that can be integrated into smart glasses.
Our goal is to mimic a classical video call, and therefore, we transform the egocentric perspective of this camera into a front facing video.
arXiv Detail & Related papers (2021-07-07T09:49:39Z) - LIFI: Towards Linguistically Informed Frame Interpolation [66.05105400951567]
We address frame interpolation by using several deep learning video generation algorithms to generate the missing frames.
We release several datasets to test the speech understanding of computer vision video generation models.
arXiv Detail & Related papers (2020-10-30T05:02:23Z) - Zoom on the Keystrokes: Exploiting Video Calls for Keystroke Inference
Attacks [4.878606901631679]
In recent world events, video calls have become the new norm for both personal and professional remote communication.
We design and evaluate an attack framework to infer one type of such private information from the video stream of a call -- keystrokes.
We propose and evaluate effective mitigation techniques that can automatically protect users when they type during a video call.
arXiv Detail & Related papers (2020-10-22T21:38:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.