Private Eye: On the Limits of Textual Screen Peeking via Eyeglass
Reflections in Video Conferencing
- URL: http://arxiv.org/abs/2205.03971v1
- Date: Sun, 8 May 2022 23:29:13 GMT
- Title: Private Eye: On the Limits of Textual Screen Peeking via Eyeglass
Reflections in Video Conferencing
- Authors: Yan Long, Chen Yan, Shivan Prasad, Wenyuan Xu, Kevin Fu
- Abstract summary: Video leaks participants' on-screen information because eyeglasses and other reflective objects unwittingly expose partial screen contents.
Using mathematical modeling and human subjects experiments, this research explores the extent to which emerging webcams might leak recognizable textual information.
Our work explores and characterizes the viable threat models based on optical attacks using multi-frame super resolution techniques on sequences of video frames.
- Score: 18.84055230013228
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Personal video conferencing has become the new norm after COVID-19 caused a
seismic shift from in-person meetings and phone calls to video conferencing for
daily communications and sensitive business. Video leaks participants'
on-screen information because eyeglasses and other reflective objects
unwittingly expose partial screen contents. Using mathematical modeling and
human subjects experiments, this research explores the extent to which emerging
webcams might leak recognizable textual information gleaned from eyeglass
reflections captured by webcams. The primary goal of our work is to measure,
compute, and predict the factors, limits, and thresholds of recognizability as
webcam technology evolves in the future. Our work explores and characterizes
the viable threat models based on optical attacks using multi-frame super
resolution techniques on sequences of video frames. Our experimental results
and models show it is possible to reconstruct and recognize on-screen text with
a height as small as 10 mm with a 720p webcam. We further apply this threat
model to web textual content with varying attacker capabilities to find
thresholds at which text becomes recognizable. Our user study with 20
participants suggests present-day 720p webcams are sufficient for adversaries
to reconstruct textual content on big-font websites. Our models further show
that the evolution toward 4K cameras will tip the threshold of text leakage to
reconstruction of most header texts on popular websites. Our research proposes
near-term mitigations, and justifies the importance of following the principle
of least privilege for long-term defense against this attack. For
privacy-sensitive scenarios, we further recommend developing technologies
that blur all objects by default and unblur only what is absolutely necessary
to facilitate natural-looking conversations.
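The core of the multi-frame super-resolution attack described above is registering a sequence of video frames and fusing them. The sketch below is an illustrative assumption, not the authors' implementation: it estimates integer-pixel shifts by phase correlation and averages the aligned frames with NumPy.

```python
import numpy as np

def estimate_shift(ref, frame):
    """Estimate (sy, sx) such that frame ~ np.roll(ref, (sy, sx)),
    using phase correlation (integer-pixel accuracy)."""
    F1 = np.fft.fft2(ref)
    F2 = np.fft.fft2(frame)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12      # normalized cross-power spectrum
    corr = np.fft.ifft2(cross).real     # sharp peak at the negated shift
    py, px = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    if py > h // 2:                     # wrap large indices to negative shifts
        py -= h
    if px > w // 2:
        px -= w
    return int(-py), int(-px)

def register_and_average(frames):
    """Align every frame to the first one and average: a minimal
    shift-and-add fusion step for multi-frame reconstruction."""
    ref = frames[0]
    acc = np.zeros_like(ref, dtype=float)
    for f in frames:
        sy, sx = estimate_shift(ref, f)
        acc += np.roll(f, (-sy, -sx), axis=(0, 1))
    return acc / len(frames)
```

Averaging registered frames suppresses per-frame sensor noise; practical pipelines add sub-pixel registration and deconvolution on top of this step.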
Related papers
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z) - CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects [61.323597069037056]
Current approaches for personalizing text-to-video generation struggle to handle multiple subjects.
We propose CustomVideo, a novel framework that can generate identity-preserving videos with the guidance of multiple subjects.
arXiv Detail & Related papers (2024-01-18T13:23:51Z) - Bidirectional Cross-Modal Knowledge Exploration for Video Recognition
with Pre-trained Vision-Language Models [149.1331903899298]
We propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge.
We present a Temporal Concept Spotting mechanism that uses the Text-to-Video expertise to capture temporal saliency in a parameter-free manner.
Our best model achieves a state-of-the-art accuracy of 88.6% on the challenging Kinetics-400 using the released CLIP model.
arXiv Detail & Related papers (2022-12-31T11:36:53Z) - Recovering Surveillance Video Using RF Cues [5.818870353966268]
We propose CSI2Video, a novel cross-modal method to recover fine-grained surveillance video in real-time.
Our solution generates realistic surveillance videos without any expensive wireless equipment and is ubiquitous, cheap, and real-time.
arXiv Detail & Related papers (2022-12-27T01:57:03Z) - Detection of Real-time DeepFakes in Video Conferencing with Active
Probing and Corneal Reflection [43.272069005626584]
We describe a new active forensic method to detect real-time DeepFakes.
We authenticate video calls by displaying a distinct pattern on the screen and using the corneal reflection extracted from the images of the call participant's face.
This pattern can be displayed by a call participant via a shared screen or integrated directly into the video-call client.
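The authentication step above can be pictured as correlating the probe pattern shown on screen with the brightness sequence measured in the corneal region across frames. The following is a hypothetical sketch under that assumption, not the paper's actual method; function names and the threshold are illustrative.

```python
import numpy as np

def pattern_match_score(displayed, observed):
    """Pearson correlation between the probe pattern shown on screen and
    the brightness sequence extracted from the corneal reflection."""
    d = np.asarray(displayed, dtype=float)
    o = np.asarray(observed, dtype=float)
    d = d - d.mean()
    o = o - o.mean()
    denom = np.linalg.norm(d) * np.linalg.norm(o)
    return float(d @ o / denom) if denom > 0 else 0.0

def is_live(displayed, observed, threshold=0.7):
    """Accept the participant as live if the corneal reflection tracks
    the probe pattern closely enough; a DeepFake face that does not
    render the screen's reflection should score near zero."""
    return pattern_match_score(displayed, observed) >= threshold
```

A real reflection follows the probe up to an affine brightness change (gain and offset), which Pearson correlation is invariant to by construction.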
arXiv Detail & Related papers (2022-10-21T23:31:17Z) - Real or Virtual: A Video Conferencing Background Manipulation-Detection
System [25.94894351460089]
We present a detection strategy to distinguish between real and virtual video conferencing user backgrounds.
We demonstrate the robustness of our detector against a range of adversarial attacks.
Our performance results show that we can distinguish a real from a virtual background with an accuracy of 99.80%.
arXiv Detail & Related papers (2022-04-25T08:14:11Z) - Real Time Action Recognition from Video Footage [0.5219568203653523]
Video surveillance cameras have added a new dimension to detect crime.
This research focuses on integrating state-of-the-art Deep Learning methods to build a robust pipeline for autonomous surveillance that detects violent activities.
arXiv Detail & Related papers (2021-12-13T07:27:41Z) - Video Generation from Text Employing Latent Path Construction for
Temporal Modeling [70.06508219998778]
Video generation is one of the most challenging tasks in Machine Learning and Computer Vision fields of study.
In this paper, we tackle the text to video generation problem, which is a conditional form of video generation.
We believe that video generation from natural language sentences will have an important impact on Artificial Intelligence.
arXiv Detail & Related papers (2021-07-29T06:28:20Z) - Egocentric Videoconferencing [86.88092499544706]
Videoconferencing conveys valuable non-verbal communication and facial expression cues, but usually requires a front-facing camera.
We propose a low-cost wearable egocentric camera setup that can be integrated into smart glasses.
Our goal is to mimic a classical video call, and therefore, we transform the egocentric perspective of this camera into a front facing video.
arXiv Detail & Related papers (2021-07-07T09:49:39Z) - LIFI: Towards Linguistically Informed Frame Interpolation [66.05105400951567]
We address frame interpolation by using several deep learning video generation algorithms to generate the missing frames.
We release several datasets to test the speech understanding of computer vision video generation models.
arXiv Detail & Related papers (2020-10-30T05:02:23Z) - Zoom on the Keystrokes: Exploiting Video Calls for Keystroke Inference
Attacks [4.878606901631679]
In recent world events, video calls have become the new norm for both personal and professional remote communication.
We design and evaluate an attack framework to infer one type of such private information from the video stream of a call -- keystrokes.
We propose and evaluate effective mitigation techniques that can automatically protect users when they type during a video call.
arXiv Detail & Related papers (2020-10-22T21:38:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.