Zoom on the Keystrokes: Exploiting Video Calls for Keystroke Inference Attacks
- URL: http://arxiv.org/abs/2010.12078v1
- Date: Thu, 22 Oct 2020 21:38:17 GMT
- Title: Zoom on the Keystrokes: Exploiting Video Calls for Keystroke Inference Attacks
- Authors: Mohd Sabra, Anindya Maiti, Murtuza Jadliwala
- Abstract summary: Due to recent world events, video calls have become the new norm for both personal and professional remote communication.
We design and evaluate an attack framework to infer one type of such private information from the video stream of a call -- keystrokes.
We propose and evaluate effective mitigation techniques that can automatically protect users when they type during a video call.
- Score: 4.878606901631679
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Due to recent world events, video calls have become the new norm for both
personal and professional remote communication. However, participants in a
video call who are not careful can reveal their private information to
others in the call. In this paper, we design and evaluate an attack framework
to infer one type of such private information from the video stream of a call
-- keystrokes, i.e., text typed during the call. We evaluate our video-based
keystroke inference framework using different experimental settings and
parameters, including different webcams, video resolutions, keyboards,
clothing, and backgrounds. Our relatively high keystroke inference accuracies
under commonly occurring and realistic settings highlight the need for
awareness and countermeasures against such attacks. Consequently, we also
propose and evaluate effective mitigation techniques that can automatically
protect users when they type during a video call.
Related papers
- A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards [6.230751621285321]
This paper presents a state-of-the-art deep learning model to classify laptop keystrokes using a smartphone's integrated microphone.
When trained on keystrokes recorded by a nearby phone, the classifier achieved an accuracy of 95%, the highest accuracy seen without the use of a language model.
We discuss a series of mitigation methods to protect users against this class of attacks.
arXiv Detail & Related papers (2023-08-02T10:51:36Z) - Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought [62.619076257298204]
We motivate framing video reasoning as the sequential understanding of a small number of video frames.
We introduce VIP, an inference-time challenge dataset designed to explore models' reasoning capabilities through video chain-of-thought.
We benchmark GPT-4, GPT-3, and VICUNA on VIP, demonstrate the performance gap in complex video reasoning tasks, and encourage future work.
arXiv Detail & Related papers (2023-05-23T10:26:42Z) - Private Eye: On the Limits of Textual Screen Peeking via Eyeglass Reflections in Video Conferencing [18.84055230013228]
Video conferencing leaks participants' on-screen information because eyeglasses and other reflective objects can unwittingly expose partial screen contents.
Using mathematical modeling and human subjects experiments, this research explores the extent to which emerging webcams might leak recognizable textual information.
Our work explores and characterizes the viable threat models based on optical attacks using multi-frame super resolution techniques on sequences of video frames.
arXiv Detail & Related papers (2022-05-08T23:29:13Z) - Real or Virtual: A Video Conferencing Background Manipulation-Detection System [25.94894351460089]
We present a detection strategy to distinguish between real and virtual video conferencing user backgrounds.
We demonstrate the robustness of our detector against several adversarial attacks.
Our performance results show that we can distinguish a real from a virtual background with an accuracy of 99.80%.
arXiv Detail & Related papers (2022-04-25T08:14:11Z) - Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity.
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
arXiv Detail & Related papers (2022-04-06T20:51:40Z) - SPAct: Self-supervised Privacy Preservation for Action Recognition [73.79886509500409]
Existing approaches for mitigating privacy leakage in action recognition require privacy labels along with the action labels from the video dataset.
Recent developments of self-supervised learning (SSL) have unleashed the untapped potential of the unlabeled data.
We present a novel training framework which removes privacy information from input video in a self-supervised manner without requiring privacy labels.
arXiv Detail & Related papers (2022-03-29T02:56:40Z) - Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion [82.06128362686445]
We propose a multi-modal semantic forensic approach to handle both cheapfakes and visually persuasive deepfakes.
We leverage the idea of attribution to learn person-specific biometric patterns that distinguish a given speaker from others.
Unlike existing person-specific approaches, our method is also effective against attacks that focus on lip manipulation.
arXiv Detail & Related papers (2021-12-21T01:57:04Z) - Masking Modalities for Cross-modal Video Retrieval [93.10669981708878]
A common strategy for pre-training video encoders is to use the accompanying speech as weak supervision.
We propose to pre-train a video encoder using all the available video modalities as supervision, namely, appearance, sound, and transcribed speech.
We show the superior performance of our "modality masking" pre-training approach for video retrieval on the How2R, YouCook2 and Condensed Movies datasets.
arXiv Detail & Related papers (2021-11-01T23:55:04Z) - Do Not Deceive Your Employer with a Virtual Background: A Video Conferencing Manipulation-Detection System [35.82676654231492]
We study the feasibility of an efficient tool to detect whether a video conferencing user's background is real.
Our experiments confirm that cross co-occurrences matrices improve the robustness of the detector against different kinds of attacks.
arXiv Detail & Related papers (2021-06-29T07:31:21Z) - Privacy-Preserving Video Classification with Convolutional Neural Networks [8.51142156817993]
We propose a privacy-preserving implementation of single-frame method based video classification with convolutional neural networks.
We evaluate our proposed solution in an application for private human emotion recognition.
arXiv Detail & Related papers (2021-02-06T05:05:31Z) - Over-the-Air Adversarial Flickering Attacks against Video Recognition Networks [54.82488484053263]
Deep neural networks for video classification may be subjected to adversarial manipulation.
We present a manipulation scheme for fooling video classifiers by introducing a flickering temporal perturbation.
The attack was implemented on several target models and the transferability of the attack was demonstrated.
arXiv Detail & Related papers (2020-02-12T17:58:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.