PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation
- URL: http://arxiv.org/abs/2410.22623v1
- Date: Wed, 30 Oct 2024 01:02:20 GMT
- Title: PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation
- Authors: Ryozo Masukawa, Sanggeon Yun, Yoshiki Yamaguchi, Mohsen Imani,
- Abstract summary: We present PV-VTT (Privacy Violation Video To Text), a unique multimodal dataset aimed at identifying privacy violations.
PV-VTT provides detailed annotations for both video and text in each scenario.
This privacy-focused approach allows researchers to use the dataset while protecting participant confidentiality.
- Score: 5.0923114224599555
- Abstract: Video crime detection is a significant application of computer vision and artificial intelligence. However, existing datasets primarily focus on detecting severe crimes by analyzing entire video clips, often neglecting the precursor activities (i.e., privacy violations) that could potentially prevent these crimes. To address this limitation, we present PV-VTT (Privacy Violation Video To Text), a unique multimodal dataset aimed at identifying privacy violations. PV-VTT provides detailed annotations for both video and text in each scenario. To ensure the privacy of individuals in the videos, we only provide video feature vectors, avoiding the release of any raw video data. This privacy-focused approach allows researchers to use the dataset while protecting participant confidentiality. Recognizing that privacy violations are often ambiguous and context-dependent, we propose a Graph Neural Network (GNN)-based video description model. Our model generates a GNN-based prompt paired with a single image for a Large Language Model (LLM), which delivers cost-effective, high-quality video descriptions. By leveraging a single video frame along with relevant text, our method reduces the number of input tokens required, maintaining descriptive quality while optimizing LLM API usage. Extensive experiments validate the effectiveness and interpretability of our approach in video description tasks, as well as the flexibility of the PV-VTT dataset.
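The abstract's token-saving idea, linearizing a scene graph from a single frame into text and pairing it with that one frame as the LLM prompt, can be illustrated with a minimal sketch. This is not the authors' code; all names (graph_to_text, build_prompt, the example relations) are hypothetical.

```python
# Illustrative sketch (not the paper's implementation): building a compact
# LLM prompt from one frame's scene graph instead of many raw-frame captions.

def graph_to_text(edges):
    """Linearize toy (subject, relation, object) triples into prompt text."""
    return "; ".join(f"{s} {r} {o}" for s, r, o in edges)

def build_prompt(edges, question="Describe the potential privacy violation."):
    # A single frame's relations replace per-frame captions for the whole
    # clip, which is how the input token count stays small.
    graph_text = graph_to_text(edges)
    return (
        "Scene relations from a single video frame:\n"
        f"{graph_text}\n"
        f"Task: {question}"
    )

if __name__ == "__main__":
    edges = [
        ("person", "points camera at", "window"),
        ("window", "belongs to", "neighboring house"),
    ]
    print(build_prompt(edges))
```

In a real pipeline the triples would come from the GNN over video features and the prompt would be sent alongside the chosen frame to a multimodal LLM API; this sketch only shows the prompt-assembly step.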
Related papers
- ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large Language Models [53.9661582975843]
Video Temporal Grounding aims to ground specific segments within an untrimmed video corresponding to a given natural language query.
Existing VTG methods largely depend on supervised learning and extensive annotated data, which is labor-intensive and prone to human biases.
We present ChatVTG, a novel approach that utilizes Video Dialogue Large Language Models (LLMs) for zero-shot video temporal grounding.
arXiv Detail & Related papers (2024-10-01T08:27:56Z)
- CausalVE: Face Video Privacy Encryption via Causal Video Prediction [13.577971999457164]
With the proliferation of video and live-streaming websites, public-face video distribution and interactions pose greater privacy risks.
We propose a neural network framework, CausalVE, to address these shortcomings.
Our framework has good security in public video dissemination and outperforms state-of-the-art methods from a qualitative, quantitative, and visual point of view.
arXiv Detail & Related papers (2024-09-28T10:34:22Z)
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models reveal private information in contexts that humans would not: the two models evaluated do so 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Zero-Shot Video Moment Retrieval from Frozen Vision-Language Models [58.17315970207874]
We propose a zero-shot method for adapting generalisable visual-textual priors from an arbitrary VLM to facilitate moment-text alignment.
Experiments conducted on three VMR benchmark datasets demonstrate the notable performance advantages of our zero-shot algorithm.
arXiv Detail & Related papers (2023-09-01T13:06:50Z)
- Privacy Protectability: An Information-theoretical Approach [4.14084373472438]
We propose a new metric, privacy protectability, to characterize the degree to which a video stream can be protected.
Our definition of privacy protectability is rooted in information theory and we develop efficient algorithms to estimate the metric.
arXiv Detail & Related papers (2023-05-25T04:06:55Z)
- Large-capacity and Flexible Video Steganography via Invertible Neural Network [60.34588692333379]
We propose a Large-capacity and Flexible Video Steganography Network (LF-VSN).
For large capacity, we present a reversible pipeline to perform hiding and recovery of multiple videos through a single invertible neural network (INN).
For flexibility, we propose a key-controllable scheme, enabling different receivers to recover particular secret videos from the same cover video through specific keys.
arXiv Detail & Related papers (2023-04-24T17:51:35Z) - SPAct: Self-supervised Privacy Preservation for Action Recognition [73.79886509500409]
Existing approaches for mitigating privacy leakage in action recognition require privacy labels along with the action labels from the video dataset.
Recent developments of self-supervised learning (SSL) have unleashed the untapped potential of the unlabeled data.
We present a novel training framework which removes privacy information from input video in a self-supervised manner without requiring privacy labels.
arXiv Detail & Related papers (2022-03-29T02:56:40Z) - Robust Privacy-Preserving Motion Detection and Object Tracking in
Encrypted Streaming Video [39.453548972987015]
We propose an efficient and robust privacy-preserving motion detection and multiple object tracking scheme for encrypted surveillance video bitstreams.
Our scheme achieves the best detection and tracking performance compared with existing works in the encrypted and compressed domain.
Our scheme can be effectively used in complex surveillance scenarios with different challenges, such as camera movement/jitter, dynamic background, and shadows.
arXiv Detail & Related papers (2021-08-30T11:58:19Z) - Privid: Practical, Privacy-Preserving Video Analytics Queries [6.7897713298300335]
This paper presents a new notion of differential privacy (DP) for video analytics, $(\rho, K, \epsilon)$-event-duration privacy.
We show that Privid achieves accuracies within 79-99% of a non-private system.
arXiv Detail & Related papers (2021-06-22T22:25:08Z)
- Privacy-Preserving Video Classification with Convolutional Neural Networks [8.51142156817993]
We propose a privacy-preserving implementation of video classification based on the single-frame method, using convolutional neural networks.
We evaluate our proposed solution in an application for private human emotion recognition.
arXiv Detail & Related papers (2021-02-06T05:05:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.