Differentially Private Video Activity Recognition
- URL: http://arxiv.org/abs/2306.15742v1
- Date: Tue, 27 Jun 2023 18:47:09 GMT
- Title: Differentially Private Video Activity Recognition
- Authors: Zelun Luo, Yuliang Zou, Yijin Yang, Zane Durante, De-An Huang, Zhiding Yu, Chaowei Xiao, Li Fei-Fei, Animashree Anandkumar
- Abstract summary: We propose Multi-Clip DP-SGD, a novel framework for enforcing video-level differential privacy through clip-based classification models.
Our approach achieves 81% accuracy with a privacy budget of epsilon=5 on UCF-101, marking a 76% improvement compared to a direct application of DP-SGD.
- Score: 79.36113764129092
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, differential privacy has seen significant advancements in
image classification; however, its application to video activity recognition
remains under-explored. This paper addresses the challenges of applying
differential privacy to video activity recognition, which primarily stem from:
(1) a discrepancy between the desired privacy level for entire videos and the
nature of input data processed by contemporary video architectures, which are
typically short, segmented clips; and (2) the complexity and sheer size of
video datasets relative to those in image classification, which render
traditional differential privacy methods inadequate. To tackle these issues, we
propose Multi-Clip DP-SGD, a novel framework for enforcing video-level
differential privacy through clip-based classification models. This method
samples multiple clips from each video, averages their gradients, and applies
gradient clipping in DP-SGD without incurring additional privacy loss.
Moreover, we incorporate a parameter-efficient transfer learning strategy to
make the model scalable for large-scale video datasets. Through extensive
evaluations on the UCF-101 and HMDB-51 datasets, our approach exhibits
impressive performance, achieving 81% accuracy with a privacy budget of
epsilon=5 on UCF-101, marking a 76% improvement compared to a direct
application of DP-SGD. Furthermore, we demonstrate that our transfer learning
strategy is versatile and can enhance differentially private image
classification across an array of datasets including CheXpert, ImageNet,
CIFAR-10, and CIFAR-100.
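Since the abstract describes Multi-Clip DP-SGD only in prose, the following is a minimal PyTorch sketch of the idea as stated there: sample several clips per video, average their gradients into a single per-video gradient, clip that gradient, and add Gaussian noise once per batch. The `sample_clip` helper and all hyperparameter values are illustrative assumptions, not the authors' released code.
```python
# Minimal sketch of Multi-Clip DP-SGD as described in the abstract. All names
# (sample_clip, hyperparameters) are illustrative assumptions.
import torch
import torch.nn.functional as F

def multi_clip_dp_sgd_step(model, optimizer, videos, labels, sample_clip,
                           num_clips=4, clip_norm=1.0, noise_multiplier=1.0):
    """One training step with video-level differential privacy.

    videos: list of raw videos; labels: list of scalar class-index tensors;
    sample_clip(video) -> (C, T, H, W) tensor drawing one random clip.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for video, label in zip(videos, labels):
        # Average gradients over several clips of the same video. All clips
        # come from one example, so this adds no extra privacy loss.
        per_video = [torch.zeros_like(p) for p in params]
        for _ in range(num_clips):
            clip = sample_clip(video).unsqueeze(0)       # (1, C, T, H, W)
            loss = F.cross_entropy(model(clip), label.view(1))
            grads = torch.autograd.grad(loss, params)
            for acc, g in zip(per_video, grads):
                acc += g / num_clips

        # Clip the averaged per-video gradient to norm C, the sensitivity
        # bound that the Gaussian noise below is calibrated against.
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in per_video))
        scale = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)
        for acc, g in zip(summed, per_video):
            acc += g * scale

    # Add Gaussian noise once per batch and apply the averaged noisy gradient.
    for p, g in zip(params, summed):
        noise = torch.randn_like(g) * noise_multiplier * clip_norm
        p.grad = (g + noise) / len(videos)
    optimizer.step()
    optimizer.zero_grad()
```
Under one common reading, the parameter-efficient transfer learning mentioned in the abstract would amount to freezing most backbone parameters and training only a small subset (e.g., a linear head or adapters), so that `model.parameters()` above yields few trainable tensors; the paper's exact strategy may differ.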
Related papers
- A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames [54.90226700939778]
We build on the common paradigm of transferring large-scale image-text models to video via shallow temporal fusion.
We expose two limitations of the approach: (1) decreased spatial capabilities, likely due to poor video-language alignment in standard video datasets, and (2) higher memory consumption, bottlenecking the number of frames that can be processed.
arXiv Detail & Related papers (2023-12-12T16:10:19Z)
- Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation [156.4142424784322]
Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video that belong to the same category defined by a few annotated support images.
We propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data.
Our proposed video IPMT model significantly outperforms previous models on two benchmark datasets.
arXiv Detail & Related papers (2023-09-20T09:16:34Z)
- Few-shot Action Recognition via Intra- and Inter-Video Information Maximization [28.31541961943443]
We propose a novel framework, Video Information Maximization (VIM), for few-shot action recognition.
VIM is equipped with an adaptive spatial-temporal video sampler and a temporal action alignment model.
VIM acts to maximize the distinctiveness of video information from limited video data.
arXiv Detail & Related papers (2023-05-10T13:05:43Z)
- Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories [56.91664227337115]
We introduce a collaborative memory mechanism that encodes information across multiple sampled clips of a video at each training iteration.
This enables the learning of long-range dependencies beyond a single clip.
Our proposed framework is end-to-end trainable and significantly improves the accuracy of video classification at a negligible computational overhead.
arXiv Detail & Related papers (2021-04-02T18:59:09Z)
- Learning Representational Invariances for Data-Efficient Action Recognition [52.23716087656834]
We show that our data augmentation strategy leads to promising performance on the Kinetics-100, UCF-101, and HMDB-51 datasets.
We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.
arXiv Detail & Related papers (2021-03-30T17:59:49Z)
- Privacy-Preserving Video Classification with Convolutional Neural Networks [8.51142156817993]
We propose a privacy-preserving implementation of single-frame video classification with convolutional neural networks.
We evaluate our proposed solution in an application for private human emotion recognition.
arXiv Detail & Related papers (2021-02-06T05:05:31Z)
- Semi-Supervised Action Recognition with Temporal Contrastive Learning [50.08957096801457]
We learn a two-pathway temporal contrastive model using unlabeled videos at two different speeds.
We considerably outperform video extensions of sophisticated state-of-the-art semi-supervised image recognition methods.
arXiv Detail & Related papers (2021-02-04T17:28:35Z)
- Frame Aggregation and Multi-Modal Fusion Framework for Video-Based Person Recognition [13.875674649636874]
We propose a Frame Aggregation and Multi-Modal Fusion (FAMF) framework for video-based person recognition.
FAMF aggregates face features and incorporates them with multi-modal information to identify persons in videos.
We show that introducing an attention mechanism to NetVLAD can effectively decrease the impact of low-quality frames.
arXiv Detail & Related papers (2020-10-19T08:06:40Z)
- Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework [43.002621928500425]
We propose a self-supervised method to learn feature representations from videos.
Because temporal information is important for video representations, we extend the negative samples by introducing intra-negative samples.
We conduct experiments on video retrieval and video recognition tasks using the learned video representation.
arXiv Detail & Related papers (2020-08-06T09:08:14Z)