Why is the video analytics accuracy fluctuating, and what can we do
about it?
- URL: http://arxiv.org/abs/2208.12644v1
- Date: Tue, 23 Aug 2022 23:16:24 GMT
- Title: Why is the video analytics accuracy fluctuating, and what can we do
about it?
- Authors: Sibendu Paul, Kunal Rao, Giuseppe Coviello, Murugan Sankaradas, Oliver
Po, Y. Charlie Hu, Srimat Chakradhar
- Abstract summary: It is a common practice to think of a video as a sequence of images (frames), and re-use deep neural network models that are trained only on images for similar analytics tasks on videos.
In this paper, we show that this leap of faith that deep learning models that work well on images will also work well on videos is actually flawed.
We show that even when a video camera is viewing a scene that is not changing in any human-perceptible way, the accuracy of video analytics applications fluctuates noticeably.
- Score: 2.0741583844039915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is a common practice to think of a video as a sequence of images (frames),
and re-use deep neural network models that are trained only on images for
similar analytics tasks on videos. In this paper, we show that this leap of
faith that deep learning models that work well on images will also work well on
videos is actually flawed. We show that even when a video camera is viewing a
scene that is not changing in any human-perceptible way, and we control for
external factors like video compression and environment (lighting), the
accuracy of video analytics applications fluctuates noticeably. These
fluctuations occur because successive frames produced by the video camera may
look similar visually, but are perceived quite differently by video analytics
applications. We observed that the root cause of these fluctuations is the
dynamic camera parameter changes that a video camera automatically makes in
order to capture and produce a visually pleasing video.
The camera inadvertently acts as an unintentional adversary because these
slight changes in the image pixel values in consecutive frames, as we show,
have a noticeably adverse impact on the accuracy of insights from video
analytics tasks that re-use image-trained deep learning models. To address this
inadvertent adversarial effect from the camera, we explore the use of transfer
learning techniques to improve learning in video analytics tasks through the
transfer of knowledge from learning on image analytics tasks. In particular, we
show that our newly trained Yolov5 model reduces fluctuation in object
detection across frames, which leads to better tracking of objects (40% fewer
mistakes in tracking). Our paper also provides new directions and techniques to
mitigate the camera's adversarial effect on deep learning models used for video
analytics applications.
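To make the core observation concrete, here is a minimal sketch (not the paper's code) that counts detections per frame on a near-static clip using an off-the-shelf, image-trained YOLOv5 model, then reports how much the per-frame counts vary. The video path, model variant, and variance metric are illustrative assumptions, not the paper's exact experimental setup.

```python
# Hedged sketch: quantify frame-to-frame detection fluctuation on a
# near-static video with an image-trained detector. Assumes a local clip
# "static_scene.mp4" (hypothetical) and the public ultralytics/yolov5 hub model.
import cv2
import torch

# Off-the-shelf, image-trained YOLOv5 model (small variant).
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

cap = cv2.VideoCapture("static_scene.mp4")  # hypothetical near-static clip
counts = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame[..., ::-1])     # BGR -> RGB for the hub model
    counts.append(len(results.xyxy[0]))   # number of detections in this frame
cap.release()

if not counts:
    raise SystemExit("no frames read from the clip")

# On a truly static scene an ideal detector would report a constant count;
# spread in `counts` is the fluctuation the paper describes.
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(f"mean detections/frame: {mean:.2f}, variance: {var:.2f}")
```

Running this on footage where the scene does not change in any human-perceptible way would approximate the paper's controlled setting; a non-trivial variance hints at the unintentional-adversary effect of automatic camera parameter changes.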
Related papers
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- APT: Adaptive Perceptual quality based camera Tuning using reinforcement learning [2.0741583844039915]
Capturing poor-quality video adversely affects the accuracy of analytics.
We propose a novel, reinforcement-learning based system that tunes the camera parameters to ensure a high-quality video capture.
As a result, such tuning restores the accuracy of insights when environmental conditions or scene content change.
arXiv Detail & Related papers (2022-11-15T21:02:48Z)
- The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields [61.664963331203666]
How humans perceive moving objects is a longstanding research question in computer vision.
One approach to the problem is to teach a deep network to model all of these effects.
We present a novel probabilistic model to estimate the camera's rotation given the motion field.
arXiv Detail & Related papers (2022-02-28T22:05:09Z)
- Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency [13.19476138523546]
Cross-video relation has barely been explored for visual representation learning.
We propose a novel contrastive learning method which explores the cross-video relation by using cycle-consistency for general image representation learning.
We show significant improvement over state-of-the-art contrastive learning methods.
arXiv Detail & Related papers (2021-05-13T17:59:11Z)
- Composable Augmentation Encoding for Video Representation Learning [94.2358972764708]
We focus on contrastive methods for self-supervised video representation learning.
A common paradigm in contrastive learning is to construct positive pairs by sampling different data views for the same instance, with different data instances as negatives.
We propose an 'augmentation aware' contrastive learning framework, where we explicitly provide a sequence of augmentation parameterisations.
We show that our method encodes valuable information about specified spatial or temporal augmentation, and in doing so also achieve state-of-the-art performance on a number of video benchmarks.
arXiv Detail & Related papers (2021-04-01T16:48:53Z)
- Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
arXiv Detail & Related papers (2021-03-26T20:37:55Z) - Decoupled Appearance and Motion Learning for Efficient Anomaly Detection
in Surveillance Video [9.80717374118619]
We propose a new neural network architecture that learns the normal behavior in a purely unsupervised fashion.
Our model can process 16 to 45 times more frames per second than related approaches.
arXiv Detail & Related papers (2020-11-10T11:40:06Z) - RSPNet: Relative Speed Perception for Unsupervised Video Representation
Learning [100.76672109782815]
We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only.
It is difficult to construct a suitable self-supervised task to well model both motion and appearance features.
We propose a new way to perceive the playback speed and exploit the relative speed between two video clips as labels.
arXiv Detail & Related papers (2020-10-27T16:42:50Z) - Performance of object recognition in wearable videos [9.669942356088377]
This work studies the problem of object detection and localization on videos captured by wearable cameras.
We present a study of the well known YOLO architecture, that offers an excellent trade-off between accuracy and speed.
arXiv Detail & Related papers (2020-09-10T15:20:17Z) - Watching the World Go By: Representation Learning from Unlabeled Videos [78.22211989028585]
Recent single image unsupervised representation learning techniques show remarkable success on a variety of tasks.
In this paper, we argue that videos offer this natural augmentation for free.
We propose Video Noise Contrastive Estimation, a method for using unlabeled video to learn strong, transferable single image representations.
arXiv Detail & Related papers (2020-03-18T00:07:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.