Why is the video analytics accuracy fluctuating, and what can we do
about it?
- URL: http://arxiv.org/abs/2208.12644v1
- Date: Tue, 23 Aug 2022 23:16:24 GMT
- Title: Why is the video analytics accuracy fluctuating, and what can we do
about it?
- Authors: Sibendu Paul, Kunal Rao, Giuseppe Coviello, Murugan Sankaradas, Oliver
Po, Y. Charlie Hu, Srimat Chakradhar
- Abstract summary: It is a common practice to think of a video as a sequence of images (frames), and re-use deep neural network models that are trained only on images for similar analytics tasks on videos.
In this paper, we show that this leap of faith that deep learning models that work well on images will also work well on videos is actually flawed.
We show that even when a video camera is viewing a scene that is not changing in any human-perceptible way, the accuracy of video analytics applications fluctuates noticeably.
- Score: 2.0741583844039915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is a common practice to think of a video as a sequence of images (frames),
and re-use deep neural network models that are trained only on images for
similar analytics tasks on videos. In this paper, we show that this leap of
faith that deep learning models that work well on images will also work well on
videos is actually flawed. We show that even when a video camera is viewing a
scene that is not changing in any human-perceptible way, and we control for
external factors like video compression and environment (lighting), the
accuracy of video analytics applications fluctuates noticeably. These
fluctuations occur because successive frames produced by the video camera may
look similar visually, but are perceived quite differently by video analytics
applications. We observed that the root cause of these fluctuations is the
dynamic camera parameter changes that a video camera automatically makes in
order to capture and produce a visually pleasing video.
The camera inadvertently acts as an unintentional adversary because these
slight changes in the image pixel values in consecutive frames, as we show,
have a noticeably adverse impact on the accuracy of insights from video
analytics tasks that re-use image-trained deep learning models. To address this
inadvertent adversarial effect from the camera, we explore the use of transfer
learning techniques to improve learning in video analytics tasks through the
transfer of knowledge from learning on image analytics tasks. In particular, we
show that our newly trained Yolov5 model reduces fluctuation in object
detection across frames, which leads to better tracking of objects (40% fewer
mistakes in tracking). Our paper also provides new directions and techniques to
mitigate the camera's adversarial effect on deep learning models used for video
analytics applications.
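To make the core observation concrete, here is a minimal sketch (not the paper's code) that counts detections per frame on a near-static clip using an off-the-shelf, image-trained YOLOv5 model, then reports how much the per-frame counts vary. The video path, model variant, and variance metric are illustrative assumptions, not the paper's exact experimental setup.

```python
# Hedged sketch: quantify frame-to-frame detection fluctuation on a
# near-static video with an image-trained detector. Assumes a local clip
# "static_scene.mp4" (hypothetical) and the public ultralytics/yolov5 hub model.
import cv2
import torch

# Off-the-shelf, image-trained YOLOv5 model (small variant).
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

cap = cv2.VideoCapture("static_scene.mp4")  # hypothetical near-static clip
counts = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame[..., ::-1])     # BGR -> RGB for the hub model
    counts.append(len(results.xyxy[0]))   # number of detections in this frame
cap.release()

if not counts:
    raise SystemExit("no frames read from the clip")

# On a truly static scene an ideal detector would report a constant count;
# spread in `counts` is the fluctuation the paper describes.
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(f"mean detections/frame: {mean:.2f}, variance: {var:.2f}")
```

Running this on footage where the scene does not change in any human-perceptible way would approximate the paper's controlled setting; a non-trivial variance hints at the unintentional-adversary effect of automatic camera parameter changes.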
Related papers
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- APT: Adaptive Perceptual quality based camera Tuning using reinforcement learning [2.0741583844039915]
Capturing poor-quality video adversely affects the accuracy of analytics.
We propose a novel, reinforcement-learning based system that tunes the camera parameters to ensure a high-quality video capture.
As a result, such tuning restores the accuracy of insights when environmental conditions or scene content change.
arXiv Detail & Related papers (2022-11-15T21:02:48Z)
- The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields [61.664963331203666]
How humans perceive moving objects is a longstanding research question in computer vision.
One approach to the problem is to teach a deep network to model all of these effects.
We present a novel probabilistic model to estimate the camera's rotation given the motion field.
arXiv Detail & Related papers (2022-02-28T22:05:09Z)
- Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency [13.19476138523546]
Cross-video relation has barely been explored for visual representation learning.
We propose a novel contrastive learning method which explores the cross-video relation by using cycle-consistency for general image representation learning.
We show significant improvement over state-of-the-art contrastive learning methods.
arXiv Detail & Related papers (2021-05-13T17:59:11Z)
- Composable Augmentation Encoding for Video Representation Learning [94.2358972764708]
We focus on contrastive methods for self-supervised video representation learning.
A common paradigm in contrastive learning is to construct positive pairs by sampling different data views for the same instance, with different data instances as negatives.
We propose an 'augmentation aware' contrastive learning framework, where we explicitly provide a sequence of augmentation parameterisations.
We show that our method encodes valuable information about specified spatial or temporal augmentation, and in doing so also achieve state-of-the-art performance on a number of video benchmarks.
arXiv Detail & Related papers (2021-04-01T16:48:53Z)
- Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
arXiv Detail & Related papers (2021-03-26T20:37:55Z) - Decoupled Appearance and Motion Learning for Efficient Anomaly Detection
in Surveillance Video [9.80717374118619]
We propose a new neural network architecture that learns the normal behavior in a purely unsupervised fashion.
Our model can process 16 to 45 times more frames per second than related approaches.
arXiv Detail & Related papers (2020-11-10T11:40:06Z) - RSPNet: Relative Speed Perception for Unsupervised Video Representation
Learning [100.76672109782815]
We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only.
It is difficult to construct a suitable self-supervised task to well model both motion and appearance features.
We propose a new way to perceive the playback speed and exploit the relative speed between two video clips as labels.
arXiv Detail & Related papers (2020-10-27T16:42:50Z) - Performance of object recognition in wearable videos [9.669942356088377]
This work studies the problem of object detection and localization on videos captured by wearable cameras.
We present a study of the well known YOLO architecture, that offers an excellent trade-off between accuracy and speed.
arXiv Detail & Related papers (2020-09-10T15:20:17Z) - Watching the World Go By: Representation Learning from Unlabeled Videos [78.22211989028585]
Recent single image unsupervised representation learning techniques show remarkable success on a variety of tasks.
In this paper, we argue that videos offer this natural augmentation for free.
We propose Video Noise Contrastive Estimation, a method for using unlabeled video to learn strong, transferable single image representations.
arXiv Detail & Related papers (2020-03-18T00:07:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.