Distributed Multi-agent Video Fast-forwarding
- URL: http://arxiv.org/abs/2008.04437v1
- Date: Mon, 10 Aug 2020 22:08:49 GMT
- Title: Distributed Multi-agent Video Fast-forwarding
- Authors: Shuyue Lan, Zhilu Wang, Amit K. Roy-Chowdhury, Ermin Wei, Qi Zhu
- Abstract summary: This paper presents a consensus-based distributed multi-agent video fast-forwarding framework, named DMVF, that fast-forwards multi-view video streams collaboratively and adaptively.
On the real-world surveillance video dataset VideoWeb, our method significantly improves the coverage of important frames and reduces the number of frames processed in the system, compared with approaches in the literature.
- Score: 30.843484383185473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many intelligent systems, a network of agents collaboratively perceives
the environment for better and more efficient situation awareness. As these
agents often have limited resources, it could be greatly beneficial to identify
the content overlap among camera views from different agents and leverage it
to reduce the processing, transmission, and storage of
redundant/unimportant video frames. This paper presents a consensus-based
distributed multi-agent video fast-forwarding framework, named DMVF, that
fast-forwards multi-view video streams collaboratively and adaptively. In our
framework, each camera view is addressed by a reinforcement learning based
fast-forwarding agent, which periodically chooses from multiple strategies to
selectively process video frames and transmits the selected frames at
adjustable paces. During every adaptation period, each agent communicates with
a number of neighboring agents, evaluates the importance of the selected frames
from itself and those from its neighbors, refines such evaluation together with
other agents via a system-wide consensus algorithm, and uses such evaluation to
decide its strategy for the next period. On the real-world surveillance video
dataset VideoWeb, our method significantly improves the coverage of important
frames and reduces the number of frames processed in the system, compared with
approaches in the literature.
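For intuition, the following minimal Python sketch walks through one adaptation period of such a scheme: each agent scores its selected frames, the per-agent scores are refined by a standard average-consensus iteration over the agent graph, and each agent then picks a pace for the next period. The scoring function, the binary fast/slow strategy set, the topology, and the threshold are all illustrative assumptions, not the DMVF implementation (which uses a learned, RL-based agent and a richer strategy set).

    import numpy as np

    # Placeholder for the learned importance model; DMVF trains an
    # RL-based fast-forwarding agent, which is not reproduced here.
    def importance_score(frames):
        return float(np.mean(frames))

    def average_consensus(scores, adjacency, steps=10, eps=0.1):
        # Standard average-consensus update x_i <- x_i + eps * sum_j a_ij (x_j - x_i);
        # converges to the network-wide mean on a connected graph for small eps.
        x = np.asarray(scores, dtype=float)
        degree = adjacency.sum(axis=1)
        for _ in range(steps):
            x = x + eps * (adjacency @ x - degree * x)
        return x

    def next_strategy(consensus_score, threshold=0.5):
        # Agents that (by consensus) see important content slow down and
        # process more frames; others fast-forward more aggressively.
        return "slow" if consensus_score > threshold else "fast"

    # Three agents on a fully connected graph, one adaptation period:
    adjacency = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
    scores = [importance_score(np.random.rand(8, 64, 64)) for _ in range(3)]
    strategies = [next_strategy(s) for s in average_consensus(scores, adjacency)]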
Related papers
- An Empirical Comparison of Video Frame Sampling Methods for Multi-Modal RAG Retrieval [1.6581184950812533]
We investigate the trade-offs in frame sampling methods for Video & Frame Retrieval using natural language questions.
Our study focuses on the storage and retrieval of image data (video frames) within a vector database required by Video RAG pattern.
arXiv Detail & Related papers (2024-07-22T11:44:08Z)
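As a concrete picture of the sampling-and-indexing step this study examines, the sketch below uniformly samples frames from a video with OpenCV and stores per-frame embeddings in a plain in-memory list. The embed_frame function is a hypothetical stand-in for a real image encoder, and the list stands in for an actual vector database.

    import cv2
    import numpy as np

    def embed_frame(frame):
        # Hypothetical stand-in for a real embedding model (e.g., an image encoder).
        return cv2.resize(frame, (16, 16)).astype(np.float32).ravel()

    def sample_and_index(video_path, num_frames=16):
        cap = cv2.VideoCapture(video_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        index = []  # (frame_id, embedding) pairs; a vector DB in practice
        for frame_id in np.linspace(0, total - 1, num_frames, dtype=int):
            cap.set(cv2.CAP_PROP_POS_FRAMES, int(frame_id))
            ok, frame = cap.read()
            if ok:
                index.append((int(frame_id), embed_frame(frame)))
        cap.release()
        return index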
- An Empirical Study of Frame Selection for Text-to-Video Retrieval [62.28080029331507]
Text-to-video retrieval (TVR) aims to find the most relevant video in a large video gallery given a query text.
Existing methods typically select a subset of frames within a video to represent the video content for TVR.
In this paper, we make the first empirical study of frame selection for TVR.
arXiv Detail & Related papers (2023-11-01T05:03:48Z)
- Collaborative Multi-Agent Video Fast-Forwarding [30.843484383185473]
We develop two collaborative multi-agent video fast-forwarding frameworks in distributed and centralized settings.
In these frameworks, each individual agent can selectively process or skip video frames at adjustable paces based on multiple strategies.
We show that compared with other approaches in the literature, our frameworks achieve better coverage of important frames, while significantly reducing the number of frames processed at each agent.
arXiv Detail & Related papers (2023-05-27T20:12:19Z)
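The "adjustable paces" mechanism can be pictured as a generator that keeps only every k-th frame, with k set by the agent's current strategy. The pace table below is an illustrative assumption, not the strategy set used in the paper.

    def fast_forward(frames, strategy):
        # Hypothetical pace table: a larger skip means more aggressive fast-forwarding.
        paces = {"slow": 1, "normal": 4, "fast": 16}
        skip = paces[strategy]
        for i, frame in enumerate(frames):
            if i % skip == 0:
                yield i, frame  # only these frames are processed/transmitted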
- Context Sensing Attention Network for Video-based Person Re-identification [20.865710012336724]
Video-based person re-identification (ReID) is challenging due to the presence of various interferences in video frames.
Recent approaches handle this problem using temporal aggregation strategies.
We propose a novel Context Sensing Attention Network (CSA-Net), which improves both the frame feature extraction and temporal aggregation steps.
arXiv Detail & Related papers (2022-07-06T12:48:27Z)
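Temporal aggregation of the kind CSA-Net refines can be illustrated by generic attention pooling over per-frame features: score each frame, softmax the scores over time, and take the weighted sum. This is a baseline sketch, not the CSA-Net architecture.

    import numpy as np

    def attention_pool(frame_features, w):
        # frame_features: (T, D) per-frame embeddings; w: (D,) learned scoring vector.
        scores = frame_features @ w               # (T,) attention logits
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                  # softmax over the T frames
        return weights @ frame_features           # (D,) clip-level feature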
- MHSCNet: A Multimodal Hierarchical Shot-aware Convolutional Network for Video Summarization [61.69587867308656]
We propose a multimodal hierarchical shot-aware convolutional network, denoted as MHSCNet, to enhance the frame-wise representation.
Based on the learned shot-aware representations, MHSCNet can predict the frame-level importance score in the local and global view of the video.
arXiv Detail & Related papers (2022-04-18T14:53:33Z)
- An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time Video Enhancement [132.60976158877608]
We propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples.
In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information.
The proposed design allows our recurrent cells to efficiently propagate temporal information across frames and reduces the need for high-complexity networks.
arXiv Detail & Related papers (2020-12-24T00:03:29Z)
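The frame-recurrent idea (carrying a hidden state across frames rather than stacking a deep per-frame network) can be sketched with an off-the-shelf GRU cell; the paper's interleaved local/global modules are replaced here by that single cell, and the linear encoder/decoder and feature size are placeholders.

    import torch
    import torch.nn as nn

    class RecurrentEnhancer(nn.Module):
        # Minimal frame-recurrent skeleton for illustration only.
        def __init__(self, feat_dim=64):
            super().__init__()
            self.encode = nn.Linear(feat_dim, feat_dim)  # per-frame features
            self.cell = nn.GRUCell(feat_dim, feat_dim)   # temporal propagation
            self.decode = nn.Linear(feat_dim, feat_dim)  # enhanced output

        def forward(self, frames):  # frames: (T, B, feat_dim)
            h = torch.zeros(frames.size(1), frames.size(2))
            outputs = []
            for frame in frames:  # propagate state frame by frame
                h = self.cell(torch.relu(self.encode(frame)), h)
                outputs.append(self.decode(h))
            return torch.stack(outputs)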
- Contrastive Transformation for Self-supervised Correspondence Learning [120.62547360463923]
We study the self-supervised learning of visual correspondence using unlabeled videos in the wild.
Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation.
Our framework outperforms the recent self-supervised correspondence methods on a range of visual tasks.
arXiv Detail & Related papers (2020-12-09T14:05:06Z)
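The contrastive ingredient in this line of work can be illustrated with a standard InfoNCE objective over paired embeddings, where matching rows are positives and all other rows in the batch serve as negatives; this is the generic formulation, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def info_nce(anchors, positives, temperature=0.07):
        # anchors, positives: (N, D); row i of each forms a matching pair.
        a = F.normalize(anchors, dim=1)
        p = F.normalize(positives, dim=1)
        logits = a @ p.t() / temperature   # (N, N) cosine-similarity matrix
        targets = torch.arange(a.size(0))  # positives sit on the diagonal
        return F.cross_entropy(logits, targets)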
- Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z)
- Video Super-resolution with Temporal Group Attention [127.21615040695941]
We propose a novel method that can effectively incorporate temporal information in a hierarchical way.
The input sequence is divided into several groups, with each one corresponding to a kind of frame rate.
It achieves favorable performance against state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2020-07-21T04:54:30Z)
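The grouping step can be pictured as splitting a window of neighboring frames by their temporal distance from the center frame, so each group mimics a different frame rate. The 7-frame window and the three strides below are illustrative assumptions.

    def group_by_frame_rate(window):
        # window: list of 7 consecutive frames [t-3, ..., t, ..., t+3].
        center = window[3]
        return {
            "rate_1": [window[2], center, window[4]],  # stride-1 neighbors
            "rate_2": [window[1], center, window[5]],  # stride-2 neighbors
            "rate_3": [window[0], center, window[6]],  # stride-3 neighbors
        }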
- Hierarchical Attention Network for Action Segmentation [45.19890687786009]
The temporal segmentation of events is an essential task and a precursor for the automatic recognition of human actions in the video.
We propose a complete end-to-end supervised learning approach that can better learn relationships between actions over time.
We evaluate our system on challenging public benchmark datasets, including the MERL Shopping, 50 Salads, and Georgia Tech Egocentric Activities datasets.
arXiv Detail & Related papers (2020-05-07T02:39:18Z)