Online Multi-modal Person Search in Videos
- URL: http://arxiv.org/abs/2008.03546v1
- Date: Sat, 8 Aug 2020 15:48:32 GMT
- Title: Online Multi-modal Person Search in Videos
- Authors: Jiangyue Xia, Anyi Rao, Qingqiu Huang, Linning Xu, Jiangtao Wen, Dahua Lin
- Abstract summary: We propose an online person search framework, which can recognize people in a video on the fly.
Our experiments on a large movie dataset show that the proposed method is effective.
- Score: 74.75432003006432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of searching certain people in videos has seen increasing potential
in real-world applications, such as video organization and editing. Most
existing approaches are devised to work in an offline manner, where identities
can only be inferred after an entire video is examined. This working manner
precludes such methods from being applied to online services or those
applications that require real-time responses. In this paper, we propose an
online person search framework, which can recognize people in a video on the
fly. This framework maintains a multimodal memory bank at its heart as the
basis for person recognition, and updates it dynamically with a policy obtained
by reinforcement learning. Our experiments on a large movie dataset show that
the proposed method is effective, not only achieving remarkable improvements
over online schemes but also outperforming offline methods.
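The core idea of the abstract, a per-identity memory bank of per-modality features that is queried and updated on the fly, can be sketched in a few lines. The class below is a minimal, hypothetical illustration, not the authors' implementation: the fixed similarity threshold stands in for the reinforcement-learned update policy, and all names and modalities are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-8)

class MultimodalMemoryBank:
    """Per-identity store of per-modality features, updated frame by frame."""

    def __init__(self, match_threshold=0.8):
        self.bank = {}                      # person_id -> {modality: feature}
        self.match_threshold = match_threshold

    def query(self, features):
        """Return (best-matching identity, score) for incoming features."""
        best_id, best_score = None, -1.0
        for pid, entry in self.bank.items():
            scores = [cosine(f, entry[m]) for m, f in features.items() if m in entry]
            if scores:
                score = sum(scores) / len(scores)   # average over shared modalities
                if score > best_score:
                    best_id, best_score = pid, score
        return best_id, best_score

    def observe(self, features):
        """Recognize one observation and update the bank.

        A fixed threshold stands in here for the learned update policy.
        """
        pid, score = self.query(features)
        if pid is None or score < self.match_threshold:
            pid = f"person_{len(self.bank)}"        # open a new identity
            self.bank[pid] = dict(features)
        else:
            for m, f in features.items():           # exponential moving average
                old = self.bank[pid].get(m, f)
                self.bank[pid][m] = [0.9 * o + 0.1 * x for o, x in zip(old, f)]
        return pid
```

Feeding observations one at a time mirrors the online setting: identities are assigned as the video streams in, with no second pass over earlier frames.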
Related papers
- Online Video Understanding: OVBench and VideoChat-Online [22.814813541695997]
Multimodal Large Language Models (MLLMs) have significantly progressed in offline video understanding.
Applying these models to real-world scenarios, such as autonomous driving and human-computer interaction, presents unique challenges.
This paper presents systematic efforts from three perspectives: evaluation benchmark, model architecture, and training strategy.
arXiv Detail & Related papers (2024-12-31T18:17:05Z)
- Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning [8.251711947874238]
Offline RL provides a promising solution by supplying an offline policy that can be refined through online interactions.
Existing approaches perform offline and online learning in the same task, without considering the task generalization problem in offline-to-online adaptation.
Our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning.
arXiv Detail & Related papers (2024-05-12T08:52:52Z)
- Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition [84.31749632725929]
In this paper, we focus on one critical challenge of the task, namely scene bias, and accordingly contribute a novel scene-aware video-text alignment method.
Our key idea is to distinguish video representations apart from scene-encoded text representations, aiming to learn scene-agnostic video representations for recognizing actions across domains.
arXiv Detail & Related papers (2024-03-03T16:48:16Z)
- OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation [75.07460026246582]
Referring video object segmentation (RVOS) aims at segmenting an object in a video following human instruction.
Current state-of-the-art methods fall into an offline pattern, in which each clip independently interacts with text embedding.
We propose a simple yet effective online model using explicit query propagation, named OnlineRefer.
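The query-propagation idea can be sketched briefly: the query refined on frame t initializes frame t+1, so each frame no longer interacts with the text embedding independently. Everything below (the `step` callback, the scalar stand-ins for query and mask tensors) is an illustrative assumption, not OnlineRefer's actual module.

```python
def propagate_queries(frames, text_query, step):
    """Run a per-frame segmentation step, carrying the refined query forward."""
    query, masks = text_query, []
    for frame in frames:
        query, mask = step(frame, query)   # refine query, predict this frame's mask
        masks.append(mask)
    return masks

# Toy step: "refines" the query by accumulating frame evidence and
# "segments" by emitting the incoming query (scalars stand in for tensors).
def toy_step(frame, query):
    return query + frame, query
```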
arXiv Detail & Related papers (2023-07-18T15:43:35Z)
- ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System [119.51012668709502]
We present our vision for multimodal and versatile video understanding and propose a prototype system, ChatVideo.
Our system is built upon a tracklet-centric paradigm, which treats tracklets as the basic video unit.
All the detected tracklets are stored in a database and interact with the user through a database manager.
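Read literally, the tracklet-centric design amounts to a tracklet store plus a manager that answers queries over it. The sketch below is hypothetical: the field names and query API are illustrative assumptions, not ChatVideo's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Tracklet:
    """One tracked instance over a span of frames (illustrative fields)."""
    tracklet_id: int
    category: str              # e.g. "person", "car"
    start_frame: int
    end_frame: int
    attributes: dict = field(default_factory=dict)

class TrackletManager:
    """Stand-in for the database manager that mediates user queries."""

    def __init__(self):
        self._db = []

    def insert(self, t):
        self._db.append(t)

    def query(self, category=None, at_frame=None):
        """Filter stored tracklets by category and/or a frame index."""
        hits = []
        for t in self._db:
            if category is not None and t.category != category:
                continue
            if at_frame is not None and not (t.start_frame <= at_frame <= t.end_frame):
                continue
            hits.append(t)
        return hits
```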
arXiv Detail & Related papers (2023-04-27T17:59:58Z)
- Towards A Multi-agent System for Online Hate Speech Detection [11.843799418046666]
This paper envisions a multi-agent system for detecting the presence of hate speech in online social media platforms such as Twitter and Facebook.
We introduce a novel framework employing deep learning techniques to coordinate the channels of textual and image processing.
arXiv Detail & Related papers (2021-05-03T19:06:42Z)
- Online Learnable Keyframe Extraction in Videos and its Application with Semantic Word Vector in Action Recognition [5.849485167287474]
We propose an online learnable module for extraction of key-shots in video.
This module can be used to select key-shots in video and thus can be applied to video summarization.
We also propose a plugin module to use the semantic word vector as input, along with a novel train/test strategy for the classification models.
arXiv Detail & Related papers (2020-09-25T20:54:46Z)
- WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos [124.72839555467944]
We propose a weakly supervised framework that can be trained using only video-class labels.
We show that our method largely outperforms weakly-supervised baselines.
When strongly supervised, our method obtains the state-of-the-art results in the tasks of both online per-frame action recognition and online detection of action start.
arXiv Detail & Related papers (2020-06-05T23:08:41Z)
- Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos [72.50607929306058]
We propose a real-time online system to perform activity detection on untrimmed security videos.
The proposed method consists of three stages: tubelet extraction, activity classification and online tubelet merging.
We demonstrate the effectiveness of the proposed approach in terms of speed (100 fps) and performance with state-of-the-art results.
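The third stage, online tubelet merging, can be illustrated on 1-D frame spans. This simplification and all names are assumptions: the real system merges spatio-temporal tubes, not plain intervals.

```python
def merge_tubelets(tubelets, max_gap=0):
    """Link same-class tubelets whose frame spans overlap or nearly touch.

    Each tubelet is (label, start_frame, end_frame); the output is
    sorted by label, then start frame.
    """
    merged = []
    for label, start, end in sorted(tubelets):
        last = merged[-1] if merged else None
        if last and last[0] == label and start <= last[2] + 1 + max_gap:
            merged[-1] = (label, last[1], max(last[2], end))  # extend the span
        else:
            merged.append((label, start, end))
    return merged
```

Because merging only ever looks at the most recent span per class, the step stays cheap enough to run as detections stream in, which is what makes the pipeline online.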
arXiv Detail & Related papers (2020-04-23T22:20:10Z)
- A Novel Online Action Detection Framework from Untrimmed Video Streams [19.895434487276578]
We propose a novel online action detection framework that considers actions as a set of temporally ordered subclasses.
We augment our data by varying the lengths of videos to allow the proposed method to learn about the high intra-class variation in human actions.
arXiv Detail & Related papers (2020-03-17T14:11:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.