A Spatio-Temporal Identity Verification Method for Person-Action
Instance Search in Movies
- URL: http://arxiv.org/abs/2111.00228v1
- Date: Sat, 30 Oct 2021 11:00:47 GMT
- Title: A Spatio-Temporal Identity Verification Method for Person-Action
Instance Search in Movies
- Authors: Jingyao Yang, Chao Liang, Yanrui Niu, Baojin Huang and Zhongyuan Wang
- Abstract summary: Person-Action Instance Search (INS) aims to retrieve shots of a specific person carrying out a specific action from a massive collection of video shots.
Direct aggregation of two individual INS scores cannot guarantee the identity consistency between person and action.
We propose an identity consistency verification scheme to optimize the direct fusion score of person INS and action INS.
- Score: 32.76347250146175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As one of the challenging problems in video search, Person-Action Instance
Search (INS) aims to retrieve shots of a specific person carrying out a specific
action from a massive collection of video shots. Existing methods mainly involve
two steps: first, two individual INS branches, i.e., person INS and action INS, are
run separately to compute the initial person and action ranking scores;
second, both scores are directly fused to generate the final ranking list.
However, direct aggregation of two individual INS scores cannot guarantee the
identity consistency between person and action. For example, a shot with "Pat
is standing" and "Ian is sitting on couch" may be erroneously understood as
"Pat is sitting on couch" or "Ian is standing". To address the above identity
inconsistency problem (IIP), we study a spatio-temporal identity verification
method. Specifically, in the spatial dimension, we propose an identity
consistency verification scheme to optimize the direct fusion score of person
INS and action INS. The motivation originates from the observation that face
detection results are usually located within the identity-consistent action
bounding boxes. Moreover, in the temporal dimension, considering the complex
filming conditions, we propose an inter-frame detection extension operation to
interpolate missing face/action detection results in successive video frames.
The proposed method is evaluated on the large-scale TRECVID INS dataset, and
the experimental results show that our method can effectively mitigate the IIP
and surpass the second-place results in both the TRECVID 2019 and 2020 INS tasks.
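The two ideas in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the box format `(x1, y1, x2, y2)`, the function names, the `min_cover` threshold, the use of a score product for fusion, and the use of linear interpolation for the inter-frame extension.

```python
from typing import List, Optional, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def face_inside_action(face: Box, action: Box, min_cover: float = 0.9) -> bool:
    """Return True if at least `min_cover` of the face box lies inside the action box."""
    ix1, iy1 = max(face[0], action[0]), max(face[1], action[1])
    ix2, iy2 = min(face[2], action[2]), min(face[3], action[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    face_area = (face[2] - face[0]) * (face[3] - face[1])
    return face_area > 0 and inter / face_area >= min_cover


def verified_fusion_score(
    person_dets: List[Tuple[Box, float]],
    action_dets: List[Tuple[Box, float]],
    min_cover: float = 0.9,
) -> float:
    """Fuse person and action scores only for identity-consistent pairs.

    Each detection is (box, score). The product is an assumed fusion rule;
    the shot score is the best product over pairs whose face box falls
    inside the action box, or 0.0 if no pair is consistent.
    """
    best = 0.0
    for face_box, person_score in person_dets:
        for action_box, action_score in action_dets:
            if face_inside_action(face_box, action_box, min_cover):
                best = max(best, person_score * action_score)
    return best


def interpolate_missing(boxes: List[Optional[Box]]) -> List[Optional[Box]]:
    """Fill missing per-frame boxes by linear interpolation between detected frames.

    Frames before the first or after the last detection stay None.
    """
    out: List[Optional[Box]] = list(boxes)
    known = [i for i, b in enumerate(boxes) if b is not None]
    for lo, hi in zip(known, known[1:]):
        for t in range(lo + 1, hi):
            w = (t - lo) / (hi - lo)
            out[t] = tuple((1 - w) * a + w * b for a, b in zip(boxes[lo], boxes[hi]))
    return out


# Query: "Pat is sitting on couch". Pat's face is far from the sitting box,
# so only the low-scoring face inside that box can contribute.
person_dets = [((10, 10, 20, 20), 0.9), ((60, 10, 70, 20), 0.1)]
action_dets = [((50, 0, 90, 80), 0.8)]
shot_score = verified_fusion_score(person_dets, action_dets)
```

In this toy shot, fusing the best person score with the best action score directly would rank the shot highly even though the high-scoring face is not the one sitting; restricting fusion to faces located inside the action box lowers the score instead.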
Related papers
- Data-Driven but Privacy-Conscious: Pedestrian Dataset De-identification
via Full-Body Person Synthesis [16.394031759681678]
We motivate and introduce the Pedestrian dataset De-Identification task.
PDI evaluates the degree of de-identification and downstream task training performance for a given de-identification method.
We show how our data is able to narrow the synthetic-to-real performance gap in a privacy-conscious manner.
arXiv Detail & Related papers (2023-06-20T17:39:24Z) - DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z) - Actor-identified Spatiotemporal Action Detection -- Detecting Who Is
Doing What in Videos [29.5205455437899]
Temporal Action Detection (TAD) has been investigated for estimating the start and end time for each action in videos.
Spatiotemporal Action Detection (SAD) has been studied for localizing the action both spatially and temporally in videos.
We propose a novel task, Actor-identified Spatiotemporal Action Detection (ASAD), to bridge the gap between SAD and actor identification.
arXiv Detail & Related papers (2022-08-27T06:51:12Z) - Exploring Visual Context for Weakly Supervised Person Search [155.46727990750227]
Person search has recently emerged as a challenging task that jointly addresses pedestrian detection and person re-identification.
Existing approaches follow a fully supervised setting where both bounding box and identity annotations are available.
This paper inventively considers weakly supervised person search with only bounding box annotations.
arXiv Detail & Related papers (2021-06-19T14:47:13Z) - Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z) - Taking Modality-free Human Identification as Zero-shot Learning [46.51413603352702]
We develop a novel Modality-Free Human Identification (named MFHI) task as a generic zero-shot learning model in a scalable way.
It is capable of bridging the visual and semantic modalities by learning a discriminative prototype of each identity.
In addition, the semantics-guided spatial attention is enforced on visual modality to obtain representations with both high global category-level and local attribute-level discrimination.
arXiv Detail & Related papers (2020-10-02T13:08:27Z) - Pose-guided Visible Part Matching for Occluded Person ReID [80.81748252960843]
We propose a Pose-guided Visible Part Matching (PVPM) method that jointly learns the discriminative features with pose-guided attention and self-mines the part visibility.
Experimental results on three reported occluded benchmarks show that the proposed method achieves performance competitive with state-of-the-art methods.
arXiv Detail & Related papers (2020-04-01T04:36:51Z) - Intra-Camera Supervised Person Re-Identification [87.88852321309433]
We propose a novel person re-identification paradigm based on an idea of independent per-camera identity annotation.
This eliminates the most time-consuming and tedious inter-camera identity labelling process.
We formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method for Intra-Camera Supervised (ICS) person re-id.
arXiv Detail & Related papers (2020-02-12T15:26:33Z) - Towards Precise Intra-camera Supervised Person Re-identification [54.86892428155225]
Intra-camera supervision (ICS) for person re-identification (Re-ID) assumes that identity labels are independently annotated within each camera view.
Lack of inter-camera labels makes the ICS Re-ID problem much more challenging than the fully supervised counterpart.
Our approach even performs comparably to state-of-the-art fully supervised methods on two of the datasets.
arXiv Detail & Related papers (2020-02-12T11:56:30Z)