MEVID: Multi-view Extended Videos with Identities for Video Person
Re-Identification
- URL: http://arxiv.org/abs/2211.04656v2
- Date: Thu, 10 Nov 2022 14:35:24 GMT
- Title: MEVID: Multi-view Extended Videos with Identities for Video Person
Re-Identification
- Authors: Daniel Davila, Dawei Du, Bryon Lewis, Christopher Funk, Joseph Van
Pelt, Roderick Collins, Kellie Corona, Matt Brown, Scott McCloskey, Anthony
Hoogs, Brian Clipp
- Abstract summary: We present the Multi-view Extended Videos with Identities (MEVID) dataset for large-scale, video person re-identification (ReID) in the wild.
We label the identities of 158 unique people wearing 598 outfits taken from 8,092 tracklets with an average length of about 590 frames.
Being based on the MEVA video dataset, we also inherit data that is intentionally demographically balanced to the continental United States.
- Score: 17.72434646703505
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present the Multi-view Extended Videos with Identities
(MEVID) dataset for large-scale, video person re-identification (ReID) in the
wild. To our knowledge, MEVID represents the most-varied video person ReID
dataset, spanning an extensive indoor and outdoor environment across nine
unique dates in a 73-day window, various camera viewpoints, and entity clothing
changes. Specifically, we label the identities of 158 unique people wearing 598
outfits taken from 8,092 tracklets, with an average length of about 590 frames, seen
in 33 camera views from the very large-scale MEVA person activities dataset.
While other datasets have more unique identities, MEVID emphasizes a richer set
of information about each individual, such as: 4 outfits/identity vs. 2
outfits/identity in CCVID, 33 viewpoints across 17 locations vs. 6 in 5
simulated locations for MTA, and 10 million frames vs. 3 million for LS-VID.
Being based on the MEVA video dataset, we also inherit data that is
intentionally demographically balanced to the continental United States. To
accelerate the annotation process, we developed a semi-automatic annotation
framework and GUI that combines state-of-the-art real-time models for object
detection, pose estimation, person ReID, and multi-object tracking. We evaluate
several state-of-the-art methods on MEVID challenge problems and
comprehensively quantify their robustness in terms of changes of outfit, scale,
and background location. Our quantitative analysis on the realistic, unique
aspects of MEVID shows that there are significant remaining challenges in video
person ReID and indicates important directions for future research.
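The semi-automatic annotation framework is not described at implementation level in this abstract. As a rough illustration only, the Python sketch below shows one way per-camera tracklets carrying ReID embeddings might be turned into ranked identity-merge proposals for a human annotator to confirm in a GUI; every class and function name here (Detection, Tracklet, propose_identity_merges) is a hypothetical stand-in, not the authors' actual framework.
```python
"""Minimal sketch of a semi-automatic ReID annotation loop (assumed design)."""
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Detection:
    frame_idx: int
    box: tuple             # (x1, y1, x2, y2) in pixels
    embedding: np.ndarray  # appearance feature from some ReID model

@dataclass
class Tracklet:
    track_id: int
    detections: list = field(default_factory=list)

    def mean_embedding(self) -> np.ndarray:
        return np.mean([d.embedding for d in self.detections], axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def propose_identity_merges(tracklets, threshold=0.8):
    """Propose cross-camera tracklet pairs that likely share an identity.

    Only this proposal step is automated; a human annotator confirms or
    rejects each pair in the GUI.
    """
    proposals = []
    for i in range(len(tracklets)):
        for j in range(i + 1, len(tracklets)):
            sim = cosine(tracklets[i].mean_embedding(),
                         tracklets[j].mean_embedding())
            if sim >= threshold:
                proposals.append((tracklets[i].track_id,
                                  tracklets[j].track_id, sim))
    # Show the highest-confidence pairs to the annotator first.
    return sorted(proposals, key=lambda p: -p[2])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake tracklets: two share an identity (similar embeddings), one differs.
    base = rng.normal(size=128)
    tracks = []
    for tid, emb in enumerate([base, base + 0.05 * rng.normal(size=128),
                               rng.normal(size=128)]):
        t = Tracklet(track_id=tid)
        t.detections.append(Detection(0, (0, 0, 10, 20), emb))
        tracks.append(t)
    for a, b, s in propose_identity_merges(tracks):
        print(f"review: track {a} <-> track {b} (cosine={s:.3f})")
```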
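Likewise, the robustness quantification can be pictured as a standard ReID evaluation with a per-condition breakdown: rank each query tracklet against the gallery by embedding distance, then report rank-1 accuracy separately for, e.g., same-outfit versus changed-outfit queries. The sketch below runs on synthetic embeddings; the function name and protocol are illustrative assumptions, not MEVID's published evaluation code.
```python
"""Sketch of a per-condition rank-1 breakdown (illustrative, not MEVID's)."""
import numpy as np

def rank1_by_condition(q_emb, q_ids, q_cond, g_emb, g_ids):
    """Rank-1 accuracy of queries against a gallery, split by condition."""
    # Euclidean distance between every query and every gallery tracklet.
    dists = np.linalg.norm(q_emb[:, None, :] - g_emb[None, :, :], axis=2)
    top1 = g_ids[np.argmin(dists, axis=1)]  # best gallery match per query
    hit = (top1 == q_ids)
    return {c: float(hit[q_cond == c].mean()) for c in np.unique(q_cond)}

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    g_ids = np.arange(10)
    g_emb = rng.normal(size=(10, 64))
    # Same-clothes queries: small perturbation of the gallery embedding.
    # Clothes-changed queries: large perturbation, so harder to match.
    q_ids = np.tile(g_ids, 2)
    q_cond = np.array(["same"] * 10 + ["changed"] * 10)
    q_emb = np.concatenate([g_emb + 0.1 * rng.normal(size=(10, 64)),
                            g_emb + 2.0 * rng.normal(size=(10, 64))])
    print(rank1_by_condition(q_emb, q_ids, q_cond, g_emb, g_ids))
```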
Related papers
- Video Individual Counting for Moving Drones [51.429771128144964]
Video Individual Counting (VIC) has received increasing attention recently due to its importance in intelligent video surveillance.
Previous crowd counting datasets are captured with fixed or rarely moving cameras with relatively sparse individuals.
We propose a density-map-based VIC method built on the MovingDroneCrowd dataset.
arXiv Detail & Related papers (2025-03-12T07:09:33Z)
- Unconstrained Body Recognition at Altitude and Range: Comparing Four Approaches [0.0]
We focus on learning persistent body shape characteristics that remain stable over time.
We introduce body identification models based on a Vision Transformer (ViT) and a Swin-ViT.
All models are trained on a large and diverse dataset of over 1.9 million images of approximately 5k identities across 9 databases.
arXiv Detail & Related papers (2025-02-10T23:49:06Z)
- Multimodal Group Emotion Recognition In-the-wild Using Privacy-Compliant Features [0.0]
Group-level emotion recognition can be useful in many fields including social robotics, conversational agents, e-coaching and learning analytics.
This paper explores privacy-compliant group-level emotion recognition "in the wild" within the EmotiW Challenge 2023.
arXiv Detail & Related papers (2023-12-06T08:58:11Z)
- Replay: Multi-modal Multi-view Acted Videos for Casual Holography [76.49914880351167]
Replay is a collection of multi-view, multi-modal videos of humans interacting socially.
Overall, the dataset contains over 4000 minutes of footage and over 7 million timestamped high-resolution frames.
The Replay dataset has many potential applications, such as novel-view synthesis, 3D reconstruction, novel-view acoustic synthesis, human body and face analysis, and training generative models.
arXiv Detail & Related papers (2023-07-22T12:24:07Z)
- Human-Object Interaction Prediction in Videos through Gaze Following [9.61701724661823]
We design a framework to detect current HOIs and anticipate future HOIs in videos.
We propose to leverage human gaze information, since people often fixate on an object before interacting with it.
Our model is trained and validated on the VidHOI dataset, which contains videos capturing daily life.
arXiv Detail & Related papers (2023-06-06T11:36:14Z)
- Seq-Masks: Bridging the gap between appearance and gait modeling for video-based person re-identification [10.490428828061292]
Video-based person re-identification (Re-ID) aims to match person images in video sequences captured by disjoint surveillance cameras.
Traditional video-based person Re-ID methods focus on exploring appearance information and are thus vulnerable to illumination changes, scene noise, camera parameters, and especially clothes/carrying variations.
We propose a framework that utilizes sequence masks (SeqMasks) in the video to closely integrate appearance information and gait modeling.
arXiv Detail & Related papers (2021-12-10T16:00:20Z)
- Ego4D: Around the World in 3,000 Hours of Egocentric Video [276.1326075259486]
Ego4D is a massive-scale egocentric video dataset and benchmark suite.
It offers 3,025 hours of daily-life activity video spanning hundreds of scenarios captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries.
Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event.
arXiv Detail & Related papers (2021-10-13T22:19:32Z)
- APES: Audiovisual Person Search in Untrimmed Video [87.4124877066541]
We present the Audiovisual Person Search dataset (APES).
APES contains over 1.9K identities labeled along 36 hours of video.
A key property of APES is that it includes dense temporal annotations that link faces to speech segments of the same identity.
arXiv Detail & Related papers (2021-06-03T08:16:42Z)
- Long-term Person Re-identification: A Benchmark [57.97182942537195]
In the real world, we often dress differently across locations, times, dates, seasons, weather, and events.
This work contributes a timely, large, realistic long-term person re-identification benchmark.
It consists of 171K bounding boxes from 1.1K person identities, collected and constructed over the course of 12 months.
arXiv Detail & Related papers (2021-05-31T03:35:00Z)
- PoseTrackReID: Dataset Description [97.7241689753353]
Pose information is helpful to disentangle useful feature information from background or occlusion noise.
With PoseTrackReID, we want to bridge the gap between person re-ID and multi-person pose tracking.
This dataset provides a good benchmark for current state-of-the-art methods on multi-frame person re-ID.
arXiv Detail & Related papers (2020-11-12T07:44:25Z)
- Surpassing Real-World Source Training Data: Random 3D Characters for Generalizable Person Re-Identification [109.68210001788506]
We propose to automatically synthesize a large-scale person re-identification dataset following a set-up similar to real surveillance.
We simulate a number of different virtual environments using Unity3D, with customized camera networks similar to real surveillance systems.
As a result, we obtain a virtual dataset, called RandPerson, with 1,801,816 person images of 8,000 identities.
arXiv Detail & Related papers (2020-06-23T05:38:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.