Generic Event Boundary Detection: A Benchmark for Event Segmentation
- URL: http://arxiv.org/abs/2101.10511v1
- Date: Tue, 26 Jan 2021 01:31:30 GMT
- Title: Generic Event Boundary Detection: A Benchmark for Event Segmentation
- Authors: Mike Zheng Shou, Deepti Ghadiyaram, Weiyao Wang, Matt Feiszli
- Abstract summary: This paper presents a novel task together with a new benchmark for detecting generic, taxonomy-free event boundaries that segment a whole video into chunks.
We introduce the task of Generic Event Boundary Detection (GEBD) and the new benchmark Kinetics-GEBD.
Inspired by the cognitive finding that humans mark boundaries at points where they are unable to predict the future accurately, we explore unsupervised approaches.
- Score: 21.914662894860474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel task together with a new benchmark for detecting
generic, taxonomy-free event boundaries that segment a whole video into chunks.
Conventional work in temporal video segmentation and action detection focuses
on localizing pre-defined action categories and thus does not scale to generic
videos. Cognitive Science has known since last century that humans consistently
segment videos into meaningful temporal chunks. This segmentation happens
naturally, with no pre-defined event categories and without being explicitly
asked to do so. Here, we repeat these cognitive experiments on mainstream CV
datasets; with our novel annotation guideline which addresses the complexities
of taxonomy-free event boundary annotation, we introduce the task of Generic
Event Boundary Detection (GEBD) and the new benchmark Kinetics-GEBD. Through
experiment and human study we demonstrate the value of the annotations. We view
this as an important stepping stone towards understanding the video as a whole,
and believe it has been previously neglected due to a lack of proper task
definition and annotations. Further, inspired by the cognitive finding that
humans mark boundaries at points where they are unable to predict the future
accurately, we explore unsupervised approaches based on temporal
predictability. We identify and extensively explore important design factors
for GEBD models on the TAPOS dataset and our Kinetics-GEBD while achieving
competitive performance and suggesting future work. We will release our
annotations and code at CVPR'21 LOVEU Challenge:
https://sites.google.com/view/loveucvpr21
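The predictability idea above lends itself to a compact illustration: score every frame by how poorly its features are predicted from the recent past, then take local maxima of that error as candidate boundaries. The following is only a minimal sketch, not the paper's method; the mean-of-past-frames predictor, window size, and thresholding rule are all placeholder assumptions (the paper explores such design factors in depth):

```python
import numpy as np

def boundary_scores(feats, window=5):
    """Score each frame by how poorly it is predicted from its past.

    feats: (T, D) array of per-frame features (from any backbone).
    Naive predictor: the mean of the previous `window` feature vectors.
    """
    T = feats.shape[0]
    scores = np.zeros(T)
    for t in range(window, T):
        pred = feats[t - window:t].mean(axis=0)      # predict frame t from its past
        scores[t] = np.linalg.norm(feats[t] - pred)  # prediction error
    return scores

def detect_boundaries(scores, threshold=None):
    """Mark local maxima of the prediction error as candidate boundaries."""
    if threshold is None:
        threshold = scores.mean() + scores.std()     # illustrative heuristic
    return [t for t in range(1, len(scores) - 1)
            if scores[t] > scores[t - 1]
            and scores[t] > scores[t + 1]
            and scores[t] > threshold]

# Toy usage: random features with an artificial change point at t=50.
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0, 1, (50, 128)),
                        rng.normal(3, 1, (50, 128))])
print(detect_boundaries(boundary_scores(feats)))
```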
Related papers
- Harnessing Temporal Causality for Advanced Temporal Action Detection [53.654457142657236]
We introduce CausalTAD, which combines causal attention and causal Mamba to achieve state-of-the-art performance on temporal action detection benchmarks.
We ranked 1st in the Action Recognition, Action Detection, and Audio-Based Interaction Detection tracks at the EPIC-Kitchens Challenge 2024, and 1st in the Moment Queries track at the Ego4D Challenge 2024.
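The "causal" ingredient named above is, in the standard sense, attention restricted so that each time step sees only itself and its past. A toy numpy sketch of causally masked self-attention, with shapes and projections chosen purely for illustration (not taken from the CausalTAD code):

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Self-attention where position t attends only to positions <= t.

    x: (T, D) sequence of features; Wq/Wk/Wv: (D, D) projection matrices.
    """
    T = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    logits = q @ k.T / np.sqrt(k.shape[1])
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # True strictly above diagonal
    logits[mask] = -np.inf                            # hide the future
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, D = 8, 16
x = rng.normal(size=(T, D))
out = causal_self_attention(x, *(rng.normal(size=(D, D)) for _ in range(3)))
print(out.shape)  # (8, 16)
```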
arXiv Detail & Related papers (2024-07-25T06:03:02Z) - Finding Meaning in Points: Weakly Supervised Semantic Segmentation for Event Cameras [45.063747874243276]
We present EV-WSSS: a novel weakly supervised approach for event-based semantic segmentation.
The proposed framework performs asymmetric dual-student learning between 1) the original forward event data and 2) the longer reversed event data.
We show that the proposed method achieves strong segmentation results even without relying on dense pixel-level ground truth.
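Schematically, the two students produce dense predictions from the forward and time-reversed event streams, and a consistency term pulls the predictions together. A toy sketch with stand-in network outputs; the actual framework and loss are more involved:

```python
import numpy as np

def consistency_loss(pred_fwd, pred_rev):
    """Symmetric L2 consistency between the two students' dense predictions.

    pred_fwd: (H, W, C) scores from the forward-stream student.
    pred_rev: (H, W, C) scores from the reversed-stream student.
    """
    return float(np.mean((pred_fwd - pred_rev) ** 2))

# Toy usage: two hypothetical students disagreeing slightly.
rng = np.random.default_rng(0)
pred_fwd = rng.normal(size=(32, 32, 5))                    # stand-in output
pred_rev = pred_fwd + 0.1 * rng.normal(size=(32, 32, 5))   # stand-in output
print(consistency_loss(pred_fwd, pred_rev))
```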
arXiv Detail & Related papers (2024-07-15T20:00:50Z)
- Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model [70.97446870672069]
Video anomaly detection (VAD) has received increasing attention due to its potential applications.
Video Anomaly Retrieval (VAR) aims to pragmatically retrieve relevant anomalous videos across modalities.
We present two benchmarks, UCFCrime-AR and XD-Violence, constructed on top of prevalent anomaly datasets.
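Cross-modal retrieval of this kind generally reduces to ranking gallery videos by the similarity of their embeddings to a query embedding from another modality. A generic sketch, with both encoders replaced by random stand-ins:

```python
import numpy as np

def retrieve(query_emb, video_embs, top_k=3):
    """Rank gallery videos by cosine similarity to a query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    sims = v @ q
    order = np.argsort(-sims)[:top_k]
    return order, sims[order]

rng = np.random.default_rng(0)
query = rng.normal(size=64)            # stand-in for a text-encoder output
gallery = rng.normal(size=(100, 64))   # stand-ins for video-encoder outputs
idx, scores = retrieve(query, gallery)
print(idx, scores)
```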
arXiv Detail & Related papers (2023-07-24T06:22:37Z)
- TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering [27.52568444236988]
We propose an unsupervised approach for learning action classes from untrimmed video sequences.
In particular, we propose a temporal embedding network that combines relative time prediction, feature reconstruction, and sequence-to-sequence learning.
Based on the identified clusters, we decode the video into coherent temporal segments that correspond to semantically meaningful action classes.
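The decoding step can be caricatured with off-the-shelf k-means: cluster per-frame embeddings, then merge consecutive frames sharing a cluster label into segments. A simplified sketch in which raw features stand in for the paper's temporal embedding network, and plain run merging stands in for its decoder:

```python
import numpy as np
from sklearn.cluster import KMeans

def decode_segments(frame_feats, n_actions=3):
    """Cluster frame embeddings, then merge runs of equal labels into segments."""
    labels = KMeans(n_clusters=n_actions, n_init=10,
                    random_state=0).fit_predict(frame_feats)
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((start, t, int(labels[start])))  # [start, end) + cluster
            start = t
    return segments

# Toy usage: three synthetic "actions" with distinct feature means.
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(m, 0.3, (40, 16)) for m in (0.0, 2.0, 4.0)])
print(decode_segments(feats))
```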
arXiv Detail & Related papers (2023-03-09T10:46:23Z)
- Generic Event Boundary Detection in Video with Pyramid Features [12.896848011230523]
Generic event boundary detection (GEBD) aims to split a video into chunks at a broad and diverse set of actions, as humans naturally perceive event boundaries.
We present an approach that considers the correlation between neighboring frames with pyramid feature maps in both spatial and temporal dimensions.
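One way to picture the neighbor-frame correlation is as similarity between adjacent frames computed at several spatial scales and averaged. A toy sketch in which average pooling stands in for the pyramid feature maps:

```python
import numpy as np

def pooled(frame, scale):
    """Average-pool a (H, W) feature map down by `scale`."""
    H, W = frame.shape
    h, w = H // scale, W // scale
    return frame[:h * scale, :w * scale].reshape(h, scale, w, scale).mean(axis=(1, 3))

def pyramid_correlation(f0, f1, scales=(1, 2, 4)):
    """Correlate two neighboring frames at multiple spatial scales."""
    sims = []
    for s in scales:
        a, b = pooled(f0, s).ravel(), pooled(f1, s).ravel()
        sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(sims))  # low correlation suggests a boundary

rng = np.random.default_rng(0)
f0 = rng.normal(size=(32, 32))
print(pyramid_correlation(f0, f0 + 0.1 * rng.normal(size=(32, 32))))
```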
arXiv Detail & Related papers (2023-01-11T03:29:27Z)
- Video Action Detection: Analysing Limitations and Challenges [70.01260415234127]
We analyze existing datasets on video action detection and discuss their limitations.
We perform a bias study which analyzes a key property differentiating videos from static images: the temporal aspect.
Such extreme experiments show the existence of biases that have crept into existing methods despite careful modeling.
arXiv Detail & Related papers (2022-04-17T00:42:14Z)
- SegTAD: Precise Temporal Action Detection via Semantic Segmentation [65.01826091117746]
We formulate the task of temporal action detection from the novel perspective of semantic segmentation.
Owing to the 1-dimensional property of TAD, we are able to convert the coarse-grained detection annotations to fine-grained semantic segmentation annotations for free.
We propose SegTAD, an end-to-end framework composed of a 1D semantic segmentation network (1D-SSN) and a proposal detection network (PDN).
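The "for free" conversion works because TAD annotations are 1-D intervals: each frame inside an annotated (start, end, class) interval takes that class, and all other frames become background. A minimal sketch of this relabeling (the background index and class ids are illustrative):

```python
import numpy as np

BACKGROUND = 0  # illustrative background index

def intervals_to_frame_labels(num_frames, intervals):
    """Convert detection annotations (start, end, class) into per-frame labels.

    intervals: list of (start_frame, end_frame, class_id), end exclusive.
    """
    labels = np.full(num_frames, BACKGROUND, dtype=np.int64)
    for start, end, cls in intervals:
        labels[start:end] = cls
    return labels

# Toy usage: a 20-frame video with two annotated actions.
print(intervals_to_frame_labels(20, [(3, 8, 1), (12, 18, 2)]))
```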
arXiv Detail & Related papers (2022-03-03T06:52:13Z)
- Boundary-aware Self-supervised Learning for Video Scene Segmentation [20.713635723315527]
Video scene segmentation is the task of temporally localizing scene boundaries in a video.
We introduce three novel boundary-aware pretext tasks: Shot-Scene Matching, Contextual Group Matching and Pseudo-boundary Prediction.
We achieve the new state-of-the-art on the MovieNet-SSeg benchmark.
arXiv Detail & Related papers (2022-01-14T02:14:07Z)
- CoSeg: Cognitively Inspired Unsupervised Generic Event Segmentation [118.18977078626776]
We propose an end-to-end self-supervised learning framework for event segmentation/boundary detection.
Our framework exploits a transformer-based feature reconstruction scheme to detect event boundaries via reconstruction errors.
The goal of our work is to segment generic events rather than localize some specific ones.
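The reconstruction-error idea can be caricatured without a transformer: reconstruct each frame's features from its temporal context and treat frames that reconstruct poorly as boundary candidates. In the toy sketch below, linear interpolation of the two neighboring frames stands in for the paper's transformer-based reconstruction:

```python
import numpy as np

def reconstruction_errors(feats):
    """Reconstruct each frame from its neighbors; large errors suggest boundaries.

    feats: (T, D) per-frame features. Interpolation of the two neighbors
    stands in here for a learned reconstruction model.
    """
    recon = 0.5 * (feats[:-2] + feats[2:])           # predict frame t from t-1, t+1
    errs = np.linalg.norm(feats[1:-1] - recon, axis=1)
    return np.pad(errs, 1)                           # align scores with frame indices

rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0, 1, (30, 64)), rng.normal(3, 1, (30, 64))])
print(int(np.argmax(reconstruction_errors(feats))))  # near the change point at t=30
```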
arXiv Detail & Related papers (2021-09-30T14:40:32Z)
- Winning the CVPR'2021 Kinetics-GEBD Challenge: Contrastive Learning Approach [27.904987752334314]
We introduce a novel contrastive learning based approach to deal with the Generic Event Boundary Detection task.
In our model, the Temporal Self-similarity Matrix (TSM) is used as an intermediate representation that serves as an information bottleneck.
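A Temporal Self-similarity Matrix is just the pairwise similarity of all frame embeddings; events tend to appear as high-similarity blocks, with boundaries at the block edges. A minimal cosine-similarity version (the features and any downstream contrastive head are assumptions, not the authors' code):

```python
import numpy as np

def temporal_self_similarity(feats):
    """Pairwise cosine similarity of per-frame embeddings -> (T, T) TSM."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ f.T

# Toy usage: two synthetic events show up as two high-similarity blocks.
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0, 1, (20, 32)), rng.normal(4, 1, (20, 32))])
tsm = temporal_self_similarity(feats)
print(tsm.shape)  # (40, 40)
```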
arXiv Detail & Related papers (2021-06-22T05:21:59Z)
- STEP: Segmenting and Tracking Every Pixel [107.23184053133636]
We present a new benchmark: Segmenting and Tracking Every Pixel (STEP).
Our work is the first that targets this task in a real-world setting that requires dense interpretation in both spatial and temporal domains.
For measuring the performance, we propose a novel evaluation metric, Segmentation and Tracking Quality (STQ).
arXiv Detail & Related papers (2021-02-23T18:43:02Z)
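In the STEP paper, STQ combines a segmentation term and an association term as the geometric mean of segmentation quality (SQ) and association quality (AQ). Computing AQ faithfully requires track-level IoU bookkeeping; the sketch below shows only the final combination, with the two sub-scores supplied as illustrative numbers:

```python
import math

def stq(association_quality, segmentation_quality):
    """STQ as the geometric mean of AQ and SQ, each in [0, 1]."""
    return math.sqrt(association_quality * segmentation_quality)

print(stq(0.62, 0.71))  # two illustrative sub-scores
```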
This list is automatically generated from the titles and abstracts of the papers on this site.