Online Generic Event Boundary Detection
- URL: http://arxiv.org/abs/2510.06855v1
- Date: Wed, 08 Oct 2025 10:23:45 GMT
- Title: Online Generic Event Boundary Detection
- Authors: Hyungrok Jung, Daneul Kim, Seunggyun Lim, Jeany Son, Jonghyun Choi
- Abstract summary: We introduce a new task, Online Generic Event Boundary Detection (On-GEBD), aiming to detect boundaries of generic events immediately in streaming videos. This task faces unique challenges of identifying subtle, taxonomy-free event changes in real time, without access to future frames. We propose a novel On-GEBD framework, inspired by Event Segmentation Theory (EST), which explains how humans segment ongoing activity into events by leveraging discrepancies between predicted and actual information.
- Score: 27.34486732049466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generic Event Boundary Detection (GEBD) aims to interpret long-form videos through the lens of human perception. However, current GEBD methods require processing complete video frames to make predictions, unlike humans processing data online and in real-time. To bridge this gap, we introduce a new task, Online Generic Event Boundary Detection (On-GEBD), aiming to detect boundaries of generic events immediately in streaming videos. This task faces unique challenges of identifying subtle, taxonomy-free event changes in real-time, without the access to future frames. To tackle these challenges, we propose a novel On-GEBD framework, Estimator, inspired by Event Segmentation Theory (EST) which explains how humans segment ongoing activity into events by leveraging the discrepancies between predicted and actual information. Our framework consists of two key components: the Consistent Event Anticipator (CEA), and the Online Boundary Discriminator (OBD). Specifically, the CEA generates a prediction of the future frame reflecting current event dynamics based solely on prior frames. Then, the OBD measures the prediction error and adaptively adjusts the threshold using statistical tests on past errors to capture diverse, subtle event transitions. Experimental results demonstrate that Estimator outperforms all baselines adapted from recent online video understanding models and achieves performance comparable to prior offline-GEBD methods on the Kinetics-GEBD and TAPOS datasets.
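The predict-then-compare loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's Estimator: the anticipator here is a stand-in that simply predicts the previous frame's features (the CEA is a learned model), and the adaptive threshold is a plain mean-plus-k-standard-deviations rule over a sliding window of past errors, assumed here in place of the paper's statistical tests. The function name `detect_boundaries` and the parameters `window`, `z_thresh`, `min_history`, and `min_sigma` are hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def detect_boundaries(features, window=8, z_thresh=3.0, min_history=4, min_sigma=1e-3):
    """Return frame indices flagged as event boundaries, using past frames only.

    features: per-frame feature vectors (lists of floats) in stream order.
    All parameters are illustrative assumptions, not values from the paper.
    """
    history = deque(maxlen=window)  # sliding window of recent prediction errors
    boundaries = []
    for t in range(1, len(features)):
        # Stand-in anticipator (assumption): predict frame t as a copy of frame t-1.
        predicted = features[t - 1]
        actual = features[t]
        # Prediction error: Euclidean distance between predicted and actual features.
        error = sum((p - a) ** 2 for p, a in zip(predicted, actual)) ** 0.5

        if len(history) >= min_history:
            # Adaptive threshold computed from past errors only (no future frames).
            mu = mean(history)
            sigma = max(stdev(history), min_sigma)  # floor keeps the margin nonzero
            if error > mu + z_thresh * sigma:
                boundaries.append(t)
                history.clear()  # restart error statistics for the new event
                continue
        history.append(error)
    return boundaries
```

For example, a stream of ten near-static frames followed by ten frames with clearly different features, passed as `detect_boundaries(frames)`, flags the frame at which the jump occurs. Clearing the error history after a boundary is one possible design choice: it keeps the large boundary error from inflating the threshold for the next event.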
Related papers
- Generic Event Boundary Detection via Denoising Diffusion [42.88245960369029]
Generic event boundary detection aims to identify natural boundaries in a video, segmenting it into distinct and meaningful chunks. Previous methods have focused on deterministic predictions, overlooking the diversity of plausible solutions. We introduce a novel diffusion-based boundary detection model, dubbed DiffGEBD, that tackles the problem of GEBD from a generative perspective.
arXiv Detail & Related papers (2025-08-16T15:44:34Z)
- Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding [49.51013055630857]
We tackle the task of online video temporal grounding (OnVTG), which requires the model to locate events related to a given text query within a video stream. Unlike regular video temporal grounding, OnVTG requires the model to make predictions without observing future frames. We propose an event-based OnVTG framework that makes predictions based on event proposals that model event-level information with various durations.
arXiv Detail & Related papers (2025-08-06T15:33:49Z)
- ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning [57.767536707234036]
We propose a novel chain-of-thought reasoning based event stream scene text recognition framework, termed ESTR-CoT. Specifically, we first adopt the vision encoder EVA-CLIP to transform the input event stream into tokens and utilize a Llama tokenizer to encode the given generation prompt. A Q-former is used to align the vision token to the pre-trained large language model Vicuna-7B and output both the answer and chain-of-thought (CoT) reasoning process simultaneously.
arXiv Detail & Related papers (2025-07-02T23:41:31Z)
- Deep Learning for Sports Video Event Detection: Tasks, Datasets, Methods, and Challenges [12.534976311190748]
Video event detection has become a cornerstone of modern sports analytics, powering automated performance evaluation, content generation, and tactical decision-making. Recent advances in deep learning have driven progress in related tasks such as Action Spotting (AS), which identifies a representative timestamp, and Precise Event Spotting (PES), which pinpoints the exact frame of an event.
arXiv Detail & Related papers (2025-05-06T22:02:30Z)
- ONSEP: A Novel Online Neural-Symbolic Framework for Event Prediction Based on Large Language Model [10.137013634329582]
We introduce the Online Neural-Symbolic Event Prediction framework.
ONSEP incorporates dynamic causal rule mining and dual history augmented generation.
Our framework demonstrates notable performance enhancements across diverse datasets.
arXiv Detail & Related papers (2024-08-14T22:28:19Z)
- A Unified Framework for Event-based Frame Interpolation with Ad-hoc Deblurring in the Wild [72.0226493284814]
We propose a unified framework for event-based frame interpolation that performs deblurring ad-hoc. Our network consistently outperforms previous state-of-the-art methods on frame interpolation, single image deblurring, and the joint task of both.
arXiv Detail & Related papers (2023-01-12T18:19:00Z)
- Unifying Event Detection and Captioning as Sequence Generation via Pre-Training [53.613265415703815]
We propose a unified pre-training and fine-tuning framework to enhance the inter-task association between event detection and captioning.
Our model outperforms the state-of-the-art methods, and can be further boosted when pre-trained on extra large-scale video-text data.
arXiv Detail & Related papers (2022-07-18T14:18:13Z)
- AntPivot: Livestream Highlight Detection via Hierarchical Attention Mechanism [64.70568612993416]
We formulate a new task, Livestream Highlight Detection, discuss and analyze the difficulties listed above, and propose a novel architecture, AntPivot, to solve this problem.
We construct a fully-annotated dataset AntHighlight to instantiate this task and evaluate the performance of our model.
arXiv Detail & Related papers (2022-06-10T05:58:11Z)
- UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection [27.29169136392871]
Generic Event Boundary Detection (GEBD) aims to find one level deeper semantic boundaries of events.
We propose a novel framework for unsupervised/supervised GEBD, using the Temporal Self-similarity Matrix (TSM) as the video representation.
Our framework can be applied to both unsupervised and supervised settings, with both achieving state-of-the-art performance by a huge margin.
arXiv Detail & Related papers (2021-11-29T18:50:39Z)
- Reliable Shot Identification for Complex Event Detection via Visual-Semantic Embedding [72.9370352430965]
We propose a visual-semantic guided loss method for event detection in videos.
Motivated by curriculum learning, we introduce a negative elastic regularization term to start training the classifier with instances of high reliability.
An alternative optimization algorithm is developed to solve the proposed challenging non-convex regularization problem.
arXiv Detail & Related papers (2021-10-12T11:46:56Z)
- Generic Event Boundary Detection: A Benchmark for Event Segmentation [21.914662894860474]
This paper presents a novel task together with a new benchmark for detecting generic, taxonomy-free event boundaries that segment a whole video into chunks.
We introduce the task of Generic Event Boundary Detection (GEBD) and the new benchmark Kinetics-GEBD.
Inspired by the cognitive finding that humans mark boundaries at points where they are unable to predict the future accurately, we explore un-supervised approaches.
arXiv Detail & Related papers (2021-01-26T01:31:30Z)