MAE-GEBD: Winning the CVPR'2023 LOVEU-GEBD Challenge
- URL: http://arxiv.org/abs/2306.15704v1
- Date: Tue, 27 Jun 2023 02:35:19 GMT
- Title: MAE-GEBD: Winning the CVPR'2023 LOVEU-GEBD Challenge
- Authors: Yuanxi Sun, Rui He, Youzeng Li, Zuwei Huang, Feng Hu, Xu Cheng, Jie
Tang
- Abstract summary: We build a model for segmenting videos into segments by detecting general event boundaries applicable to various classes.
Based on last year's MAE-GEBD method, we have improved our model performance on the GEBD task by adjusting the data processing strategy and loss function.
With our method, we achieve an F1 score of 86.03% on the Kinetics-GEBD test set, which is a 0.09% improvement in the F1 score compared to our 2022 Kinetics-GEBD method.
- Score: 11.823891739821443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Generic Event Boundary Detection (GEBD) task aims to build a model for
segmenting videos into segments by detecting general event boundaries
applicable to various classes. In this paper, based on last year's MAE-GEBD
method, we have improved our model performance on the GEBD task by adjusting
the data processing strategy and loss function. Based on last year's approach,
we extended the application of pseudo-label to a larger dataset and made many
experimental attempts. In addition, we applied focal loss to concentrate more
on difficult samples and improved our model performance. Finally, we improved
the segmentation alignment strategy used last year, and dynamically adjusted
the segmentation alignment method according to the boundary density and
duration of the video, so that our model can be more flexible and fully
applicable in different situations. With our method, we achieve an F1 score of
86.03% on the Kinetics-GEBD test set, which is a 0.09% improvement in the F1
score compared to our 2022 Kinetics-GEBD method.
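The focal loss mentioned in the abstract is the standard formulation that down-weights easy examples so training concentrates on hard boundary/non-boundary samples. Below is a minimal NumPy sketch; the gamma and alpha values and the toy probabilities are illustrative defaults, not the settings used by the authors:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: scales cross-entropy by (1 - p_t)^gamma so that
    well-classified ("easy") samples contribute little to the loss.
    p: predicted boundary probabilities in (0, 1); y: binary labels."""
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)            # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy positive (p = 0.95) is down-weighted far more than a hard one (p = 0.30):
easy = focal_loss(np.array([0.95]), np.array([1]))
hard = focal_loss(np.array([0.30]), np.array([1]))
```

With gamma = 0 and alpha = 0.5 this reduces to (half of) ordinary binary cross-entropy; increasing gamma shifts the gradient budget toward the difficult samples the abstract refers to.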
Related papers
- Rethinking the Architecture Design for Efficient Generic Event Boundary Detection [71.50748944513379]
Generic Event Boundary Detection (GEBD) is inspired by human visual cognitive behaviors of consistently segmenting videos into meaningful temporal chunks.
SOTA GEBD models often prioritize final performance over model complexity, resulting in low inference speed and hindering efficient deployment in real-world scenarios.
We experimentally reexamine the architecture of GEBD models and contribute to addressing this challenge.
arXiv Detail & Related papers (2024-07-17T14:49:54Z)
- 3rd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation [63.199793919573295]
Video Object Segmentation (VOS) is a vital task in computer vision, focusing on distinguishing foreground objects from the background across video frames.
Our work draws inspiration from the Cutie model, and we investigate the effects of object memory, the total number of memory frames, and input resolution on segmentation performance.
arXiv Detail & Related papers (2024-06-06T00:56:25Z)
- What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised Generic Event Boundary Detection [1.3695134621603882]
The Generic Event Boundary Detection (GEBD) task aims to recognize generic, taxonomy-free boundaries that segment a video into meaningful events.
Current methods typically involve a neural model trained on a large volume of data, demanding substantial computational power and storage space.
We propose FlowGEBD, a non-parametric, unsupervised technique for GEBD.
arXiv Detail & Related papers (2024-02-15T14:49:15Z)
- ARHNet: Adaptive Region Harmonization for Lesion-aware Augmentation to Improve Segmentation Performance [61.04246102067351]
We propose a foreground harmonization framework (ARHNet) to tackle intensity disparities and make synthetic images look more realistic.
We demonstrate the efficacy of our method in improving the segmentation performance using real and synthetic images.
arXiv Detail & Related papers (2023-07-02T10:39:29Z)
- TAPIR: Tracking Any Point with Per-frame Initialization and Temporal Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
arXiv Detail & Related papers (2023-06-14T17:07:51Z)
- Post-Processing Temporal Action Detection [134.26292288193298]
Temporal Action Detection (TAD) methods typically take a pre-processing step in converting an input varying-length video into a fixed-length snippet representation sequence.
This pre-processing step would temporally downsample the video, reducing the inference resolution and hampering the detection performance in the original temporal resolution.
We introduce a novel model-agnostic post-processing method without model redesign and retraining.
arXiv Detail & Related papers (2022-11-27T19:50:37Z)
- Masked Autoencoders for Generic Event Boundary Detection CVPR'2022 Kinetics-GEBD Challenge [11.823891739821443]
The Generic Event Boundary Detection (GEBD) task aims at detecting generic, taxonomy-free event boundaries that segment a whole video into chunks.
In this paper, we apply Masked Autoencoders to improve algorithm performance on the GEBD tasks.
With our approach, we achieved an F1 score of 85.94% on the Kinetics-GEBD test set, which improved the F1 score by 2.31% compared to the winner of the 2021 Kinetics-GEBD Challenge.
arXiv Detail & Related papers (2022-06-17T08:10:27Z)
- Winning the CVPR'2021 Kinetics-GEBD Challenge: Contrastive Learning Approach [27.904987752334314]
We introduce a novel contrastive learning based approach to deal with the Generic Event Boundary Detection task.
In our model, a Temporal Self-similarity Matrix (TSM) is utilized as an intermediate representation that acts as an information bottleneck.
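A Temporal Self-similarity Matrix of the kind described above can be computed, for example, as pairwise cosine similarity between per-frame features. This minimal NumPy sketch (the function name and toy features are illustrative, not the paper's implementation) shows how a boundary between two segments appears as block structure in the matrix:

```python
import numpy as np

def temporal_self_similarity(features):
    """Temporal Self-similarity Matrix: cosine similarity between every
    pair of per-frame feature vectors. Input shape (T, D) -> output (T, T)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.clip(norms, 1e-8, None)
    return normed @ normed.T

# Two constant 4-frame segments produce a block-diagonal TSM; the event
# boundary at frame 4 shows up as near-zero cross-block similarity.
feats = np.concatenate([np.tile([1.0, 0.0], (4, 1)),
                        np.tile([0.0, 1.0], (4, 1))])
tsm = temporal_self_similarity(feats)
```

Because the TSM depends only on pairwise frame similarities rather than raw features, it discards appearance details while preserving temporal change structure, which is what makes it behave like an information bottleneck.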
arXiv Detail & Related papers (2021-06-22T05:21:59Z)
- A Stronger Baseline for Ego-Centric Action Detection [38.934802199184354]
This report analyzes an egocentric video action detection method we used in the 2021 EPIC-KITCHENS-100 competition hosted in CVPR 2021 Workshop.
The goal of our task is to locate the start time and end time of each action in the long untrimmed video and to predict the action category.
We adopt a sliding-window strategy to generate proposals, which better adapts to short-duration actions.
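A sliding-window proposal generator of the kind described can be sketched as follows; the window sizes and stride ratio are illustrative placeholders, not the values used in the report:

```python
def sliding_window_proposals(video_len, window_sizes=(8, 16, 32), stride_ratio=0.5):
    """Generate (start, end) frame-index proposals by sliding windows of
    several sizes over a video of `video_len` frames. Small windows give
    dense coverage of short-duration actions."""
    proposals = []
    for w in window_sizes:
        stride = max(1, int(w * stride_ratio))  # 50% overlap between windows
        for start in range(0, max(1, video_len - w + 1), stride):
            proposals.append((start, start + w))
    return proposals

props = sliding_window_proposals(64)
```

Each proposal is then scored and classified by the detection model; using multiple window sizes trades a larger candidate set for better recall on actions of varying length.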
arXiv Detail & Related papers (2021-06-13T08:11:31Z)
- Fast Template Matching and Update for Video Object Tracking and Segmentation [56.465510428878]
The main task we aim to tackle is the multi-instance semi-supervised video object segmentation across a sequence of frames.
The challenges lie in the selection of the matching method to predict the result as well as to decide whether to update the target template.
We propose a novel approach which utilizes reinforcement learning to make these two decisions at the same time.
arXiv Detail & Related papers (2020-04-16T08:58:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.