Masked Autoencoders for Generic Event Boundary Detection CVPR'2022 Kinetics-GEBD Challenge
- URL: http://arxiv.org/abs/2206.08610v1
- Date: Fri, 17 Jun 2022 08:10:27 GMT
- Title: Masked Autoencoders for Generic Event Boundary Detection CVPR'2022 Kinetics-GEBD Challenge
- Authors: Rui He, Yuanxi Sun, Youzeng Li, Zuwei Huang, Feng Hu, Xu Cheng, Jie Tang
- Abstract summary: Generic Event Boundary Detection (GEBD) tasks aim at detecting generic, taxonomy-free event boundaries that segment a whole video into chunks.
In this paper, we apply Masked Autoencoders to improve algorithm performance on the GEBD tasks.
With our approach, we achieved 85.94% on the F1-score on the Kinetics-GEBD test set, which improved the F1-score by 2.31% compared to the winner of the 2021 Kinetics-GEBD Challenge.
- Score: 11.823891739821443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generic Event Boundary Detection (GEBD) tasks aim at detecting generic,
taxonomy-free event boundaries that segment a whole video into chunks. In this
paper, we apply Masked Autoencoders to improve algorithm performance on the
GEBD tasks. Our approach mainly adopted the ensemble of Masked Autoencoders
fine-tuned on the GEBD task as a self-supervised learner with other base
models. Moreover, we also use a semi-supervised pseudo-label method to take
full advantage of the abundant unlabeled Kinetics-400 data while training. In
addition, we propose a soft-label method to partially balance the positive and
negative samples and alleviate the problem of ambiguous labeling in this task.
Lastly, a tricky segmentation alignment policy is implemented to refine
boundaries predicted by our models to more accurate locations. With our
approach, we achieved 85.94% on the F1-score on the Kinetics-GEBD test set,
which improved the F1-score by 2.31% compared to the winner of the 2021
Kinetics-GEBD Challenge. Our code is available at
https://github.com/ContentAndMaterialPortrait/MAE-GEBD.
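The abstract does not specify how the soft labels are computed. As a minimal sketch of one common way to soften binary boundary targets, assuming Gaussian smoothing around each annotated boundary frame (the function name `soft_labels`, the `sigma` parameter, and the max-combination rule are illustrative assumptions, not taken from the paper or its repository):

```python
import math


def soft_labels(num_frames, boundaries, sigma=1.0):
    """Return per-frame soft boundary targets in [0, 1].

    Hypothetical helper (not the authors' method): each annotated boundary
    index spreads a Gaussian bump over neighbouring frames, and a frame
    takes the maximum value over all bumps.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    labels = [0.0] * num_frames
    for t in range(num_frames):
        for b in boundaries:
            # Weight decays with squared distance from the boundary frame.
            weight = math.exp(-((t - b) ** 2) / (2.0 * sigma ** 2))
            labels[t] = max(labels[t], weight)
    return labels


# Example: a 10-frame clip with annotated boundaries at frames 3 and 7.
labels = soft_labels(num_frames=10, boundaries=[3, 7], sigma=1.0)
```

Under this assumption, frames next to an annotated boundary receive partial positive weight, which is one way to reduce the positive/negative imbalance and to tolerate annotator disagreement about the exact boundary frame.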
Related papers
- Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend SAM to few-shot semantic segmentation (FSS).
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z)
- What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised Generic Event Boundary Detection [1.3695134621603882]
The Generic Event Boundary Detection (GEBD) task aims to recognize generic, taxonomy-free boundaries that segment a video into meaningful events.
Current methods typically involve a neural model trained on a large volume of data, demanding substantial computational power and storage space.
We propose FlowGEBD, a non-parametric, unsupervised technique for GEBD.
arXiv Detail & Related papers (2024-02-15T14:49:15Z)
- Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z)
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranks first.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
- MAE-GEBD:Winning the CVPR'2023 LOVEU-GEBD Challenge [11.823891739821443]
We build a model for segmenting videos into segments by detecting general event boundaries applicable to various classes.
Based on last year's MAE-GEBD method, we have improved our model performance on the GEBD task by adjusting the data processing strategy and loss function.
With our method, we achieve an F1 score of 86.03% on the Kinetics-GEBD test set, which is a 0.09% improvement in the F1 score compared to our 2022 Kinetics-GEBD method.
arXiv Detail & Related papers (2023-06-27T02:35:19Z)
- The Second-place Solution for CVPR VISION 23 Challenge Track 1 -- Data Efficient Defect Detection [3.4853769431047907]
The Vision Challenge Track 1 for Data-Efficient Defect Detection requires competitors to instance segment 14 industrial inspection datasets in a data-deficient setting.
This report introduces the technical details of the team Aoi-overfitting-Team for this challenge.
arXiv Detail & Related papers (2023-06-25T03:37:02Z)
- The Devil is in the Points: Weakly Semi-Supervised Instance Segmentation via Point-Guided Mask Representation [61.027468209465354]
We introduce a novel learning scheme named weakly semi-supervised instance segmentation (WSSIS) with point labels.
We propose a method for WSSIS that can effectively leverage the budget-friendly point labels as a powerful weak supervision source.
We conduct extensive experiments on COCO and BDD100K datasets, and the proposed method achieves promising results comparable to those of the fully-supervised model.
arXiv Detail & Related papers (2023-03-27T10:11:22Z)
- Robust Target Training for Multi-Source Domain Adaptation [110.77704026569499]
We propose a novel Bi-level Optimization based Robust Target Training (BORT$^2$) method for MSDA.
Our proposed method achieves state-of-the-art performance on three MSDA benchmarks, including the large-scale DomainNet dataset.
arXiv Detail & Related papers (2022-10-04T15:20:01Z)
- Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach [46.97359231258202]
Generic event boundary detection (GEBD) is an important yet challenging task in video understanding.
We present a local context modeling and global boundary decoding approach for the GEBD task.
arXiv Detail & Related papers (2022-06-30T13:19:53Z)
- Winning the CVPR'2021 Kinetics-GEBD Challenge: Contrastive Learning Approach [27.904987752334314]
We introduce a novel contrastive learning based approach to deal with the Generic Event Boundary Detection task.
In our model, the Temporal Self-similarity Matrix (TSM) is used as an intermediate representation that acts as an information bottleneck.
arXiv Detail & Related papers (2021-06-22T05:21:59Z)
- EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [53.69674636044927]
We present EHSOD, an end-to-end hybrid-supervised object detection system.
It can be trained in one shot on both fully and weakly-annotated data.
It achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data.
arXiv Detail & Related papers (2020-02-18T08:04:58Z)