Submission to Generic Event Boundary Detection Challenge@CVPR 2022:
Local Context Modeling and Global Boundary Decoding Approach
- URL: http://arxiv.org/abs/2206.15268v1
- Date: Thu, 30 Jun 2022 13:19:53 GMT
- Authors: Jiaqi Tang, Zhaoyang Liu, Jing Tan, Chen Qian, Wayne Wu, Limin Wang
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generic event boundary detection (GEBD) is an important yet challenging task
in video understanding, which aims at detecting the moments where humans
naturally perceive event boundaries. In this paper, we present a local context
modeling and global boundary decoding approach for the GEBD task. A local
context modeling sub-network is proposed to perceive diverse patterns of
generic event boundaries; it generates powerful video representations and
reliable boundary confidence scores. Based on these, a global boundary decoding
sub-network decodes event boundaries from a global view. Our proposed method
achieves an 85.13% F1-score on the Kinetics-GEBD test set, a more than 22%
F1-score improvement over the baseline method. The code is available at
https://github.com/JackyTown/GEBD_Challenge_CVPR2022.
Related papers
- Fine-grained Dynamic Network for Generic Event Boundary Detection [9.17191007695011]
We propose a novel dynamic pipeline for generic event boundaries named DyBDet.
By introducing a multi-exit network architecture, DyBDet automatically learns how to allocate processing to different video snippets.
Experiments on the challenging Kinetics-GEBD and TAPOS datasets demonstrate that adopting the dynamic strategy significantly benefits GEBD tasks.
arXiv Detail & Related papers (2024-07-05T06:02:46Z)
- Temporal Action Localization with Enhanced Instant Discriminability [66.76095239972094]
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
We propose a one-stage framework named TriDet to resolve imprecise predictions of action boundaries by existing methods.
Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets.
arXiv Detail & Related papers (2023-09-11T16:17:50Z)
- Exploiting Context Information for Generic Event Boundary Captioning [51.53874954616367]
Generic Event Boundary Captioning (GEBC) aims to generate three sentences describing the status change for a given time boundary.
We design a model that directly takes the whole video as input and generates captions for all boundaries in parallel.
arXiv Detail & Related papers (2022-07-03T14:14:54Z)
- Masked Autoencoders for Generic Event Boundary Detection CVPR'2022 Kinetics-GEBD Challenge [11.823891739821443]
Generic Event Boundary Detection (GEBD) tasks aim at detecting generic, taxonomy-free event boundaries that segment a whole video into chunks.
In this paper, we apply Masked Autoencoders to improve algorithm performance on the GEBD tasks.
With our approach, we achieved an 85.94% F1-score on the Kinetics-GEBD test set, improving the F1-score by 2.31% over the winner of the 2021 Kinetics-GEBD Challenge.
arXiv Detail & Related papers (2022-06-17T08:10:27Z)
- Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection [48.33132632418303]
Generic Boundary Detection (GBD) aims at locating general boundaries that divide videos into semantically coherent and taxonomy-free units.
Previous research handles these different-level generic boundaries separately, with specific designs of complicated deep networks ranging from simple CNNs to LSTMs.
We present Temporal Perceiver, a general architecture with Transformers, offering a unified solution to the detection of arbitrary generic boundaries.
arXiv Detail & Related papers (2022-03-01T09:31:30Z)
- UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection [27.29169136392871]
Generic Event Boundary Detection (GEBD) aims to find semantic boundaries of events one level deeper than conventional boundaries.
We propose a novel framework for unsupervised/supervised GEBD that uses the Temporal Self-similarity Matrix (TSM) as the video representation.
Our framework can be applied to both unsupervised and supervised settings, and both achieve state-of-the-art performance by a large margin.
arXiv Detail & Related papers (2021-11-29T18:50:39Z)
- Global Aggregation then Local Distribution for Scene Parsing [99.1095068574454]
We show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks.
Our approach allows us to set a new state of the art on major semantic segmentation benchmarks including Cityscapes, ADE20K, PASCAL Context, CamVid and COCO-Stuff.
arXiv Detail & Related papers (2021-07-28T03:46:57Z)
- Winning the CVPR'2021 Kinetics-GEBD Challenge: Contrastive Learning Approach [27.904987752334314]
We introduce a novel contrastive learning based approach to the Generic Event Boundary Detection task.
In our model, the Temporal Self-similarity Matrix (TSM) serves as an intermediate representation that acts as an information bottleneck.
arXiv Detail & Related papers (2021-06-22T05:21:59Z)
- The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation [85.153426159438]
We propose Basis-based Instance segmentation (B2Inst), which learns a global boundary representation that can complement existing global-mask-based methods.
Our B2Inst leads to consistent improvements and accurately parses out the instance boundaries in a scene.
arXiv Detail & Related papers (2020-11-26T11:26:06Z)
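Two of the entries above use a Temporal Self-similarity Matrix (TSM) as the video representation. As a rough illustrative sketch only (not the implementation from either paper), a TSM can be computed as the pairwise cosine similarity of per-frame features; the function name and the toy two-event features below are hypothetical:

```python
import numpy as np

def temporal_self_similarity(features: np.ndarray) -> np.ndarray:
    """Compute a T x T cosine-similarity matrix from per-frame features.

    features: array of shape (T, D), one D-dimensional embedding per frame.
    Returns the TSM of shape (T, T); in this representation, event
    boundaries tend to appear as transitions between block-diagonal
    regions of high within-event similarity.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normalized = features / np.clip(norms, 1e-8, None)  # unit-normalize rows
    return normalized @ normalized.T                    # pairwise cosine sim

# Toy clip: two "events" whose features cluster around opposite directions.
rng = np.random.default_rng(0)
event_a = rng.normal(0.0, 0.1, size=(8, 16)) + np.eye(1, 16)  # frames 0..7
event_b = rng.normal(0.0, 0.1, size=(8, 16)) - np.eye(1, 16)  # frames 8..15
clip = np.concatenate([event_a, event_b])

tsm = temporal_self_similarity(clip)
# Frames within the same event are much more similar than frames
# across the boundary between frame 7 and frame 8.
```

Downstream, a boundary detector can scan such a matrix for the off-diagonal dips in similarity that mark event transitions.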
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.