Fine-grained Dynamic Network for Generic Event Boundary Detection
- URL: http://arxiv.org/abs/2407.04274v1
- Date: Fri, 5 Jul 2024 06:02:46 GMT
- Title: Fine-grained Dynamic Network for Generic Event Boundary Detection
- Authors: Ziwei Zheng, Lijun He, Le Yang, Fan Li,
- Abstract summary: We propose a novel dynamic pipeline for generic event boundaries named DyBDet.
By introducing a multi-exit network architecture, DyBDet automatically learns the allocation to different video snippets.
Experiments on the challenging Kinetics-GEBD and TAPOS datasets demonstrate that adopting the dynamic strategy significantly benefits GEBD tasks.
- Score: 9.17191007695011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generic event boundary detection (GEBD) aims at pinpointing event boundaries naturally perceived by humans, playing a crucial role in understanding long-form videos. Given the diverse nature of generic boundaries, spanning different video appearances, objects, and actions, this task remains challenging. Existing methods usually detect various boundaries by the same protocol, regardless of their distinctive characteristics and detection difficulties, resulting in suboptimal performance. Intuitively, a more intelligent and reasonable way is to adaptively detect boundaries by considering their special properties. In light of this, we propose a novel dynamic pipeline for generic event boundaries named DyBDet. By introducing a multi-exit network architecture, DyBDet automatically learns the subnet allocation to different video snippets, enabling fine-grained detection for various boundaries. Besides, a multi-order difference detector is also proposed to ensure generic boundaries can be effectively identified and adaptively processed. Extensive experiments on the challenging Kinetics-GEBD and TAPOS datasets demonstrate that adopting the dynamic strategy significantly benefits GEBD tasks, leading to obvious improvements in both performance and efficiency compared to the current state-of-the-art.
Related papers
- Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection.
The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.
Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-27T12:53:25Z) - ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent object (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images and camouflaged zooming in and out.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z) - COMICS: End-to-end Bi-grained Contrastive Learning for Multi-face Forgery Detection [56.7599217711363]
Face forgery recognition methods can only process one face at a time.
Most face forgery recognition methods can only process one face at a time.
We propose COMICS, an end-to-end framework for multi-face forgery detection.
arXiv Detail & Related papers (2023-08-03T03:37:13Z) - Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment [59.831917206058435]
Domain adaptive detection aims to improve the generalization of detectors on target domain.
Recent approaches achieve domain adaption through feature alignment in different granularities via adversarial learning.
We introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning.
arXiv Detail & Related papers (2023-01-01T08:38:07Z) - Motion Aware Self-Supervision for Generic Event Boundary Detection [14.637933739152315]
Generic Event Boundary Detection (GEBD) aims to detect moments in videos that are naturally perceived by humans as generic and taxonomy-free event boundaries.
Existing approaches involve very complex and sophisticated pipelines in terms of architectural design choices.
We revisit a simple and effective self-supervised method and augment it with a differentiable motion feature learning module to tackle the spatial and temporal diversities in the GEBD task.
arXiv Detail & Related papers (2022-10-11T16:09:13Z) - Multimodal Graph Learning for Deepfake Detection [10.077496841634135]
Existing deepfake detectors face several challenges in achieving robustness and generalization.
We propose a novel framework, namely Multimodal Graph Learning (MGL), that leverages information from multiple modalities.
Our proposed method aims to effectively identify and utilize distinguishing features for deepfake detection.
arXiv Detail & Related papers (2022-09-12T17:17:49Z) - Temporal Perceiver: A General Architecture for Arbitrary Boundary
Detection [48.33132632418303]
Generic Boundary Detection (GBD) aims at locating general boundaries that divide videos into semantically coherent and taxonomy-free units.
Previous research separately handle these different-level generic boundaries with specific designs of complicated deep networks from simple CNN to LSTM.
We present Temporal Perceiver, a general architecture with Transformers, offering a unified solution to the detection of arbitrary generic boundaries.
arXiv Detail & Related papers (2022-03-01T09:31:30Z) - Progressive Attention on Multi-Level Dense Difference Maps for Generic
Event Boundary Detection [35.16241630620967]
Generic event boundary detection is an important yet challenging task in video understanding.
This paper presents an effective and end-to-end learnable framework (DDM-Net) to tackle the diversity and complicated semantics of event boundaries.
arXiv Detail & Related papers (2021-12-09T09:00:05Z) - UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event
Boundary Detection [27.29169136392871]
Generic Event Boundary Detection (GEBD) aims to find one level deeper semantic boundaries of events.
We propose a novel framework for unsupervised/supervised GEBD, using the Temporal Self-similarity Matrix (TSM) as the video representation.
Our framework can be applied to both unsupervised and supervised settings, with both achieving state-of-the-art performance by a huge margin.
arXiv Detail & Related papers (2021-11-29T18:50:39Z) - Robust Facial Landmark Detection by Multi-order Multi-constraint Deep
Networks [35.19368350816032]
We propose a Multi-order Multi-constraint Deep Network (MMDN) for more powerful feature correlations and shape constraints learning.
An Implicit Multi-order Correlating Geometry-aware (IMCG) model is proposed to introduce the multi-order spatial correlations and multi-order channel correlations.
An Explicit Probability-based Boundary-adaptive Regression (EPBR) method is developed to enhance the global shape constraints.
arXiv Detail & Related papers (2020-12-09T09:11:47Z) - Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation [62.29076080124199]
This paper proposes a novel coarse-to-fine feature adaptation approach to cross-domain object detection.
At the coarse-grained stage, foreground regions are extracted by adopting the attention mechanism, and aligned according to their marginal distributions.
At the fine-grained stage, we conduct conditional distribution alignment of foregrounds by minimizing the distance of global prototypes with the same category but from different domains.
arXiv Detail & Related papers (2020-03-23T13:40:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.