Related papers: Generalized Uncertainty-Based Evidential Fusion with Hybrid Multi-Head Attention for Weak-Supervised Temporal Action Localization

Generalized Uncertainty-Based Evidential Fusion with Hybrid Multi-Head Attention for Weak-Supervised Temporal Action Localization

URL: http://arxiv.org/abs/2412.19418v1
Date: Fri, 27 Dec 2024 03:04:57 GMT
Title: Generalized Uncertainty-Based Evidential Fusion with Hybrid Multi-Head Attention for Weak-Supervised Temporal Action Localization
Authors: Yuanpeng He, Lijian Li, Tianxiang Zhan, Wenpin Jiao, Chi-Man Pun,
Abstract summary: Weakly supervised temporal action localization (WS-TAL) is a task of targeting at localizing complete action instances and categorizing them with video-level labels.<n> Action-background ambiguity, primarily caused by background noise resulting from aggregation and intra-action variation, is a significant challenge for existing WS-TAL methods.<n>We introduce a hybrid multi-head attention (HMHA) module and generalized uncertainty-based evidential fusion (GUEF) module to address the problem.
Score: 28.005080560540133
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Weakly supervised temporal action localization (WS-TAL) is a task of targeting at localizing complete action instances and categorizing them with video-level labels. Action-background ambiguity, primarily caused by background noise resulting from aggregation and intra-action variation, is a significant challenge for existing WS-TAL methods. In this paper, we introduce a hybrid multi-head attention (HMHA) module and generalized uncertainty-based evidential fusion (GUEF) module to address the problem. The proposed HMHA effectively enhances RGB and optical flow features by filtering redundant information and adjusting their feature distribution to better align with the WS-TAL task. Additionally, the proposed GUEF adaptively eliminates the interference of background noise by fusing snippet-level evidences to refine uncertainty measurement and select superior foreground feature information, which enables the model to concentrate on integral action instances to achieve better action localization and classification performance. Experimental results conducted on the THUMOS14 dataset demonstrate that our method outperforms state-of-the-art methods. Our code is available in \url{https://github.com/heyuanpengpku/GUEF/tree/main}.

Related papers

Beyond Fully Supervised Pixel Annotations: Scribble-Driven Weakly-Supervised Framework for Image Manipulation Localization [11.10178274806454]
We propose a form of weak supervision that improves the annotation efficiency and detection performance.<n>We re-annotated mainstream IML datasets with scribble labels and propose the first scribble-based IML dataset.<n>We employ self-supervised training with a structural consistency loss to encourage the model to produce consistent predictions.
arXiv Detail & Related papers (2025-07-17T11:45:27Z)
Backscatter Device-aided Integrated Sensing and Communication: A Pareto Optimization Framework [59.30060797118097]
Integrated sensing and communication (ISAC) systems potentially encounter significant performance degradation in densely obstructed urban non-line-of-sight scenarios.<n>This paper proposes a backscatter approximation (BD)-assisted ISAC system, which leverages passive BDs naturally distributed in environments of enhancement.
arXiv Detail & Related papers (2025-07-12T17:11:06Z)
ADA-DPM: A Neural Descriptors-based Adaptive Noise Filtering Strategy for SLAM [2.0781167019314806]
We propose a neural descriptors-based adaptive noise filtering strategy for SLAM, named ADA-DPM.<n>To tackle dynamic object interference, we design the Dynamic Head to predict and filter out dynamic feature points.<n> Secondly, to mitigate the impact of noise and unstructured feature points, we propose the Global Importance Scoring Head.<n>Finally, experimental validations on multiple public datasets confirm the effectiveness of ADA-DPM.
arXiv Detail & Related papers (2025-06-22T12:48:11Z)
CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection [54.85000884785013]
Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types, and the scarcity of training data.<n>We propose CLIPfusion, a method that leverages both discriminative and generative foundation models.<n>We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection.
arXiv Detail & Related papers (2025-06-13T13:30:15Z)
MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification [46.89908887119571]
Whole Slide Image (WSI) classification poses unique challenges due to the vast image size and numerous non-informative regions. We propose MExD, an Expert-Infused Diffusion Model that combines the strengths of a Mixture-of-Experts (MoE) mechanism with a diffusion model for enhanced classification.
arXiv Detail & Related papers (2025-03-16T08:04:17Z)
Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning [63.55145330447408]
We propose a novel textbfSelf-textbfPerceptinon textbfTuning (textbfSPT) method for anomaly segmentation. The SPT method incorporates a self-drafting tuning strategy, which generates an initial coarse draft of the anomaly mask, followed by a refinement process.
arXiv Detail & Related papers (2024-11-26T08:33:25Z)
Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization. The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM)
arXiv Detail & Related papers (2024-08-05T08:35:59Z)
FANet: Feature Amplification Network for Semantic Segmentation in Cluttered Background [9.970265640589966]
Existing deep learning approaches leave out the semantic cues that are crucial in semantic segmentation present in complex scenarios. We propose a feature amplification network (FANet) as a backbone network that incorporates semantic information using a novel feature enhancement module at multi-stages. Our experimental results demonstrate the state-of-the-art performance compared to existing methods.
arXiv Detail & Related papers (2024-07-12T15:57:52Z)
Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD) It aims to detect salient objects from arbitrary modalities, eg RGB images, RGB-D images, and RGB-D-T images. A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z)
Sparse Global Matching for Video Frame Interpolation with Large Motion [20.49084881829404]
Large motion poses a critical challenge in Video Frame Interpolation (VFI) task. Existing methods are often constrained by limited receptive fields, resulting in sub-optimal performance when handling scenarios with large motion. We introduce a new pipeline for VFI, which can effectively integrate global-level information to alleviate issues associated with large motion.
arXiv Detail & Related papers (2024-04-10T11:06:29Z)
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions. Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z)
AMSP-UOD: When Vortex Convolution and Stochastic Perturbation Meet Underwater Object Detection [40.532331552038485]
We present a novel Amplitude-Modulated Perturbation and Vortex Convolutional Network, AMSP-UOD. AMSP-UOD addresses the impact of non-ideal imaging factors on detection accuracy in complex underwater environments. Our method outperforms existing state-of-the-art methods in terms of accuracy and noise immunity.
arXiv Detail & Related papers (2023-08-23T05:03:45Z)
MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation. This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z)
Towards Robust Adaptive Object Detection under Noisy Annotations [40.25050610617893]
Existing methods assume that the source domain labels are completely clean, yet large-scale datasets often contain error-prone annotations due to instance ambiguity. We propose a Noise Latent Transferability Exploration framework to address this issue. NLTE improves the mAP by 8.4% under 60% corrupted annotations and even approaches the ideal upper bound of training on a clean source dataset.
arXiv Detail & Related papers (2022-04-06T07:02:37Z)
Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data. In this paper, we propose an efficient mixup objective function with a decoupled regularizer named Decoupled Mixup (DM) DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup.
arXiv Detail & Related papers (2022-03-21T07:12:18Z)
Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features. We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.