Related papers: Enhancing Weakly Supervised Video Grounding via Diverse Inference Strategies for Boundary and Prediction Selection

URL: http://arxiv.org/abs/2503.23181v1
Date: Sat, 29 Mar 2025 18:33:58 GMT
Title: Enhancing Weakly Supervised Video Grounding via Diverse Inference Strategies for Boundary and Prediction Selection
Authors: Sunoh Kim, Daeho Um
Abstract summary: Weakly supervised video grounding aims to localize temporal boundaries relevant to a given query without explicit ground-truth temporal boundaries. We introduce novel boundary prediction methods to capture diverse boundaries from multiple Gaussians. We also introduce new selection methods that take proposal quality into account.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Weakly supervised video grounding aims to localize temporal boundaries relevant to a given query without explicit ground-truth temporal boundaries. While existing methods primarily use Gaussian-based proposals, they overlook the importance of (1) boundary prediction and (2) top-1 prediction selection during inference. In their boundary prediction, boundaries are simply set at half a standard deviation away from a Gaussian mean on both sides, which may not accurately capture the optimal boundaries. In the top-1 prediction process, these existing methods rely heavily on intersections with other proposals, without considering the varying quality of each proposal. To address these issues, we explore various inference strategies by introducing (1) novel boundary prediction methods to capture diverse boundaries from multiple Gaussians and (2) new selection methods that take proposal quality into account. Extensive experiments on the ActivityNet Captions and Charades-STA datasets validate the effectiveness of our inference strategies, demonstrating performance improvements without requiring additional training.
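To make the two inference steps criticized in the abstract concrete, here is a minimal Python sketch. It is not the authors' code, and it assumes details the abstract does not give. Each proposal is a Gaussian with a `mean` and `std` in normalized video time [0, 1]. Its boundary follows the half-standard-deviation rule the abstract attributes to existing methods, with clipping to [0, 1] added as an assumption. "Relying on intersections with other proposals" is modeled as picking the proposal whose summed temporal IoU with all other proposals is largest. The `quality` field and `select_quality_weighted` are hypothetical stand-ins for a quality-aware selector. The abstract does not specify the paper's actual selection methods.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in normalized time [0, 1] (assumption)


@dataclass
class GaussianProposal:
    mean: float           # Gaussian center (normalized time)
    std: float            # Gaussian standard deviation (normalized time)
    quality: float = 1.0  # hypothetical per-proposal quality score


def half_std_boundary(p: GaussianProposal) -> Segment:
    """Mean +/- half a std, as in the abstract; clipping to [0, 1] is an added assumption."""
    return max(0.0, p.mean - 0.5 * p.std), min(1.0, p.mean + 0.5 * p.std)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def intersection_votes(segments: Sequence[Segment]) -> List[float]:
    """Each proposal's vote is its summed IoU with every other proposal."""
    return [
        sum(temporal_iou(s, t) for j, t in enumerate(segments) if j != i)
        for i, s in enumerate(segments)
    ]


def argmax(values: Sequence[float]) -> int:
    # max() returns the first index on ties
    return max(range(len(values)), key=lambda i: values[i])


def select_by_intersection(proposals: Sequence[GaussianProposal]) -> int:
    """Intersection-only top-1 selection: proposal quality is ignored."""
    segments = [half_std_boundary(p) for p in proposals]
    return argmax(intersection_votes(segments))


def select_quality_weighted(proposals: Sequence[GaussianProposal]) -> int:
    """Hypothetical quality-aware variant: scale each vote by the proposal's own quality."""
    segments = [half_std_boundary(p) for p in proposals]
    votes = intersection_votes(segments)
    return argmax([v * p.quality for v, p in zip(votes, proposals)])


if __name__ == "__main__":
    proposals = [
        GaussianProposal(mean=0.30, std=0.40, quality=1.0),  # segment ~(0.10, 0.50)
        GaussianProposal(mean=0.50, std=0.40, quality=0.3),  # segment ~(0.30, 0.70)
        GaussianProposal(mean=0.75, std=0.30, quality=1.0),  # segment ~(0.60, 0.90)
    ]
    print(select_by_intersection(proposals))   # 1
    print(select_quality_weighted(proposals))  # 0
```

With these three proposals, the intersection-only rule picks the middle proposal because it overlaps both neighbors. Scaling each vote by the proposal's own (low) quality shifts the choice to the first one. Multiplying by quality is only one simple illustrative choice. It is not claimed to be how the paper weighs proposal quality.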

Related papers

Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models
We tackle average-reward infinite-horizon POMDPs with an unknown transition model. We present a novel and simple estimator that overcomes this barrier.
arXiv (2025-01-30T22:29:41Z)
Be Aware of the Neighborhood Effect: Modeling Selection Bias under Interference
Previous studies have focused on addressing selection bias to achieve unbiased learning of the prediction model. This paper formally formulates the neighborhood effect as an interference problem from the perspective of causal inference. We propose a novel ideal loss that can be used to deal with selection bias in the presence of neighborhood effect.
arXiv (2024-04-30T15:20:41Z)
Predictive Inference in Multi-environment Scenarios
We address the challenge of constructing valid confidence intervals and sets in problems of prediction across multiple environments. We extend the jackknife and split-conformal methods to show how to obtain distribution-free coverage in non-traditional, potentially hierarchical data-generating scenarios. Our contributions also include extensions for settings with non-real-valued responses, a theory of consistency for predictive inference in these general problems, and insights on the limits of conditional coverage.
arXiv (2024-03-25T00:21:34Z)
Transductive Active Learning: Theory and Applications
We study a generalization of classical active learning to real-world settings with concrete prediction targets. We analyze a family of decision rules that sample adaptively to minimize uncertainty about prediction targets.
arXiv (2024-02-13T09:22:45Z)
Distribution-Free Conformal Joint Prediction Regions for Neural Marked Temporal Point Processes
We develop more reliable methods for uncertainty quantification in neural temporal point process (TPP) models via the framework of conformal prediction. A primary objective is to generate a distribution-free joint prediction region for an event's arrival time and mark, with a finite-sample marginal coverage guarantee.
arXiv (2024-01-09T15:28:29Z)
Bridging the Gap Between Multi-Step and One-Shot Trajectory Prediction via Self-Supervision
Accurate vehicle trajectory prediction is an unsolved problem in autonomous driving. We propose a middle-ground where multiple trajectory segments are chained together. Our proposed Multi-Branch Self-Supervised Predictor receives additional training on new predictions starting at intermediate future segments.
arXiv (2023-06-06T02:46:28Z)
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
Temporal action localization is an important yet challenging task in video understanding. We propose the first purely anchor-free temporal localization method. Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv (2021-03-24T12:28:32Z)
Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses
Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance. Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame labels. This paper attempts to explore the proposal-based prediction paradigm for point-level annotations.
arXiv (2020-12-15T12:11:48Z)
Semi-Supervised Learning with Variational Bayesian Inference and Maximum Uncertainty Regularization
We propose two generic methods for improving semi-supervised learning (SSL). The first integrates weight perturbation (WP) into existing "consistency regularization" (CR)-based methods. The second proposes a novel consistency loss called "maximum uncertainty regularization" (MUR).
arXiv (2020-12-03T09:49:35Z)
Boundary Uncertainty in a Single-Stage Temporal Action Localization Network
We show that both uncertainty modeling approaches improve detection performance by more than 1.5% in mAP@tIoU=0.5. The proposed simple one-stage network performs close to more complex one- and two-stage networks.
arXiv (2020-08-25T17:04:39Z)
DeepStrip: High Resolution Boundary Refinement
We propose to convert regions of interest into strip images and compute a boundary prediction in the strip domain. To detect the target boundary, we present a framework with two prediction layers. We enforce matching-consistency and $C^0$-continuity regularization on the network to reduce false alarms.
arXiv (2020-03-25T22:44:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.