Related papers: Context Consistency Learning via Sentence Removal for Semi-Supervised Video Paragraph Grounding

URL: http://arxiv.org/abs/2506.18476v1
Date: Mon, 23 Jun 2025 10:22:46 GMT
Title: Context Consistency Learning via Sentence Removal for Semi-Supervised Video Paragraph Grounding
Authors: Yaokun Zhong, Siyu Jiang, Jian Zhu, Jian-Fang Hu
Abstract summary: We propose a novel Context Consistency Learning (CCL) framework. CCL unifies the paradigms of consistency regularization and pseudo-labeling to enhance semi-supervised learning.
Score: 9.280423086981703
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Semi-Supervised Video Paragraph Grounding (SSVPG) aims to localize multiple sentences in a paragraph from an untrimmed video with limited temporal annotations. Existing methods focus on teacher-student consistency learning and video-level contrastive loss, but they overlook the importance of perturbing query contexts to generate strong supervisory signals. In this work, we propose a novel Context Consistency Learning (CCL) framework that unifies the paradigms of consistency regularization and pseudo-labeling to enhance semi-supervised learning. Specifically, we first conduct teacher-student learning where the student model takes as inputs strongly-augmented samples with sentences removed and is enforced to learn from the adequately strong supervisory signals from the teacher model. Afterward, we conduct model retraining based on the generated pseudo labels, where the mutual agreement between the original and augmented views' predictions is utilized as the label confidence. Extensive experiments show that CCL outperforms existing methods by a large margin.
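The two ideas in the abstract, removing sentences to build a strongly augmented view and using agreement between views as label confidence, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, and temporal IoU is used here as an assumed stand-in for the agreement measure, which the abstract does not specify.

```python
import random
from typing import Dict, List, Tuple

Span = Tuple[float, float]  # (start, end) of a predicted moment, in seconds


def remove_sentences(sentences: List[str], num_remove: int,
                     rng: random.Random) -> Tuple[List[str], List[int]]:
    """Hypothetical strong augmentation: drop `num_remove` random sentences.

    Returns the kept sentences (original order preserved) and their
    indices in the original paragraph.
    """
    kept = sorted(rng.sample(range(len(sentences)), len(sentences) - num_remove))
    return [sentences[i] for i in kept], kept


def temporal_iou(a: Span, b: Span) -> float:
    """Intersection over union of two temporal spans."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    # The hull equals the union whenever the spans overlap; if they do not
    # overlap, inter is 0 and the result is 0 regardless.
    hull = max(a[1], b[1]) - min(a[0], b[0])
    return inter / hull if hull > 0 else 0.0


def agreement_confidence(full_preds: List[Span], reduced_preds: List[Span],
                         kept: List[int]) -> Dict[int, float]:
    """Assumed confidence: IoU between the predictions for each kept sentence
    made from the full paragraph and from the sentence-removed paragraph."""
    return {orig: temporal_iou(full_preds[orig], reduced_preds[pos])
            for pos, orig in enumerate(kept)}


if __name__ == "__main__":
    paragraph = ["A man opens a door.", "He walks in.", "He sits down."]
    reduced, kept = remove_sentences(paragraph, 1, random.Random(0))
    print(reduced, kept)

    # Made-up predictions, for illustration only.
    full = [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0)]
    partial = [(0.0, 10.0), (25.0, 30.0)]
    print(agreement_confidence(full, partial, [0, 2]))
```

In a retraining stage, a confidence of this kind could weight each pseudo label's loss term, so that sentences whose predictions shift when context is removed contribute less.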

Related papers

Semantic-guided Fine-tuning of Foundation Model for Long-tailed Visual Recognition [38.74388860692423]
We propose a novel approach, Semantic-guided fine-tuning of foundation model for long-tailed visual recognition (Sage). We introduce an SG-Adapter that integrates class descriptions as semantic guidance to guide the fine-tuning of the visual encoder. Experiments on benchmark datasets demonstrate the effectiveness of the proposed Sage in enhancing performance in long-tailed learning.
arXiv Detail & Related papers (2025-07-17T05:47:19Z)
Rethinking the Mean Teacher Strategy from the Perspective of Self-paced Learning [5.6818939992896365]
Semi-supervised medical image segmentation has attracted significant attention due to its potential to reduce manual annotation costs. In this work, we reinterpret the Mean Teacher (MT) strategy on supervised data as a form of self-paced learning, regulated by the output agreement between the temporally lagged teacher model and the ground truth labels.
arXiv Detail & Related papers (2025-05-16T09:14:06Z)
Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization [129.43937834515688]
We propose a new COllaborative Temporal consistEncy Learning (COTEL) framework to strengthen the video-language alignment. Specifically, we first design a frame- and a segment-level Temporal Consistency Learning (TCL) module that models semantic alignment across frame saliencies and sentence-moment pairs.
arXiv Detail & Related papers (2025-03-22T05:04:12Z)
Semantic Consistency Regularization with Large Language Models for Semi-supervised Sentiment Analysis [20.503153899462323]
We propose a framework for semi-supervised sentiment analysis. We introduce two prompting strategies to semantically enhance unlabeled text. Experiments show our method achieves remarkable performance over prior semi-supervised methods.
arXiv Detail & Related papers (2025-01-29T12:03:11Z)
On the Loss of Context-awareness in General Instruction Fine-tuning [101.03941308894191]
We investigate the loss of context awareness after supervised fine-tuning. We find that the performance decline is associated with a bias toward different roles learned during conversational instruction fine-tuning. We propose a metric to identify context-dependent examples from general instruction fine-tuning datasets.
arXiv Detail & Related papers (2024-11-05T00:16:01Z)
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding [70.31050639330603]
Video paragraph grounding (VPG) aims at localizing multiple sentences with semantic relations and temporal order from an untrimmed video. Existing VPG approaches rely heavily on a considerable number of temporal labels that are laborious and time-consuming to acquire. We introduce and explore Weakly-Supervised Video Paragraph Grounding (WSVPG) to eliminate the need for temporal annotations.
arXiv Detail & Related papers (2024-03-18T04:30:31Z)
Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos [71.20376514273367]
We propose a unified point cloud video self-supervised learning framework for object-centric and scene-centric data. Our method outperforms supervised counterparts on a wide range of downstream tasks.
arXiv Detail & Related papers (2023-08-18T02:17:47Z)
Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text. These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining. We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos [134.78406021194985]
We focus on the weakly supervised setting of this task, which has access only to coarse video-level language descriptions, without temporal boundary annotations. We propose a Boundary Adaptive Refinement (BAR) framework that resorts to reinforcement learning to guide the process of progressively refining the temporal boundary.
arXiv Detail & Related papers (2020-09-18T03:32:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.