Uncertainty-quantified Rollout Policy Adaptation for Unlabelled Cross-domain Temporal Grounding
- URL: http://arxiv.org/abs/2508.06317v1
- Date: Fri, 08 Aug 2025 13:47:00 GMT
- Title: Uncertainty-quantified Rollout Policy Adaptation for Unlabelled Cross-domain Temporal Grounding
- Authors: Jian Hu, Zixu Cheng, Shaogang Gong, Isabel Guan, Jianye Hao, Jun Wang, Kun Shao
- Abstract summary: Video Temporal Grounding aims to temporally locate video segments matching a natural language description in a long video. We introduce a Data-Efficient Unlabelled Cross-domain Temporal Grounding method. This method eliminates the need for target annotation and keeps both computational and storage overhead low enough to run in real time.
- Score: 59.09971455857609
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Video Temporal Grounding (TG) aims to temporally locate video segments matching a natural language description (a query) in a long video. While Vision-Language Models (VLMs) are effective at holistic semantic matching, they often struggle with fine-grained temporal localisation. Recently, Group Relative Policy Optimisation (GRPO) reformulates the inference process as a reinforcement learning task, enabling fine-grained grounding and achieving strong in-domain performance. However, GRPO relies on labelled data, making it unsuitable for unlabelled domains. Moreover, because videos are large and expensive to store and process, performing full-scale adaptation introduces prohibitive latency and computational overhead, making it impractical for real-time deployment. To overcome both problems, we introduce a Data-Efficient Unlabelled Cross-domain Temporal Grounding method, in which a model is first trained on a labelled source domain and then adapted to a target domain using only a small number of unlabelled videos from that domain. This approach eliminates the need for target annotation and keeps both computational and storage overhead low enough to run in real time. Specifically, we introduce Uncertainty-quantified Rollout Policy Adaptation (URPA) for cross-domain knowledge transfer in learning video temporal grounding without target labels. URPA generates multiple candidate predictions using GRPO rollouts, averages them to form a pseudo label, and estimates confidence from the variance across these rollouts. This confidence then weights the training rewards, guiding the model to focus on reliable supervision. Experiments on three datasets across six cross-domain settings show that URPA generalises well using only a few unlabelled target videos. Code will be released upon publication.
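The rollout-averaging and variance-based confidence weighting described in the abstract can be sketched roughly as follows. This is a minimal illustration assuming each rollout predicts a `(start, end)` segment in seconds; the function names and the inverse-variance confidence form are hypothetical, not taken from the paper.

```python
import statistics

def urpa_pseudo_label(rollout_predictions):
    """Aggregate GRPO rollout predictions into a pseudo label with a
    variance-based confidence weight (hypothetical sketch, not the
    paper's exact formulation).

    rollout_predictions: list of (start, end) segments in seconds, one
    per rollout for the same unlabelled target video/query pair.
    """
    starts = [s for s, _ in rollout_predictions]
    ends = [e for _, e in rollout_predictions]
    # Pseudo label: the mean predicted segment across rollouts.
    pseudo = (statistics.mean(starts), statistics.mean(ends))
    # Confidence: high when rollouts agree (low variance), low otherwise.
    spread = statistics.pvariance(starts) + statistics.pvariance(ends)
    confidence = 1.0 / (1.0 + spread)
    return pseudo, confidence

def weighted_reward(prediction, pseudo, confidence):
    """Scale a temporal-IoU reward against the pseudo label by the
    confidence, so unreliable pseudo labels contribute less to the
    policy update."""
    inter = max(0.0, min(prediction[1], pseudo[1]) - max(prediction[0], pseudo[0]))
    union = max(prediction[1], pseudo[1]) - min(prediction[0], pseudo[0])
    iou = inter / union if union > 0 else 0.0
    return confidence * iou
```

When all rollouts agree, the confidence is 1 and the pseudo label acts like a normal reward target; as rollouts disagree, the reward is down-weighted toward zero, which matches the abstract's goal of focusing training on reliable supervision.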
Related papers
- Multi-Prompt Progressive Alignment for Multi-Source Unsupervised Domain Adaptation [73.40696661117408]
We propose a progressive alignment strategy for adapting CLIP to unlabeled downstream tasks. We name our approach MP2A and test it on three popular UDA benchmarks, namely ImageCLEF, Office-Home, and the most challenging DomainNet. Experiments showcase that MP2A achieves state-of-the-art performance compared with recent CLIP-based MS-UDA approaches.
arXiv Detail & Related papers (2025-07-31T09:42:42Z) - CDFSL-V: Cross-Domain Few-Shot Learning for Videos [58.37446811360741]
Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples.
Existing methods in video action recognition rely on large labeled datasets from the same domain.
We propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning.
arXiv Detail & Related papers (2023-09-07T19:44:27Z) - Exploring Few-Shot Adaptation for Activity Recognition on Diverse Domains [46.26074225989355]
Domain adaptation is essential for activity recognition to ensure accurate and robust performance across diverse environments.
In this work, we focus on FewShot Domain Adaptation for Activity Recognition (FSDA-AR), which leverages a very small amount of labeled target videos.
We propose a new FSDA-AR benchmark using five established datasets, considering adaptation to more diverse and challenging domains.
arXiv Detail & Related papers (2023-05-15T08:01:05Z) - Spatio-Temporal Pixel-Level Contrastive Learning-based Source-Free
Domain Adaptation for Video Semantic Segmentation [117.39092621796753]
The Source-Free Domain Adaptation (SFDA) setup aims to adapt a source-trained model to the target domain without accessing source data.
A novel method is proposed that takes full advantage of temporal-information correlations to tackle the absence of source data.
Experiments show that PixelL achieves state-of-the-art performance on benchmarks compared to current UDA and SFDA approaches.
arXiv Detail & Related papers (2023-03-25T05:06:23Z) - Simplifying Open-Set Video Domain Adaptation with Contrastive Learning [16.72734794723157]
Unsupervised video domain adaptation methods have been proposed to adapt a predictive model from a labelled dataset to an unlabelled dataset.
We address a more realistic scenario, called open-set unsupervised video domain adaptation (OUVDA), where the target dataset contains "unknown" semantic categories that are not shared with the source.
We propose a video-oriented temporal contrastive loss that enables our method to better cluster the feature space by exploiting the freely available temporal information in video data.
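The summary above describes pulling together features using freely available temporal information. A generic InfoNCE-style sketch of such a temporal contrastive loss is shown below; the function name, the adjacent-clips-as-positives pairing, and the temperature value are illustrative assumptions, not details from the cited paper.

```python
import math

def temporal_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Illustrative InfoNCE-style temporal contrastive loss (hypothetical
    sketch): a clip feature (anchor) is pulled toward a temporally
    adjacent clip from the same video (positive) and pushed away from
    clips drawn from other videos (negatives). All inputs are
    L2-normalised feature vectors given as lists of floats."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    pos = math.exp(dot(anchor, positive) / temperature)
    negs = sum(math.exp(dot(anchor, n) / temperature) for n in negatives)
    # Standard InfoNCE: -log( exp(sim+) / (exp(sim+) + sum exp(sim-)) )
    return -math.log(pos / (pos + negs))
```

With an orthogonal negative the loss is near zero, while a negative as similar as the positive yields log 2, which is the usual behaviour that encourages temporally coherent clusters in feature space.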
arXiv Detail & Related papers (2023-01-09T13:16:50Z) - Cross-Domain Video Anomaly Detection without Target Domain Adaptation [38.823721272155616]
Existing Video Anomaly Detection (VAD) works assume that at least a few task-relevant target-domain training samples are available for adaptation from the source to the target domain.
This requires laborious model tuning by the end-user, who may prefer a system that works "out-of-the-box".
arXiv Detail & Related papers (2022-12-14T03:48:00Z) - Domain Adaptive Video Semantic Segmentation via Cross-Domain Moving
Object Mixing [15.823918683848877]
We propose Cross-domain Moving Object Mixing (CMOM) that cuts several objects, including hard-to-transfer classes, in the source domain video clip.
Unlike image-level domain adaptation, mixing moving objects across two different videos requires maintaining the temporal context.
We additionally propose Feature Alignment with Temporal Context (FATC) to enhance target domain feature discriminability.
arXiv Detail & Related papers (2022-11-04T08:10:33Z) - Continual Test-Time Domain Adaptation [94.51284735268597]
Test-time domain adaptation aims to adapt a source pre-trained model to a target domain without using any source data.
CoTTA is easy to implement and can be readily incorporated in off-the-shelf pre-trained models.
arXiv Detail & Related papers (2022-03-25T11:42:02Z) - Self-supervised Autoregressive Domain Adaptation for Time Series Data [9.75443057146649]
Unsupervised domain adaptation (UDA) has successfully addressed the domain shift problem for visual applications.
However, these approaches may have limited performance on time series data.
We propose a Self-supervised Autoregressive Domain Adaptation (SLARDA) framework to address these limitations.
arXiv Detail & Related papers (2021-11-29T08:17:23Z) - Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training
for Road Segmentation of Remote Sensing Images [93.50240389540252]
Road segmentation from remote sensing images is a challenging task with wide ranges of application potentials.
We propose a novel stagewise domain adaptation model called RoadDA to address the domain shift (DS) issue in this field.
Experiment results on two benchmarks demonstrate that RoadDA can efficiently reduce the domain gap and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-08-28T09:29:14Z) - Unsupervised and self-adaptative techniques for cross-domain person
re-identification [82.54691433502335]
Person Re-Identification (ReID) across non-overlapping cameras is a challenging task.
Unsupervised Domain Adaptation (UDA) is a promising alternative, as it performs feature-learning adaptation from a model trained on a source to a target domain without identity-label annotation.
In this paper, we propose a novel UDA-based ReID method that takes advantage of triplets of samples created by a new offline strategy.
arXiv Detail & Related papers (2021-03-21T23:58:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.