Pseudo Dataset Generation for Out-of-Domain Multi-Camera View Recommendation
- URL: http://arxiv.org/abs/2410.13585v1
- Date: Thu, 17 Oct 2024 14:21:22 GMT
- Title: Pseudo Dataset Generation for Out-of-Domain Multi-Camera View Recommendation
- Authors: Kuan-Ying Lee, Qian Zhou, Klara Nahrstedt
- Abstract summary: We propose transforming regular videos into pseudo-labeled multi-camera view recommendation datasets.
By training the model on pseudo-labeled datasets stemming from videos in the target domain, we achieve a 68% relative improvement in the model's accuracy in the target domain.
- Score: 8.21260979799828
- Abstract: Multi-camera systems are indispensable in movies, TV shows, and other media. Selecting the appropriate camera at every timestamp has a decisive impact on production quality and audience preferences. Learning-based view recommendation frameworks can assist professionals in decision-making. However, they often struggle outside of their training domains. The scarcity of labeled multi-camera view recommendation datasets exacerbates the issue. Based on the insight that many videos are edited from the original multi-camera videos, we propose transforming regular videos into pseudo-labeled multi-camera view recommendation datasets. Promisingly, by training the model on pseudo-labeled datasets stemming from videos in the target domain, we achieve a 68% relative improvement in the model's accuracy in the target domain and bridge the accuracy gap between in-domain and never-before-seen domains.
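The core insight, that an edited video implicitly records which camera was "selected" at every moment, can be illustrated with a minimal sketch. Assuming a simple frame-difference shot-boundary detector (a hypothetical stand-in for whatever detector the paper actually uses), each detected shot yields a pseudo "selected view" segment:

```python
import numpy as np

def detect_shots(frames, threshold=0.5):
    """Split an edited video into shots via frame differencing.

    frames: array of shape (T, H, W). A shot boundary is declared
    wherever the mean absolute difference between consecutive frames
    exceeds `threshold`. Returns (start, end) pairs, end exclusive.
    """
    diffs = np.abs(np.diff(frames.astype(float), axis=0)).mean(axis=(1, 2))
    cuts = [0] + [t + 1 for t, d in enumerate(diffs) if d > threshold] + [len(frames)]
    return list(zip(cuts[:-1], cuts[1:]))

def pseudo_labels(frames, threshold=0.5):
    """Assign each frame a pseudo view id: frames within one shot
    share a label, mimicking a 'selected camera' annotation."""
    labels = np.empty(len(frames), dtype=int)
    for view_id, (s, e) in enumerate(detect_shots(frames, threshold)):
        labels[s:e] = view_id
    return labels
```

A real pipeline would additionally match each shot back to candidate source views; this sketch only shows how edit points can be turned into per-frame pseudo labels.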
Related papers
- Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering [54.468355408388675]
We build a similarity matrix that incorporates both the spatial diversity of the cameras and the semantic variation of the images.
We apply a diversity-based sampling algorithm to optimize the camera selection.
We also develop a new dataset, IndoorTraj, which includes long and complex camera movements captured by humans in virtual indoor environments.
arXiv Detail & Related papers (2024-09-11T08:36:49Z) - Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse points and boxes tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z) - DVOS: Self-Supervised Dense-Pattern Video Object Segmentation [6.092973123903838]
In Dense Video Object Segmentation (DVOS) scenarios, each video frame encompasses hundreds of small, dense, and partially occluded objects.
We propose a semi-self-supervised spatiotemporal approach for DVOS utilizing a diffusion-based method through multi-task learning.
To demonstrate the utility and efficacy of the proposed approach, we developed DVOS models for wheat head segmentation of handheld and drone-captured videos.
arXiv Detail & Related papers (2024-06-07T17:58:36Z) - Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification [33.25577310265293]
We introduce a camera-driven curriculum learning framework that leverages camera labels to transfer knowledge from source to target domains progressively.
For each curriculum sequence, we generate pseudo labels of person images in a target domain to train a reID model in a supervised way.
We have observed that the pseudo labels are highly biased toward cameras, suggesting that person images obtained from the same camera are likely to have the same pseudo labels, even for different IDs.
arXiv Detail & Related papers (2023-08-23T04:01:56Z) - Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows [83.54243912535667]
We first collect a novel benchmark on this setting with four diverse scenarios including concerts, sports games, gala shows, and contests.
It contains 88 hours of raw video that contribute to 14 hours of edited video.
We propose a new approach, the temporal and contextual transformer, that utilizes clues from historical shots and other views to make shot transition decisions.
arXiv Detail & Related papers (2022-10-17T04:11:23Z) - Domain Adaptive Video Segmentation via Temporal Pseudo Supervision [46.38660541271893]
Video semantic segmentation can mitigate data labelling constraints by adapting from a labelled source domain toward an unlabelled target domain.
We design temporal pseudo supervision (TPS), a simple and effective method that explores consistency training for learning effective representations from unlabelled target videos.
We show that TPS is simpler to implement, much more stable to train, and achieves superior video accuracy as compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-06T00:36:14Z) - Cross-View Cross-Scene Multi-View Crowd Counting [56.83882084112913]
Multi-view crowd counting has been previously proposed to utilize multiple cameras to extend the field of view of a single camera.
We propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where the training and testing occur on different scenes with arbitrary camera layouts.
arXiv Detail & Related papers (2022-05-03T15:03:44Z) - Multiview Pseudo-Labeling for Semi-supervised Learning from Video [102.36355560553402]
We present a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video.
Our method capitalizes on multiple views, but it nonetheless trains a model that is shared across appearance and motion input.
On multiple video recognition datasets, our method substantially outperforms its supervised counterpart, and compares favorably to previous work on standard benchmarks in self-supervised video representation learning.
arXiv Detail & Related papers (2021-04-01T17:59:48Z) - DRIV100: In-The-Wild Multi-Domain Dataset and Evaluation for Real-World Domain Adaptation of Semantic Segmentation [9.984696742463628]
This work presents DRIV100, a new multi-domain dataset for benchmarking domain adaptation techniques on in-the-wild road-scene videos collected from the Internet.
The dataset consists of pixel-level annotations for 100 videos selected to cover diverse scenes/domains based on two criteria: human subjective judgment and an anomaly score judged using an existing road-scene dataset.
arXiv Detail & Related papers (2021-01-30T04:43:22Z) - Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z) - Dual-Triplet Metric Learning for Unsupervised Domain Adaptation in Video-Based Face Recognition [8.220945563455848]
A new deep domain adaptation (DA) method is proposed to adapt the CNN embedding of a Siamese network using unlabeled tracklets captured with new video cameras.
The proposed metric learning technique is used to train deep Siamese networks under different training scenarios.
arXiv Detail & Related papers (2020-02-11T05:06:30Z)
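The dual-triplet approach above builds on the standard triplet margin loss. As a hedged illustration only (numpy, not the paper's exact formulation), one plausible way to weight a labeled source-domain triplet against a pseudo-labeled target-domain triplet is:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss on L2 distances: pull the positive
    toward the anchor, push the negative at least `margin` farther away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def dual_triplet_loss(src_triplet, tgt_triplet, margin=0.2, alpha=0.5):
    """Hypothetical combination (assumed weighting, not from the paper):
    blend a labeled source-domain triplet with a pseudo-labeled
    target-domain triplet via a mixing coefficient `alpha`."""
    return (alpha * triplet_loss(*src_triplet, margin)
            + (1 - alpha) * triplet_loss(*tgt_triplet, margin))
```

The `alpha` weighting and the function names are assumptions made for the sketch; the actual method defines its triplets over tracklets from the source and the newly deployed cameras.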
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.