MSVCOD: A Large-Scale Multi-Scene Dataset for Video Camouflage Object Detection
- URL: http://arxiv.org/abs/2502.13859v1
- Date: Wed, 19 Feb 2025 16:27:23 GMT
- Title: MSVCOD: A Large-Scale Multi-Scene Dataset for Video Camouflage Object Detection
- Authors: Shuyong Gao, Yu'ang Feng, Qishan Wang, Lingyi Hong, Xinyu Zhou, Liu Fei, Yan Wang, Wenqiang Zhang
- Abstract summary: Video Camouflaged Object Detection (VCOD) is a challenging task that aims to identify objects that are seamlessly concealed within the background in videos.
We construct a new large-scale multi-domain VCOD dataset MSVCOD.
Our MSVCOD is the largest VCOD dataset to date, introducing multiple object categories including human, animal, medical, and vehicle objects for the first time.
Our framework achieves state-of-the-art results on the existing VCOD animal dataset and the proposed MSVCOD.
- Score: 23.59587900985667
- Abstract: Video Camouflaged Object Detection (VCOD) is a challenging task that aims to identify objects that are seamlessly concealed within the background in videos. The dynamic properties of video enable detection of camouflaged objects through motion cues or varied perspectives. Previous VCOD datasets primarily contain animal objects, limiting the scope of research to wildlife scenarios. However, the applications of VCOD extend beyond wildlife and have significant implications in security, art, and medical fields. To address this problem, we construct a new large-scale multi-domain VCOD dataset, MSVCOD. To achieve high-quality annotations, we design a semi-automatic iterative annotation pipeline that reduces costs while maintaining annotation accuracy. Our MSVCOD is the largest VCOD dataset to date, introducing multiple object categories including human, animal, medical, and vehicle objects for the first time, while also expanding background diversity across various environments. This expanded scope increases the practical applicability of the VCOD task in camouflaged object detection. Alongside this dataset, we introduce a one-stream video camouflage object detection model that performs both feature extraction and information fusion without additional motion feature fusion modules. Our framework achieves state-of-the-art results on the existing VCOD animal dataset and the proposed MSVCOD. The dataset and code will be made publicly available.
Related papers
- Green Video Camouflaged Object Detection [28.528114525671025]
We propose a green VCOD method named GreenVCOD to handle temporal information.
Built upon a green ICOD method, GreenVCOD uses long- and short-term temporal neighborhoods to capture joint spatial/temporal context information.
Experimental results show that GreenVCOD offers competitive performance compared to state-of-the-art VCOD benchmarks.
arXiv Detail & Related papers (2025-01-19T01:42:00Z)
- Unconstrained Salient and Camouflaged Object Detection [4.698538612738126]
We introduce a benchmark called Unconstrained Salient and Camouflaged Object Detection (USCOD).
USCOD supports the simultaneous detection of salient and camouflaged objects in unconstrained scenes, regardless of their presence.
To address this challenge, we propose USCNet, a baseline model for USCOD that decouples the learning of attribute distinction from mask reconstruction.
arXiv Detail & Related papers (2024-12-14T19:37:17Z)
- FADE: A Dataset for Detecting Falling Objects around Buildings in Video [75.48118923174712]
Falling objects from buildings can cause severe injuries to pedestrians due to the great impact force they exert.
FADE contains 1,881 videos from 18 scenes, featuring 8 falling object categories, 4 weather conditions, and 4 video resolutions.
We develop a new object detection method called FADE-Net, which effectively leverages motion information.
arXiv Detail & Related papers (2024-08-11T11:43:56Z)
- Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection [65.8867003376637]
We propose a framework for synthesizing camouflage data to enhance the detection of camouflaged objects in natural scenes.
Our approach employs a generative model to produce realistic camouflage images, which can be used to train existing object detection models.
Our framework outperforms the current state-of-the-art method on three datasets.
arXiv Detail & Related papers (2023-08-13T06:55:05Z)
- Camouflaged Object Detection with Feature Grafting and Distractor Aware [9.791590363932519]
We propose a novel Feature Grafting and Distractor Aware network (FDNet) to handle the Camouflaged Object Detection task.
Specifically, we use CNN and Transformer to encode multi-scale images in parallel.
A Distractor Aware Module is designed to explicitly model the two possible distractors in the COD task to refine the coarse camouflage map.
arXiv Detail & Related papers (2023-07-08T09:37:08Z)
- CamDiff: Camouflage Image Augmentation via Diffusion Model [83.35960536063857]
CamDiff is a novel approach to synthesize salient objects in camouflaged scenes.
We leverage the latent diffusion model to synthesize salient objects in camouflaged scenes.
Our approach enables flexible editing and efficient large-scale dataset generation at a low cost.
arXiv Detail & Related papers (2023-04-11T19:37:47Z)
- MFFN: Multi-view Feature Fusion Network for Camouflaged Object Detection [10.04773536815808]
We propose a behavior-inspired framework, called Multi-view Feature Fusion Network (MFFN), which mimics the human behaviors of finding indistinct objects in images.
MFFN captures critical edge and semantic information by comparing and fusing extracted multi-view features.
Our method performs favorably against existing state-of-the-art methods via training with the same data.
arXiv Detail & Related papers (2022-10-12T16:12:58Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- ASOD60K: Audio-Induced Salient Object Detection in Panoramic Videos [79.05486554647918]
We propose PV-SOD, a new task that aims to segment salient objects from panoramic videos.
In contrast to existing fixation-level or object-level saliency detection tasks, we focus on multi-modal salient object detection (SOD).
We collect the first large-scale dataset, named ASOD60K, which contains 4K-resolution video frames annotated with a six-level hierarchy.
arXiv Detail & Related papers (2021-07-24T15:14:20Z)
- Concealed Object Detection [140.98738087261887]
We present the first systematic study on concealed object detection (COD).
COD aims to identify objects that are "perfectly" embedded in their background.
To better understand this task, we collect a large-scale dataset called COD10K.
arXiv Detail & Related papers (2021-02-20T06:49:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.