A Novel Long-term Iterative Mining Scheme for Video Salient Object
Detection
- URL: http://arxiv.org/abs/2206.09564v1
- Date: Mon, 20 Jun 2022 04:27:47 GMT
- Title: A Novel Long-term Iterative Mining Scheme for Video Salient Object
Detection
- Authors: Chenglizhao Chen and Hengsen Wang and Yuming Fang and Chong Peng
- Abstract summary: Short-term methodology conflicts with the real mechanism of our visual system.
This paper proposes a novel VSOD approach, which performs VSOD in a complete long-term way.
The proposed approach outperforms almost all SOTA models on five widely used benchmark datasets.
- Score: 54.53335983750033
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The existing state-of-the-art (SOTA) video salient object detection (VSOD)
models have widely followed a short-term methodology, which dynamically
balances spatial and temporal saliency fusion by considering only a limited
number of consecutive frames around the current one. However, this short-term
methodology has a critical limitation: it conflicts with the real mechanism of
our visual system, which is inherently long-term. As a result,
failure cases keep showing up in the results of the current SOTA models, and
the short-term methodology becomes the major technical bottleneck. To solve
this problem, this paper proposes a novel VSOD approach, which performs VSOD in
a complete long-term way. Our approach converts VSOD, an inherently sequential
task, into a data mining problem: the input video sequence is decomposed into
object proposals in advance, and as many salient object proposals as possible
are then mined in an easy-to-hard way. Since all object proposals are
simultaneously available, the proposed approach is a complete long-term
approach, which can alleviate difficulties rooted in
conventional short-term approaches. In addition, we devise an online updating
scheme that captures the most representative and trustworthy pattern profile
of the salient objects, producing framewise saliency maps with rich detail
that are smooth both spatially and temporally. The proposed approach outperforms
almost all SOTA models on five widely used benchmark datasets.
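The easy-to-hard mining described in the abstract can be sketched as a simple iterative loop: accept the most confident proposals first, maintain a pattern profile of what has been accepted, and use it to re-score harder proposals as the confidence threshold relaxes. The sketch below is a minimal illustration under stated assumptions; the function name, the mean-feature prototype, the cosine-similarity re-scoring, and all thresholds are illustrative choices, not the paper's actual implementation.

```python
# Minimal sketch of an easy-to-hard mining loop over precomputed object
# proposals. All names, thresholds, and the prototype/cosine scoring are
# assumptions for illustration, not the paper's method.
import numpy as np

def mine_salient_proposals(features, init_scores,
                           rounds=3, start_thresh=0.9, step=0.2):
    """Iteratively mine salient object proposals.

    features:     (N, D) array, one feature vector per object proposal,
                  pooled from the whole video (all available at once).
    init_scores:  (N,) initial saliency confidence per proposal.
    Returns a boolean mask of proposals accepted as salient.
    """
    scores = init_scores.astype(float).copy()
    accepted = np.zeros(len(scores), dtype=bool)
    thresh = start_thresh
    for _ in range(rounds):
        # Easy first: accept proposals above the current threshold.
        accepted |= scores >= thresh
        if accepted.any():
            # Online update: refresh the pattern profile (here, a
            # normalized mean-feature prototype) from all accepted so far.
            prototype = features[accepted].mean(axis=0)
            prototype /= np.linalg.norm(prototype) + 1e-8
            norms = np.linalg.norm(features, axis=1) + 1e-8
            sim = features @ prototype / norms  # cosine similarity
            # Re-score the remaining (harder) proposals by similarity to
            # the prototype, blended with their original confidence.
            scores = np.where(accepted, scores, 0.5 * scores + 0.5 * sim)
        thresh -= step  # relax the threshold: move from easy toward hard
    return accepted
```

Because every proposal from the whole sequence is visible to each round, the loop is long-term by construction: an early, confident detection in one frame can pull in harder, visually similar proposals from distant frames.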
Related papers
- OED: Towards One-stage End-to-End Dynamic Scene Graph Generation [18.374354844446962]
Dynamic Scene Graph Generation (DSGG) focuses on identifying visual relationships within the spatial-temporal domain of videos.
We propose a one-stage end-to-end framework, termed OED, which streamlines the DSGG pipeline.
This framework reformulates the task as a set prediction problem and leverages pair-wise features to represent each subject-object pair within the scene graph.
arXiv Detail & Related papers (2024-05-27T08:18:41Z)
- Multi-Scene Generalized Trajectory Global Graph Solver with Composite Nodes for Multiple Object Tracking [61.69892497726235]
Composite Node Message Passing Network (CoNo-Link) is a framework for modeling ultra-long frames information for association.
In addition to treating objects as nodes, as in previous methods, the network also treats object trajectories as nodes for information interaction.
Our model can learn better predictions on longer-time scales by adding composite nodes.
arXiv Detail & Related papers (2023-12-14T14:00:30Z)
- Reasonable Anomaly Detection in Long Sequences [3.673497128866642]
We propose to completely represent the motion patterns of objects by learning from long-term sequences.
A Stacked State Machine (SSM) model is proposed to represent the temporal dependencies which are consistent across long-range observations.
arXiv Detail & Related papers (2023-09-06T23:35:55Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Motion-aware Memory Network for Fast Video Salient Object Detection [15.967509480432266]
We design a space-time memory (STM)-based network, which extracts useful temporal information of the current frame from adjacent frames as the temporal branch of VSOD.
In the encoding stage, we generate high-level temporal features by using high-level features from the current and its adjacent frames.
In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches.
The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.
arXiv Detail & Related papers (2022-08-01T15:56:19Z)
- AntPivot: Livestream Highlight Detection via Hierarchical Attention Mechanism [64.70568612993416]
We formulate a new task, Livestream Highlight Detection, analyze its difficulties, and propose a novel architecture, AntPivot, to solve this problem.
We construct a fully-annotated dataset AntHighlight to instantiate this task and evaluate the performance of our model.
arXiv Detail & Related papers (2022-06-10T05:58:11Z)
- Deep-Ensemble-Based Uncertainty Quantification in Spatiotemporal Graph Neural Networks for Traffic Forecasting [2.088376060651494]
We focus on a diffusion convolutional recurrent neural network (DCRNN), a state-of-the-art method for short-term traffic forecasting.
We develop a scalable deep ensemble approach to quantify uncertainties for DCRNN.
We show that our generic and scalable approach outperforms the current state-of-the-art Bayesian and a number of other commonly used frequentist techniques.
arXiv Detail & Related papers (2022-04-04T16:10:55Z)
- Plug-and-Play Few-shot Object Detection with Meta Strategy and Explicit Localization Inference [78.41932738265345]
This paper proposes a plug-and-play detector that can accurately detect objects of novel categories without a fine-tuning process.
We introduce two explicit inferences into the localization process to reduce its dependence on annotated data.
It shows a significant lead in efficiency, precision, and recall under varied evaluation protocols.
arXiv Detail & Related papers (2021-10-26T03:09:57Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.