Object-Aware Multi-Branch Relation Networks for Spatio-Temporal Video
Grounding
- URL: http://arxiv.org/abs/2008.06941v2
- Date: Sat, 22 Aug 2020 11:11:32 GMT
- Title: Object-Aware Multi-Branch Relation Networks for Spatio-Temporal Video
Grounding
- Authors: Zhu Zhang, Zhou Zhao, Zhijie Lin, Baoxing Huai and Nicholas Jing Yuan
- Abstract summary: We propose a novel object-aware multi-branch relation network for object-aware relation discovery.
We then propose multi-branch reasoning to capture critical object relationships between the main branch and auxiliary branches.
- Score: 90.12181414070496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatio-temporal video grounding aims to retrieve the spatio-temporal tube of
a queried object according to the given sentence. Currently, most existing
grounding methods are restricted to well-aligned segment-sentence pairs. In
this paper, we explore spatio-temporal video grounding on unaligned data and
multi-form sentences. This challenging task requires capturing critical object
relations to identify the queried target. However, existing approaches cannot
distinguish notable objects and resort to ineffective relation modeling among
unnecessary objects. Thus, we propose a novel object-aware multi-branch
relation network for object-aware relation discovery. Concretely, we first
devise multiple branches to develop object-aware region modeling, where each
branch focuses on a crucial object mentioned in the sentence. We then propose
multi-branch relation reasoning to capture critical object relationships
between the main branch and auxiliary branches. Moreover, we apply a diversity
loss to make each branch only pay attention to its corresponding object and
boost multi-branch learning. Extensive experiments show the effectiveness of
our proposed method.
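The diversity loss is only described at a high level in the abstract. The snippet below is a minimal sketch of one plausible formulation, assuming each of the K branches produces attention logits over R candidate object regions and that overlap between different branches' attention distributions should be penalized. The function name, tensor shapes, and the pairwise-overlap form are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def diversity_loss(branch_logits: torch.Tensor) -> torch.Tensor:
    """Penalize overlapping attention between branches (illustrative sketch).

    branch_logits: (B, K, R) attention logits, with K >= 2 branches over R
    candidate regions. Each branch is softmax-normalized over regions, and
    the pairwise overlap between different branches' distributions is
    penalized so that each branch focuses on a distinct object.
    """
    attn = F.softmax(branch_logits, dim=-1)            # (B, K, R)
    overlap = torch.bmm(attn, attn.transpose(1, 2))    # (B, K, K) pairwise overlap
    diag = torch.diagonal(overlap, dim1=1, dim2=2)     # (B, K) self-overlap
    off_diag = overlap - torch.diag_embed(diag)        # keep only cross-branch terms
    k = attn.size(1)
    return off_diag.sum(dim=(1, 2)).mean() / (k * (k - 1))

# Toy usage: 2 clips, 4 branches (queried object + 3 auxiliary objects), 36 regions.
loss = diversity_loss(torch.randn(2, 4, 36))
```

In this sketch, minimizing the cross-branch overlap pushes each branch toward a distinct set of regions, matching the stated goal of making each branch attend only to its corresponding object.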
Related papers
- Mutually-Aware Feature Learning for Few-Shot Object Counting [20.623402944601775]
Few-shot object counting has garnered significant attention for its practicality as it aims to count target objects in a query image based on given exemplars without the need for additional training.
We propose a novel framework, Mutually-Aware FEAture learning (MAFEA), which encodes query and exemplar features to be mutually aware of each other from the outset.
Our model reaches a new state-of-the-art performance on two challenging benchmarks, FSCD-LVIS and FSC-147, while markedly reducing the target confusion problem.
arXiv Detail & Related papers (2024-08-19T06:46:24Z) - Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we require only sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z) - Tackling Background Distraction in Video Object Segmentation [7.187425003801958]
Video object segmentation (VOS) aims to densely track certain objects in videos.
One of the main challenges in this task is the existence of background distractors that appear similar to the target objects.
We propose three novel strategies to suppress such distractors.
Our model achieves performance comparable to contemporary state-of-the-art approaches while running in real time.
arXiv Detail & Related papers (2022-07-14T14:25:19Z) - Recent Advances in Embedding Methods for Multi-Object Tracking: A Survey [71.10448142010422]
Multi-object tracking (MOT) aims to associate target objects across video frames in order to obtain entire moving trajectories.
Embedding methods play an essential role in object location estimation and temporal identity association in MOT.
We first conduct a comprehensive overview and in-depth analysis of embedding methods in MOT from seven different perspectives.
arXiv Detail & Related papers (2022-05-22T06:54:33Z) - Suspected Object Matters: Rethinking Model's Prediction for One-stage
Visual Grounding [93.82542533426766]
We propose a Suspected Object Transformation mechanism (SOT) to encourage the target object selection among the suspected ones.
SOT can be seamlessly integrated into existing CNN- and Transformer-based one-stage visual grounders.
Extensive experiments demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2022-03-10T06:41:07Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation and performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Joint Inductive and Transductive Learning for Video Object Segmentation [107.32760625159301]
Semi-supervised video object segmentation is the task of segmenting the target object in a video sequence given only its mask in the first frame.
Most previous best-performing methods adopt matching-based transductive reasoning or online inductive learning.
We propose to integrate transductive and inductive learning into a unified framework to exploit the complementarity between them for accurate and robust video object segmentation.
arXiv Detail & Related papers (2021-08-08T16:25:48Z) - Multi-object Tracking with a Hierarchical Single-branch Network [31.680667324595557]
We propose an online multi-object tracking framework based on a hierarchical single-branch network.
Our novel iHOIM loss function unifies the objectives of the two sub-tasks and encourages better detection performance.
Experimental results on MOT16 and MOT20 datasets show that we can achieve state-of-the-art tracking performance.
arXiv Detail & Related papers (2021-01-06T12:14:58Z)