Visual Tracking by TridentAlign and Context Embedding
- URL: http://arxiv.org/abs/2007.06887v1
- Date: Tue, 14 Jul 2020 08:00:26 GMT
- Title: Visual Tracking by TridentAlign and Context Embedding
- Authors: Janghoon Choi, Junseok Kwon, Kyoung Mu Lee
- Abstract summary: We propose novel TridentAlign and context embedding modules for Siamese network-based visual tracking methods.
The performance of the proposed tracker is comparable to that of state-of-the-art trackers, while the proposed tracker runs at real-time speed.
- Score: 71.60159881028432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Siamese network-based visual tracking methods have enabled
high performance on numerous tracking benchmarks. However, extensive scale
variations of the target object and distractor objects with similar categories
have consistently posed challenges in visual tracking. To address these
persisting issues, we propose novel TridentAlign and context embedding modules
for Siamese network-based visual tracking methods. The TridentAlign module
facilitates adaptability to extensive scale variations and large deformations
of the target, where it pools the feature representation of the target object
into multiple spatial dimensions to form a feature pyramid, which is then
utilized in the region proposal stage. Meanwhile, the context embedding module aims
to discriminate the target from distractor objects by accounting for the global
context information among objects. The context embedding module extracts and
embeds the global context information of a given frame into a local feature
representation such that the information can be utilized in the final
classification stage. Experimental results obtained on multiple benchmark
datasets show that the performance of the proposed tracker is comparable to
that of state-of-the-art trackers, while the proposed tracker runs at real-time
speed.
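
To make the TridentAlign description above concrete, the following is a minimal PyTorch sketch of the pooling step: the target feature map is pooled to several fixed spatial sizes to form a feature pyramid that the region proposal stage can consume. Module names, pool sizes, and the use of adaptive average pooling with a shared 1x1 projection are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of multi-scale target pooling; all names are illustrative.
import torch
import torch.nn as nn


class TridentAlignSketch(nn.Module):
    """Pools a target feature map into a pyramid of spatial resolutions."""

    def __init__(self, channels: int, pool_sizes=(3, 5, 7)):
        super().__init__()
        # One adaptive pooling branch per pyramid level.
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(s) for s in pool_sizes])
        # Shared 1x1 projection keeps channel dimensions consistent across levels.
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, target_feat):
        # target_feat: (B, C, H, W) features cropped around the target.
        # Returns one pooled map per scale, e.g. 3x3, 5x5, and 7x7.
        return [self.proj(pool(target_feat)) for pool in self.pools]


feat = torch.randn(1, 256, 15, 15)          # toy target feature map
pyramid = TridentAlignSketch(256)(feat)
print([tuple(p.shape) for p in pyramid])    # [(1, 256, 3, 3), (1, 256, 5, 5), (1, 256, 7, 7)]
```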
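
A similarly hedged sketch of the context embedding step: a frame-level context vector is pooled from the full search frame and fused into each candidate's local features before classification, so that same-category distractors can be told apart. The fusion by concatenation and a 1x1 convolution is an assumption made for illustration, not the paper's exact design.

```python
# Hypothetical context embedding sketch; names and fusion choice are assumptions.
import torch
import torch.nn as nn


class ContextEmbeddingSketch(nn.Module):
    """Fuses global frame context into local candidate features."""

    def __init__(self, channels: int):
        super().__init__()
        self.global_pool = nn.AdaptiveAvgPool2d(1)  # frame-level context vector
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, local_feat, frame_feat):
        # local_feat: (B, C, h, w) features of a candidate region
        # frame_feat: (B, C, H, W) features of the entire frame
        ctx = self.global_pool(frame_feat)                      # (B, C, 1, 1)
        ctx = ctx.expand(-1, -1, *local_feat.shape[-2:])        # broadcast to (B, C, h, w)
        return self.fuse(torch.cat([local_feat, ctx], dim=1))   # context-embedded features
```

In a full tracker, the output would feed the final classification head that separates the target from distractors.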
Related papers
- Multi-Object Tracking by Hierarchical Visual Representations [40.521291165765696]
We propose a new visual hierarchical representation paradigm for multi-object tracking.
Objects are discriminated more effectively by attending to their compositional visual regions and contrasting them against the background contextual information.
arXiv Detail & Related papers (2024-02-24T20:10:44Z)
- CiteTracker: Correlating Image and Text for Visual Tracking [114.48653709286629]
We propose the CiteTracker to enhance target modeling and inference in visual tracking by connecting images and text.
Specifically, we develop a text generation module to convert the target image patch into a descriptive text.
We then associate the target description and the search image using an attention-based correlation module to generate the correlated features for target state reference (a generic sketch of this kind of correlation appears after this list).
arXiv Detail & Related papers (2023-08-22T09:53:12Z)
- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that performs well also for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)
- Tiny Object Tracking: A Large-scale Dataset and A Baseline [40.93697515531104]
We create a large-scale video dataset, which contains 434 sequences with a total of more than 217K frames.
In data creation, we take 12 challenge attributes into account to cover a broad range of viewpoints and scene complexities.
We propose a novel Multilevel Knowledge Distillation Network (MKDNet), which pursues three-level knowledge distillations in a unified framework.
arXiv Detail & Related papers (2022-02-11T15:00:32Z)
- e-TLD: Event-based Framework for Dynamic Object Tracking [23.026432675020683]
This paper presents a long-term object tracking framework with a moving event camera under general tracking conditions.
The framework uses a discriminative representation for the object with online learning, and detects and re-tracks the object when it comes back into the field-of-view.
arXiv Detail & Related papers (2020-09-02T07:08:56Z)
- RPT: Learning Point Set Representation for Siamese Visual Tracking [15.04182251944942]
We propose an efficient visual tracking framework to accurately estimate the target state with a finer representation as a set of representative points.
Our method achieves new state-of-the-art performance while running at over 20 FPS.
arXiv Detail & Related papers (2020-08-08T07:42:58Z)
- Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
arXiv Detail & Related papers (2020-06-12T09:37:24Z)
- TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
The Tracking Any Object (TAO) dataset consists of 2,907 high-resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
- Applying r-spatiogram in object tracking for occlusion handling [16.36552899280708]
The aim of video tracking is to accurately locate a moving target in a video sequence and to discriminate the target from non-targets in the feature space of the sequence.
In this paper, we use the basic idea shared by many trackers, which consists of three main components of the reference model: object modeling, object detection and localization, and model updating.
arXiv Detail & Related papers (2020-03-18T02:42:51Z)
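
As a footnote to the CiteTracker entry above, its attention-based correlation between a generated target description and the search image can be illustrated with generic cross-attention, where image tokens query the text tokens. This is a hypothetical sketch: the class name, dimensions, and the use of nn.MultiheadAttention are assumptions, not CiteTracker's published module.

```python
# Generic cross-attention correlation sketch; not CiteTracker's actual code.
import torch
import torch.nn as nn


class TextImageCorrelationSketch(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_tokens, image_tokens):
        # text_tokens:  (B, T, D) embedding of the generated target description
        # image_tokens: (B, N, D) flattened search-image features
        # Each image location attends over the description, yielding
        # correlated features for target state estimation.
        out, _ = self.attn(query=image_tokens, key=text_tokens, value=text_tokens)
        return out
```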
This list is automatically generated from the titles and abstracts of the papers on this site.