Siamese Tracking with Lingual Object Constraints
- URL: http://arxiv.org/abs/2011.11721v1
- Date: Mon, 23 Nov 2020 20:55:08 GMT
- Title: Siamese Tracking with Lingual Object Constraints
- Authors: Maximilian Filtenborg, Efstratios Gavves, Deepak Gupta
- Abstract summary: This paper explores tracking visual objects subject to additional lingual constraints.
In contrast to Li et al., we impose additional lingual constraints on tracking, which enables new applications of tracking.
Our method enables the selective compression of videos, based on the validity of the constraint.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classically, visual object tracking involves following a target object
throughout a given video, and it provides us with the motion trajectory of the
object. However, for many practical applications, this output is often
insufficient since additional semantic information is required to act on the
video material. Example applications of this are surveillance and
target-specific video summarization, where the target needs to be monitored
with respect to certain predefined constraints, e.g., 'when standing near a
yellow car'. This paper explores tracking visual objects subject to
additional lingual constraints. In contrast to Li et al., whose goal is to
improve and extend tracking itself, we impose additional lingual constraints
on tracking, which enables new applications of tracking. To perform
benchmarks and experiments, we contribute two datasets:
c-MOT16 and c-LaSOT, curated by appending additional constraints to the
frames of the original LaSOT and MOT16 datasets. We also experiment with two
deep models, SiamCT-DFG and SiamCT-CA, obtained by extending a recent
state-of-the-art Siamese tracking method and adding modules inspired by the
fields of natural language processing and visual question answering. Through
experimental results, we show that the proposed model SiamCT-CA can
significantly outperform its counterparts. Furthermore, our method enables the
selective compression of videos, based on the validity of the constraint.
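The abstract does not detail the SiamCT-DFG/SiamCT-CA architectures, but the core idea, a Siamese response map for localization paired with a per-frame score for whether the lingual constraint holds, which in turn drives selective compression, can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' implementation: the class names, layer sizes, text encoder, and the frame-selection helper are all assumptions.

```python
# Hypothetical sketch (not the paper's code): a Siamese tracker whose response
# map is paired with a lingual-constraint head that scores, per frame, whether
# a natural-language constraint (e.g. "standing near a yellow car") holds.
import torch
import torch.nn as nn


class ConstrainedSiameseTracker(nn.Module):
    def __init__(self, feat_dim: int = 256, text_dim: int = 300):
        super().__init__()
        # Shared backbone applied to both the exemplar and the search region,
        # as in standard Siamese tracking (stand-in for a real CNN backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(inplace=True),
        )
        # Encodes a pre-computed constraint embedding; a real system might
        # use a recurrent or transformer text encoder instead.
        self.text_encoder = nn.Sequential(
            nn.Linear(text_dim, feat_dim), nn.ReLU(inplace=True)
        )
        # Fuses pooled visual evidence with the constraint embedding and
        # outputs a scalar constraint-validity logit for the current frame.
        self.constraint_head = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, 1),
        )

    def forward(self, exemplar, search, constraint_emb):
        z = self.backbone(exemplar)           # (1, C, Hz, Wz) target template
        x = self.backbone(search)             # (1, C, Hx, Wx) search region
        # Cross-correlate the template with the search features to get the
        # localization response map (the core Siamese tracking operation).
        response = torch.conv2d(x, z)         # (1, 1, Hr, Wr)
        # Pool search features and fuse them with the constraint embedding.
        visual = x.mean(dim=(2, 3))                    # (1, C)
        text = self.text_encoder(constraint_emb)       # (1, C)
        logit = self.constraint_head(torch.cat([visual, text], dim=-1))
        return response, torch.sigmoid(logit)


def select_frames(validity_scores, threshold=0.5):
    """Keep only frames where the constraint is judged valid; the remaining
    frames can be dropped or stored at low quality (selective compression)."""
    return [i for i, s in enumerate(validity_scores) if s >= threshold]


if __name__ == "__main__":
    model = ConstrainedSiameseTracker()
    exemplar = torch.randn(1, 3, 127, 127)    # target template crop
    search = torch.randn(1, 3, 255, 255)      # current search region
    constraint = torch.randn(1, 300)          # pre-computed text embedding
    response, validity = model(exemplar, search, constraint)
    print(response.shape, float(validity))
    print(select_frames([0.2, 0.7, 0.9], threshold=0.5))
```

In a full system the per-frame validity scores would be thresholded over the whole video so that only segments satisfying the constraint are kept at high quality, which is the selective-compression use case the abstract describes.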
Related papers
- Teaching VLMs to Localize Specific Objects from In-context Examples [56.797110842152]
Vision-Language Models (VLMs) have shown remarkable capabilities across diverse visual tasks.
However, current VLMs lack a fundamental cognitive ability: learning to localize objects in a scene while taking context into account.
This work is the first to explore and benchmark personalized few-shot localization for VLMs.
arXiv Detail & Related papers (2024-11-20T13:34:22Z)
- VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking [61.56592503861093]
The open-vocabulary multi-object tracking (OVMOT) task amalgamates the complexities of open-vocabulary object detection (OVD) and multi-object tracking (MOT).
Existing approaches to OVMOT often merge OVD and MOT methodologies as separate modules, predominantly focusing on the problem through an image-centric lens.
We propose VOVTrack, a novel method that integrates object states relevant to MOT and video-centric training to address this challenge from a video object tracking standpoint.
arXiv Detail & Related papers (2024-10-11T05:01:49Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models [28.304047711166056]
Large-scale pre-trained models have shown promising advances in detecting and segmenting objects in 2D static images in the wild.
This begs the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking?
In this paper, we re-purpose an open-vocabulary detector, segmenter, and dense optical flow estimator, into a model that tracks and segments objects of any category in 2D videos.
arXiv Detail & Related papers (2023-10-10T20:25:30Z)
- Look, Remember and Reason: Grounded reasoning in videos with language models [5.3445140425713245]
Multi-modal language models (LM) have recently shown promising performance in high-level reasoning tasks on videos.
We propose training an LM end-to-end on low-level surrogate tasks, including object detection, re-identification, and tracking, to endow the model with the required low-level visual capabilities.
We demonstrate the effectiveness of our framework on diverse visual reasoning tasks from the ACRE, CATER, Something-Else and STAR datasets.
arXiv Detail & Related papers (2023-06-30T16:31:14Z)
- Dense Video Object Captioning from Disjoint Supervision [77.47084982558101]
We propose a new task and model for dense video object captioning.
This task unifies spatial and temporal localization in video.
We show how our model improves upon a number of strong baselines for this new task.
arXiv Detail & Related papers (2023-06-20T17:57:23Z)
- Unifying Tracking and Image-Video Object Detection [54.91658924277527]
TrIVD (Tracking and Image-Video Detection) is the first framework that unifies image OD, video OD, and MOT within one end-to-end model.
To handle the discrepancies and semantic overlaps of category labels, TrIVD formulates detection/tracking as grounding and reasons about object categories.
arXiv Detail & Related papers (2022-11-20T20:30:28Z)
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that also performs well for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)