Staged Depthwise Correlation and Feature Fusion for Siamese Object
Tracking
- URL: http://arxiv.org/abs/2310.09747v1
- Date: Sun, 15 Oct 2023 06:04:42 GMT
- Title: Staged Depthwise Correlation and Feature Fusion for Siamese Object
Tracking
- Authors: Dianbo Ma, Jianqiang Xiao, Ziyan Gao, Satoshi Yamane
- Abstract summary: We propose a novel staged depthwise correlation and feature fusion network, named DCFFNet, to further optimize the feature extraction for visual tracking.
We build our deep tracker upon a Siamese network architecture, which is trained offline from scratch on multiple large-scale datasets.
For a comprehensive evaluation of performance, we test our tracker on the popular benchmarks OTB100, VOT2018 and LaSOT.
- Score: 0.6827423171182154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose a novel staged depthwise correlation and feature
fusion network, named DCFFNet, to further optimize the feature extraction for
visual tracking. We build our deep tracker upon a Siamese network architecture,
which is trained offline from scratch on multiple large-scale datasets in an
end-to-end manner. The model contains a core component, the depthwise
correlation and feature fusion module (correlation-fusion module), which helps
the model learn a set of optimal weights for a specific object by combining
multi-level features from lower and higher layers with multi-channel semantics
within the same layer. We combine the modified ResNet-50 with the proposed
correlation-fusion layer to constitute the feature extractor of our model.
During training, we find that optimization becomes more stable, which we
attribute to the correlation-fusion module. For a comprehensive evaluation of
performance, we test our tracker on the popular benchmarks OTB100, VOT2018 and
LaSOT. Extensive experimental results demonstrate that our proposed method
achieves competitive performance against many leading trackers in terms of
accuracy and precision, while satisfying the real-time requirements of
applications.
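To make the correlation-fusion idea concrete, the following is a minimal sketch of
depthwise cross-correlation between template and search features, followed by a
simple channel-wise fusion of multi-level responses. The module names, channel
widths, and the 1x1-convolution fusion scheme are illustrative assumptions, not
the authors' exact DCFFNet implementation.

```python
# Minimal sketch of staged depthwise correlation with multi-level feature
# fusion, in the spirit of the correlation-fusion module described above.
# Channel widths, stage count, and the 1x1-convolution fusion are assumptions
# made for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


def depthwise_xcorr(search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
    """Per-channel cross-correlation: the template feature acts as a depthwise kernel."""
    b, c, h, w = search.shape
    # Fold the batch into channels so each (sample, channel) pair gets its own kernel.
    search = search.reshape(1, b * c, h, w)
    kernel = template.reshape(b * c, 1, template.shape[2], template.shape[3])
    response = F.conv2d(search, kernel, groups=b * c)
    return response.reshape(b, c, response.shape[-2], response.shape[-1])


class CorrelationFusion(nn.Module):
    """Correlate template/search features from several backbone stages and fuse
    the per-stage response maps channel-wise with a 1x1 convolution."""

    def __init__(self, in_channels=(512, 1024, 2048), mid_channels=256):
        super().__init__()
        # Project each backbone stage to a common width before correlation.
        self.adjust = nn.ModuleList([nn.Conv2d(c, mid_channels, 1) for c in in_channels])
        self.fuse = nn.Conv2d(mid_channels * len(in_channels), mid_channels, 1)

    def forward(self, search_feats, template_feats):
        # Assumes the (modified) backbone keeps a common stride, so the per-stage
        # response maps share the same spatial size and can be concatenated.
        responses = [
            depthwise_xcorr(adj(s), adj(t))
            for adj, s, t in zip(self.adjust, search_feats, template_feats)
        ]
        return self.fuse(torch.cat(responses, dim=1))


if __name__ == "__main__":
    # Toy shapes: three stages of search (31x31) and template (7x7) features.
    search = [torch.randn(2, c, 31, 31) for c in (512, 1024, 2048)]
    template = [torch.randn(2, c, 7, 7) for c in (512, 1024, 2048)]
    fused = CorrelationFusion()(search, template)
    print(fused.shape)  # torch.Size([2, 256, 25, 25])
```

In a full tracker, the fused response map would feed classification and
regression heads; the toy shapes above only illustrate how depthwise correlation
and multi-level fusion compose.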
Related papers
- Multi-Level Aggregation and Recursive Alignment Architecture for Efficient Parallel Inference Segmentation Network [18.47001817385548]
We propose a parallel inference network customized for semantic segmentation tasks.
We employ a shallow backbone to ensure real-time speed, and propose three core components that compensate for the reduced model capacity and improve accuracy.
Our framework shows a better balance between speed and accuracy than state-of-the-art real-time methods on Cityscapes and CamVid datasets.
arXiv Detail & Related papers (2024-02-03T22:51:17Z) - SENetV2: Aggregated dense layer for channelwise and global
representations [0.0]
We introduce a novel aggregated multilayer perceptron, a multi-branch dense layer, within the Squeeze residual module.
This fusion enhances the network's ability to capture channel-wise patterns and global representations.
We conduct extensive experiments on benchmark datasets to validate the model and compare it with established architectures.
arXiv Detail & Related papers (2023-11-17T14:10:57Z) - RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z) - R-Cut: Enhancing Explainability in Vision Transformers with Relationship
Weighted Out and Cut [14.382326829600283]
We introduce two modules: the "Relationship Weighted Out" module and the "Cut" module.
The "Cut" module performs fine-grained feature decomposition, taking into account factors such as position, texture, and color.
We validate our method with extensive qualitative and quantitative experiments on the ImageNet dataset.
arXiv Detail & Related papers (2023-07-18T08:03:51Z) - OST: Efficient One-stream Network for 3D Single Object Tracking in Point Clouds [6.661881950861012]
We propose a novel one-stream network with the strength of instance-level encoding, which avoids the correlation operations used in previous Siamese networks.
The proposed method achieves competitive performance not only for class-specific tracking but also for class-agnostic tracking, with less computation and higher efficiency.
arXiv Detail & Related papers (2022-10-16T12:31:59Z) - DepthFormer: Exploiting Long-Range Correlation and Local Information for
Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z) - Joint Feature Learning and Relation Modeling for Tracking: A One-Stream
Framework [76.70603443624012]
We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling.
In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance.
OSTrack achieves state-of-the-art performance on multiple benchmarks, in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k.
arXiv Detail & Related papers (2022-03-22T18:37:11Z) - Semantic Segmentation With Multi Scale Spatial Attention For Self
Driving Cars [2.7317088388886384]
We present a novel neural network using multi-scale feature fusion for accurate and efficient semantic image segmentation.
We use a ResNet-based feature extractor, dilated convolutional layers in the downsampling part, atrous convolutional layers in the upsampling part, and a concatenation operation to merge them.
A new attention module is proposed to encode more contextual information and enhance the receptive field of the network.
arXiv Detail & Related papers (2020-06-30T20:19:09Z) - A Unified Object Motion and Affinity Model for Online Multi-Object
Tracking [127.5229859255719]
We propose a novel MOT framework that unifies object motion and affinity models into a single network, named UMA.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
arXiv Detail & Related papers (2020-03-25T09:36:43Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)