Related papers: Cross-modulated Attention Transformer for RGBT Tracking

Cross-modulated Attention Transformer for RGBT Tracking

URL: http://arxiv.org/abs/2408.02222v1
Date: Mon, 5 Aug 2024 03:54:40 GMT
Title: Cross-modulated Attention Transformer for RGBT Tracking
Authors: Yun Xiao, Jiacong Zhao, Andong Lu, Chenglong Li, Yin Lin, Bing Yin, Cong Liu,
Abstract summary: We propose a novel approach called Cross-modulated Attention Transformer (CAFormer) for RGBT tracking. In particular, we first independently generate correlation maps for each modality and feed them into the designed Correlation Modulated Enhancement module. Experiments on five public RGBT tracking benchmarks show the outstanding performance of the proposed CAFormer against state-of-the-art methods.
Score: 35.1700920590541
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing Transformer-based RGBT trackers achieve remarkable performance benefits by leveraging self-attention to extract uni-modal features and cross-attention to enhance multi-modal feature interaction and template-search correlation computation. Nevertheless, the independent search-template correlation calculations ignore the consistency between branches, which can result in ambiguous and inappropriate correlation weights. It not only limits the intra-modal feature representation, but also harms the robustness of cross-attention for multi-modal feature interaction and search-template correlation computation. To address these issues, we propose a novel approach called Cross-modulated Attention Transformer (CAFormer), which performs intra-modality self-correlation, inter-modality feature interaction, and search-template correlation computation in a unified attention model, for RGBT tracking. In particular, we first independently generate correlation maps for each modality and feed them into the designed Correlation Modulated Enhancement module, modulating inaccurate correlation weights by seeking the consensus between modalities. Such kind of design unifies self-attention and cross-attention schemes, which not only alleviates inaccurate attention weight computation in self-attention but also eliminates redundant computation introduced by extra cross-attention scheme. In addition, we propose a collaborative token elimination strategy to further improve tracking inference efficiency and accuracy. Extensive experiments on five public RGBT tracking benchmarks show the outstanding performance of the proposed CAFormer against state-of-the-art methods.

Related papers

Scaled and Inter-token Relation Enhanced Transformer for Sample-restricted Residential NILM [0.0]
We propose a novel transformer architecture with two key innovations: inter-token relation enhancement and dynamic temperature tuning. We validate our method on the REDD dataset and show that it outperforms the original transformer and state-of-the-art models by 10-15% in F1 score across various appliance types.
arXiv Detail & Related papers (2024-10-12T18:58:45Z)
Spuriousness-Aware Meta-Learning for Learning Robust Classifiers [26.544938760265136]
Spurious correlations are brittle associations between certain attributes of inputs and target variables. Deep image classifiers often leverage them for predictions, leading to poor generalization on the data where the correlations do not hold. Mitigating the impact of spurious correlations is crucial towards robust model generalization, but it often requires annotations of the spurious correlations in data.
arXiv Detail & Related papers (2024-06-15T21:41:25Z)
DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in-temporal skeletal data. It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and flops.
arXiv Detail & Related papers (2024-06-05T06:18:03Z)
Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking [20.60198596317328]
Multimodal Entity Linking aims to link ambiguous mentions in multimodal contexts to entities in a multimodal knowledge graph. Existing methods attempt several local correlative mechanisms, relying heavily on the automatically learned attention weights. We propose a novel MEL framework, namely OT-MEL, with OT-guided correlation assignment.
arXiv Detail & Related papers (2024-06-04T03:35:25Z)
Correlated Attention in Transformers for Multivariate Time Series [22.542109523780333]
We propose a novel correlated attention mechanism, which efficiently captures feature-wise dependencies, and can be seamlessly integrated within the encoder blocks of existing Transformers. In particular, correlated attention operates across feature channels to compute cross-covariance matrices between queries and keys with different lag values, and selectively aggregate representations at the sub-series level. This architecture facilitates automated discovery and representation learning of not only instantaneous but also lagged cross-correlations, while inherently capturing time series auto-correlation.
arXiv Detail & Related papers (2023-11-20T17:35:44Z)
PerturbScore: Connecting Discrete and Continuous Perturbations in NLP [64.28423650146877]
In this paper, we aim to connect discrete perturbations with continuous perturbations. We design a regression task as a PerturbScore to learn the correlation automatically. We find that we can build a connection between discrete and continuous perturbations and use the proposed PerturbScore to learn such correlation.
arXiv Detail & Related papers (2023-10-13T06:50:15Z)
FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network [48.912196729711624]
Few-shot semantic segmentation is the task of learning to locate each pixel of a novel class in a query image with only a few annotated support images. We propose a Feature-Enhanced Context-Aware Network (FECANet) to suppress the matching noise caused by inter-class local similarity. In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features.
arXiv Detail & Related papers (2023-01-19T16:31:13Z)
Network inference via process motifs for lagged correlation in linear stochastic processes [0.5801044612920815]
A major challenge for causal inference from time-series data is the trade-off between computational feasibility and accuracy. We propose to infer networks of causal relations via pairwise edge measure (PEMs) that one can easily compute from lagged correlation matrices.
arXiv Detail & Related papers (2022-08-18T14:46:05Z)
High-Performance Transformer Tracking [74.07751002861802]
We present a Transformer tracking (named TransT) method based on the Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and the classification and regression head. Experiments show that our TransT and TransT-M methods achieve promising results on seven popular datasets.
arXiv Detail & Related papers (2022-03-25T09:33:29Z)
A Tri-attention Fusion Guided Multi-modal Segmentation Network [2.867517731896504]
We propose a multi-modality segmentation network guided by a novel tri-attention fusion. Our network includes N model-independent encoding paths with N image sources, a tri-attention fusion block, a dual-attention fusion block, and a decoding path. Our experiment results tested on BraTS 2018 dataset for brain tumor segmentation demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-11-02T14:36:53Z)
Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images. We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation. We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
Transformer Tracking [76.96796612225295]
Correlation acts as a critical role in the tracking field, especially in popular Siamese-based trackers. This work presents a novel attention-based feature fusion network, which effectively combines the template and search region features solely using attention. Experiments show that our TransT achieves very promising results on six challenging datasets.
arXiv Detail & Related papers (2021-03-29T09:06:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.