Related papers: RCDT: Relational Remote Sensing Change Detection with Transformer

Related papers

Tri-path DINO: Feature Complementary Learning for Remote Sensing Multi-Class Change Detection [5.393722656625907]
In remote sensing imagery, multi class change detection (MCD) is crucial for fine grained monitoring.<n>We propose the Tripath DINO architecture, which adopts a three path complementary feature learning strategy.<n>A multi scale attention mechanism is introduced to augment the decoder network, where parallel convolutions adaptively capture and enhance contextual information.
arXiv Detail & Related papers (2026-03-02T06:10:24Z)
Towards Remote Sensing Change Detection with Neural Memory [61.39582645714727]
ChangeTitans is a Titans-based framework for remote sensing change detection.<n>First, we propose VTitans, which integrates neural memory with segmented local attention.<n>Second, we present a hierarchical VTitans-Adapter to refine multi-scale features across different network layers.<n>Third, we introduce TS-CBAM, a two-stream fusion module, to suppress pseudo-changes and enhance detection accuracy.
arXiv Detail & Related papers (2026-02-11T03:50:51Z)
TransBridge: Boost 3D Object Detection by Scene-Level Completion with Transformer Decoder [66.22997415145467]
This paper presents a joint completion and detection framework that improves the detection feature in sparse areas.<n> Specifically, we propose TransBridge, a novel transformer-based up-sampling block that fuses the features from the detection and completion networks.<n>The results show that our framework consistently improves end-to-end 3D object detection, with the mean average precision (mAP) ranging from 0.7 to 1.5 across multiple methods.
arXiv Detail & Related papers (2025-12-12T00:08:03Z)
Source-Free Object Detection with Detection Transformer [59.33653163035064]
Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data.<n>Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR)<n>In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs.
arXiv Detail & Related papers (2025-10-13T07:35:04Z)
MTNet: Learning modality-aware representation with transformer for RGBT tracking [35.96855931247585]
We propose a modality-aware tracker based on transformer, termed MTNet.<n>A transformer fusion network is then applied to capture global dependencies to reinforce instance representations.<n>The proposed method achieves satisfactory results compared with the state-of-the-art competitors on three RGBT benchmarks.
arXiv Detail & Related papers (2025-08-24T10:01:11Z)
Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations [5.5967570276373655]
We systematically analyze the impact of ablating key components in three state-of-the-art detection transformer models.<n>We evaluate the effects of these ablations on the performance metrics gIoU and F1-score.<n>This study advances XAI for DETRs by clarifying the contributions of internal components to model performance.
arXiv Detail & Related papers (2025-07-29T12:00:08Z)
STeInFormer: Spatial-Temporal Interaction Transformer Architecture for Remote Sensing Change Detection [5.4610555622532475]
We present STeInFormer, a spatial-temporal interaction Transformer architecture for multi-temporal feature extraction. We also propose a parameter-free multi-frequency token mixer to integrate frequency-domain features that provide spectral information for RSCD.
arXiv Detail & Related papers (2024-12-23T03:40:04Z)
Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery [51.83786195178233]
We design a Knowledge Discovery Network (KDN) to implement the renormalization group theory in terms of efficient feature extraction. Renormalized connection (RC) on the KDN enables synergistic focusing'' of multi-scale features. RCs extend the multi-level feature's divide-and-conquer'' mechanism of the FPN-based detectors to a wide range of scale-preferred tasks.
arXiv Detail & Related papers (2024-09-09T13:56:22Z)
Relating CNN-Transformer Fusion Network for Change Detection [23.025190360146635]
RCTNet introduces an early fusion backbone to exploit both spatial and temporal features. Experiments demonstrate RCTNet's clear superiority over traditional RS image CD methods.
arXiv Detail & Related papers (2024-07-03T14:58:40Z)
Revolutionizing Traffic Sign Recognition: Unveiling the Potential of Vision Transformers [0.0]
Traffic Sign Recognition (TSR) holds a vital role in advancing driver assistance systems and autonomous vehicles. This study explores three variants of Vision Transformers (PVT, TNT, LNL) and six convolutional neural networks (AlexNet, ResNet, VGG16, MobileNet, EfficientNet, GoogleNet) as baseline models. To address the shortcomings of traditional methods, a novel pyramid EATFormer backbone is proposed, amalgamating Evolutionary Algorithms (EAs) with the Transformer architecture.
arXiv Detail & Related papers (2024-04-29T19:18:52Z)
Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving [69.20604395205248]
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving. We introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector. We conduct extensive experiments on the KITTI, runtime, and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD.
arXiv Detail & Related papers (2024-03-10T10:36:32Z)
S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR) Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection. In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
X Modality Assisting RGBT Object Tracking [36.614908357546035]
We propose a novel X Modality Assisting Network (X-Net) to shed light on the impact of the fusion paradigm. To tackle the feature learning hurdles stemming from significant differences between RGB and thermal modalities, a plug-and-play pixel-level generation module (PGM) is proposed. We also propose a feature-level interaction module (FIM) that incorporates a mixed feature interaction transformer and a spatial-dimensional feature translation strategy.
arXiv Detail & Related papers (2023-12-27T05:38:54Z)
A Dual Attentive Generative Adversarial Network for Remote Sensing Image Change Detection [6.906936669510404]
We propose a dual attentive generative adversarial network for achieving very high-resolution remote sensing image change detection tasks. The DAGAN framework has better performance with 85.01% mean IoU and 91.48% mean F1 score than advanced methods on the LEVIR dataset.
arXiv Detail & Related papers (2023-10-03T08:26:27Z)
Salient Object Detection in Optical Remote Sensing Images Driven by Transformer [69.22039680783124]
We propose a novel Global Extraction Local Exploration Network (GeleNet) for Optical Remote Sensing Images (ORSI-SOD) Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies. Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods.
arXiv Detail & Related papers (2023-09-15T07:14:43Z)
UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR. UniTR processes a variety of modalities with unified modeling and shared parameters. UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
arXiv Detail & Related papers (2023-08-15T12:13:44Z)
Rethinking the Detection Head Configuration for Traffic Object Detection [11.526701794026641]
We propose a lightweight traffic object detection network based on matching between detection head and object distribution. The proposed model achieves more competitive performance than other models on BDD100K dataset and our proposed ETFOD-v2 dataset.
arXiv Detail & Related papers (2022-10-08T02:23:57Z)
RGB-D Salient Object Detection with Cross-Modality Modulation and Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD) The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.