Related papers: Advanced Feature Manipulation for Enhanced Change Detection Leveraging Natural Language Models

Advanced Feature Manipulation for Enhanced Change Detection Leveraging Natural Language Models

URL: http://arxiv.org/abs/2403.15943v2
Date: Thu, 13 Jun 2024 15:30:02 GMT
Title: Advanced Feature Manipulation for Enhanced Change Detection Leveraging Natural Language Models
Authors: Zhenglin Li, Yangchen Huang, Mengran Zhu, Jingyu Zhang, JingHao Chang, Houze Liu,
Abstract summary: Large language models (LLMs) have been utilized in various domains for their exceptional feature extraction capabilities. In this study, we harness the power of a pre-trained LLM, extracting feature maps from extensive datasets, and employ an auxiliary network to detect changes.
Score: 2.2933109484655794
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Change detection is a fundamental task in computer vision that processes a bi-temporal image pair to differentiate between semantically altered and unaltered regions. Large language models (LLMs) have been utilized in various domains for their exceptional feature extraction capabilities and have shown promise in numerous downstream applications. In this study, we harness the power of a pre-trained LLM, extracting feature maps from extensive datasets, and employ an auxiliary network to detect changes. Unlike existing LLM-based change detection methods that solely focus on deriving high-quality feature maps, our approach emphasizes the manipulation of these feature maps to enhance semantic relevance.

Related papers

LDGNet: A Lightweight Difference Guiding Network for Remote Sensing Change Detection [6.554696547472252]
We propose a Lightweight Difference Guiding Network (LDGNet) to guide optical remote sensing change detection. First, to enhance the feature representation capability of the lightweight backbone network, we propose the Difference Guiding Module (DGM) Second, we propose the Difference-Aware Dynamic Fusion (DADF) module with Visual State Space Model (VSSM) for lightweight long-range dependency modeling.
arXiv Detail & Related papers (2025-04-07T13:33:54Z)
Mask Approximation Net: A Novel Diffusion Model Approach for Remote Sensing Change Captioning [15.88864190284027]
This paper proposes a novel approach for remote sensing image change detection and description that incorporates diffusion models. We introduce a frequency-guided complex filter module to boost the model performance by managing high-frequency noise. We validate the effectiveness of our proposed method across several datasets for remote sensing change detection and description.
arXiv Detail & Related papers (2024-12-26T11:35:57Z)
Instruction-Guided Fusion of Multi-Layer Visual Features in Large Vision-Language Models [50.98559225639266]
We investigate the contributions of visual features from different encoder layers using 18 benchmarks spanning 6 task categories. Our findings reveal that multilayer features provide complementary strengths with varying task dependencies, and uniform fusion leads to suboptimal performance. We propose the instruction-guided vision aggregator, a module that dynamically integrates multi-layer visual features based on textual instructions.
arXiv Detail & Related papers (2024-12-26T05:41:31Z)
Enhancing Perception of Key Changes in Remote Sensing Image Change Captioning [49.24306593078429]
We propose a novel framework for remote sensing image change captioning, guided by Key Change Features and Instruction-tuned (KCFI) KCFI includes a ViTs encoder for extracting bi-temporal remote sensing image features, a key feature perceiver for identifying critical change areas, and a pixel-level change detection decoder. To validate the effectiveness of our approach, we compare it against several state-of-the-art change captioning methods on the LEVIR-CC dataset.
arXiv Detail & Related papers (2024-09-19T09:33:33Z)
Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization. The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM)
arXiv Detail & Related papers (2024-08-05T08:35:59Z)
ChangeBind: A Hybrid Change Encoder for Remote Sensing Change Detection [16.62779899494721]
Change detection (CD) is a fundamental task in remote sensing (RS) which aims to detect the semantic changes between the same geographical regions at different time stamps. We propose an effective Siamese-based framework to encode the semantic changes occurring in the bi-temporal RS images.
arXiv Detail & Related papers (2024-04-26T17:47:14Z)
Cross-domain Multi-modal Few-shot Object Detection via Rich Text [21.36633828492347]
Cross-modal feature extraction and integration have led to steady performance improvements in few-shot learning tasks. We study the Cross-Domain few-shot generalization of MM-OD (CDMM-FSOD) and propose a meta-learning based multi-modal few-shot object detection method.
arXiv Detail & Related papers (2024-03-24T15:10:22Z)
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs [66.05826802808177]
In computer vision, large language models (LLMs) can be used to prime vision-language tasks such as image captioning and visual question answering. We present an experimental evaluation of different interfacing mechanisms, across multiple tasks. We identify a new interfacing mechanism that yields (near) optimal results across different tasks, while obtaining a 4x reduction in training time.
arXiv Detail & Related papers (2024-03-20T10:57:17Z)
Selective Domain-Invariant Feature for Generalizable Deepfake Detection [21.671221284842847]
We propose a novel framework which reduces the sensitivity to face forgery by fusing content features and styles. Both qualitative and quantitative results in existing benchmarks and proposals demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-03-19T13:09:19Z)
Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks. Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment. We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
Supervising Remote Sensing Change Detection Models with 3D Surface Semantics [1.8782750537161614]
We propose Contrastive Surface-Image Pretraining (CSIP) for joint learning using optical RGB and above ground level (AGL) map pairs. We then evaluate these pretrained models on several building segmentation and change detection datasets to show that our method does, in fact, extract features relevant to downstream applications.
arXiv Detail & Related papers (2022-02-26T23:35:43Z)
Efficient Continual Adaptation for Generative Adversarial Networks [97.20244383723853]
We present a continual learning approach for generative adversarial networks (GANs) Our approach is based on learning a set of global and task-specific parameters. We show that the feature-map transformation based approach outperforms state-of-the-art continual GANs methods.
arXiv Detail & Related papers (2021-03-06T05:09:37Z)
Semantic Change Detection with Asymmetric Siamese Networks [71.28665116793138]
Given two aerial images, semantic change detection aims to locate the land-cover variations and identify their change types with pixel-wise boundaries. This problem is vital in many earth vision related tasks, such as precise urban planning and natural resource management. We present an asymmetric siamese network (ASN) to locate and identify semantic changes through feature pairs obtained from modules of widely different structures.
arXiv Detail & Related papers (2020-10-12T13:26:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.