CSHNet: A Novel Information Asymmetric Image Translation Method
- URL: http://arxiv.org/abs/2501.10197v1
- Date: Fri, 17 Jan 2025 13:44:54 GMT
- Title: CSHNet: A Novel Information Asymmetric Image Translation Method
- Authors: Xi Yang, Haoyuan Shi, Zihan Wang, Nannan Wang, Xinbo Gao
- Abstract summary: We propose the CNN-Swin Hybrid Network (CSHNet), which combines two key modules: Swin Embedded CNN (SEC) and CNN Embedded Swin (CES).
CSHNet outperforms existing methods in both visual quality and performance metrics across scene-level and instance-level datasets.
- Score: 57.22010952287759
- Abstract: Despite advancements in cross-domain image translation, challenges persist in asymmetric tasks such as SAR-to-Optical and Sketch-to-Instance conversions, which involve transforming data from a less detailed domain into one with richer content. Traditional CNN-based methods are effective at capturing fine details but struggle with global structure, leading to unwanted merging of image regions. To address this, we propose the CNN-Swin Hybrid Network (CSHNet), which combines two key modules: Swin Embedded CNN (SEC) and CNN Embedded Swin (CES), forming the SEC-CES-Bottleneck (SCB). SEC leverages CNN's detailed feature extraction while integrating the Swin Transformer's structural bias. CES, in turn, preserves the Swin Transformer's global integrity, compensating for CNN's lack of focus on structure. Additionally, CSHNet includes two components designed to enhance cross-domain information retention: the Interactive Guided Connection (IGC), which enables dynamic information exchange between SEC and CES, and Adaptive Edge Perception Loss (AEPL), which maintains structural boundaries during translation. Experimental results show that CSHNet outperforms existing methods in both visual quality and performance metrics across scene-level and instance-level datasets. Our code is available at: https://github.com/XduShi/CSHNet.
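The abstract names an Adaptive Edge Perception Loss (AEPL) that "maintains structural boundaries during translation" but gives no formulation. As a rough, hypothetical illustration of the general idea behind an edge-preserving loss (not the paper's actual AEPL), the NumPy sketch below penalizes the L1 distance between Sobel edge maps of a translated image and its target; the function names and the plain Sobel/L1 choices are assumptions, not taken from the paper.

```python
import numpy as np

def sobel_edges(img):
    """Edge-magnitude map via 3x3 Sobel filters (valid convolution)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)  # horizontal gradient response
            gy[i, j] = np.sum(patch * ky)  # vertical gradient response
    return np.hypot(gx, gy)

def edge_loss(pred, target):
    """L1 distance between the edge maps of prediction and target.

    Zero when both images share identical structural boundaries;
    grows as edges in the prediction drift from those in the target.
    """
    return np.mean(np.abs(sobel_edges(pred) - sobel_edges(target)))
```

In a training setup, a term like this would typically be added to the usual translation loss with a small weight, so that the generator is penalized for blurring or displacing region boundaries even when per-pixel intensities are close.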
Related papers
- A Novel Shape Guided Transformer Network for Instance Segmentation in Remote Sensing Images [4.14360329494344]
We propose a novel Shape Guided Transformer Network (SGTN) to accurately extract objects at the instance level.
Inspired by the global contextual modeling capacity of the self-attention mechanism, we propose an effective transformer encoder termed LSwin.
Our SGTN achieves the highest average precision (AP) scores on two single-class public datasets.
arXiv Detail & Related papers (2024-12-31T09:25:41Z) - Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation [13.753795233064695]
Most domain adaptation (DA) methods are based on either convolutional neural networks (CNNs) or vision transformers (ViTs).
We design a hybrid method, Explicitly Class-specific Boundaries (ECB), that takes full advantage of both ViT and CNN.
ECB learns CNN on ViT to combine their distinct strengths.
arXiv Detail & Related papers (2024-03-27T08:52:44Z) - ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z) - SCTNet: Single-Branch CNN with Transformer Semantic Information for
Real-Time Segmentation [46.068509764538085]
SCTNet is a single-branch CNN with transformer semantic information for real-time segmentation.
SCTNet enjoys the rich semantic representations of an inference-free semantic branch while retaining the high efficiency of a lightweight single-branch CNN.
We conduct extensive experiments on Cityscapes, ADE20K, and COCO-Stuff-10K, and the results show that our method achieves the new state-of-the-art performance.
arXiv Detail & Related papers (2023-12-28T15:33:16Z) - SwinV2DNet: Pyramid and Self-Supervision Compounded Feature Learning for
Remote Sensing Images Change Detection [12.727650696327878]
We propose an end-to-end compounded dense network, SwinV2DNet, to inherit the advantages of both transformer and CNN.
It captures the change relationship features through the densely connected Swin V2 backbone.
It provides the low-level pre-changed and post-changed features through a CNN branch.
arXiv Detail & Related papers (2023-08-22T03:31:52Z) - Cross-receptive Focused Inference Network for Lightweight Image
Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, the need for Transformers to incorporate contextual information when extracting features dynamically is often neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z) - Swin Transformer coupling CNNs Makes Strong Contextual Encoders for VHR
Image Road Extraction [11.308473487002782]
We propose a dual-branch network block named ConSwin that combines ResNet and the Swin Transformer for road extraction tasks.
Our proposed method outperforms state-of-the-art methods on both the Massachusetts and CHN6-CUG datasets in terms of overall accuracy, IOU, and F1 indicators.
arXiv Detail & Related papers (2022-01-10T06:05:12Z) - Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z) - Dual-Level Collaborative Transformer for Image Captioning [126.59298716978577]
We introduce a novel Dual-Level Collaborative Transformer (DLCT) network to realize the complementary advantages of the two features.
In addition, we propose a Locality-Constrained Cross Attention module to address the semantic noise caused by the direct fusion of these two features.
arXiv Detail & Related papers (2021-01-16T15:43:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.