DuAT: Dual-Aggregation Transformer Network for Medical Image
Segmentation
- URL: http://arxiv.org/abs/2212.11677v1
- Date: Wed, 21 Dec 2022 07:54:02 GMT
- Title: DuAT: Dual-Aggregation Transformer Network for Medical Image
Segmentation
- Authors: Feilong Tang, Qiming Huang, Jinfeng Wang, Xianxu Hou, Jionglong Su,
Jingxin Liu
- Abstract summary: Transformer-based models have been widely demonstrated to be successful in computer vision tasks.
However, they are often dominated by features of large patterns leading to the loss of local details.
We propose a Dual-Aggregation Transformer Network called DuAT, which is characterized by two innovative designs.
Our proposed model outperforms state-of-the-art methods in the segmentation of skin lesion images, and polyps in colonoscopy images.
- Score: 21.717520350930705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based models have been widely demonstrated to be successful in
computer vision tasks by modelling long-range dependencies and capturing global
representations. However, they are often dominated by features of large
patterns leading to the loss of local details (e.g., boundaries and small
objects), which are critical in medical image segmentation. To alleviate this
problem, we propose a Dual-Aggregation Transformer Network called DuAT, which
is characterized by two innovative designs, namely, the Global-to-Local Spatial
Aggregation (GLSA) and Selective Boundary Aggregation (SBA) modules. The GLSA
has the ability to aggregate and represent both global and local spatial
features, which are beneficial for locating large and small objects,
respectively. The SBA module is used to aggregate the boundary characteristic
from low-level features and semantic information from high-level features for
better preserving boundary details and locating the re-calibration objects.
Extensive experiments in six benchmark datasets demonstrate that our proposed
model outperforms state-of-the-art methods in the segmentation of skin lesion
images, and polyps in colonoscopy images. In addition, our approach is more
robust than existing methods in various challenging situations such as small
object segmentation and ambiguous object boundaries.
Related papers
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model [71.50973774576431]
We propose a novel MLLM, INF-LLaVA, designed for effective high-resolution image perception.
We introduce a Dual-perspective Cropping Module (DCM), which ensures that each sub-image contains continuous details from a local perspective.
We also introduce a Dual-perspective Enhancement Module (DEM) to enable the mutual enhancement of global and local features.
arXiv Detail & Related papers (2024-07-23T06:02:30Z)
- Learning Enriched Features via Selective State Spaces Model for Efficient Image Deblurring [0.0]
Image deblurring aims to restore a high-quality image from its corresponding blurred counterpart.
We propose an efficient image deblurring network that leverages selective state spaces model to aggregate enriched and accurate features.
Experimental results demonstrate that the proposed method outperforms state-of-the-art approaches on widely used benchmarks.
arXiv Detail & Related papers (2024-03-29T10:40:41Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints [9.238103649037951]
We present a framework aimed at leveraging the raw output of SAM by exploiting two novel concepts called SAM-Generated Object (SGO) and SAM-Generated Boundary (SGB).
Taking into account the content characteristics of SGO, we introduce the concept of object consistency to leverage segmented regions lacking semantic information.
The boundary loss capitalizes on the distinctive features of SGB by directing the model's attention to the boundary information of the object.
arXiv Detail & Related papers (2023-12-05T03:33:47Z)
- SA2-Net: Scale-aware Attention Network for Microscopic Image Segmentation [36.286876343282565]
Microscopic image segmentation is a challenging task, wherein the objective is to assign semantic labels to each pixel in a given microscopic image.
We introduce SA2-Net, an attention-guided method that leverages multi-scale feature learning to handle diverse structures within microscopic images.
arXiv Detail & Related papers (2023-09-28T17:58:05Z)
- SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation [68.42476385214785]
We propose a novel Spatial-Semantic Map Guided (SSMG) diffusion model that adopts the feature map, derived from the layout, as guidance.
SSMG achieves superior generation quality with sufficient spatial and semantic controllability compared to previous works.
We also propose the Relation-Sensitive Attention (RSA) and Location-Sensitive Attention (LSA) mechanisms.
arXiv Detail & Related papers (2023-08-20T04:09:12Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
arXiv Detail & Related papers (2022-03-25T01:24:24Z)
- Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)