Transformer Assisted Convolutional Network for Cell Instance Segmentation
- URL: http://arxiv.org/abs/2110.02270v1
- Date: Tue, 5 Oct 2021 18:18:31 GMT
- Title: Transformer Assisted Convolutional Network for Cell Instance Segmentation
- Authors: Deepanshu Pandey, Pradyumna Gupta, Sumit Bhattacharya, Aman Sinha, Rohit Agarwal
- Abstract summary: We present a transformer-based approach to enhance the performance of the conventional convolutional feature extractor.
Our approach merges the convolutional feature maps with transformer-based token embeddings by applying a projection operation similar to self-attention in transformers.
- Score: 5.195101477698897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Region-proposal-based methods like R-CNN and Faster R-CNN have proven
to be extremely successful in object detection and segmentation tasks.
Recently, Transformers have also gained popularity in the domain of Computer
Vision and are being utilised to improve the performance of conventional
models. In this paper, we present a relatively new transformer-based approach
to enhance the performance of the conventional convolutional feature extractor
in existing region-proposal-based methods. Our approach merges the
convolutional feature maps with transformer-based token embeddings by applying
a projection operation similar to self-attention in transformers. The results
of our experiments show that the transformer-assisted feature extractor achieves a
significant improvement in mIoU (mean Intersection over Union) scores compared
to the vanilla convolutional backbone.
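
To make the fusion step concrete, below is a minimal PyTorch sketch of an attention-like projection that merges a convolutional feature map with transformer token embeddings, in the spirit of the abstract. The module name, dimensions, and the residual connection are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TokenFeatureMerge(nn.Module):
    """Hypothetical sketch: CNN feature-map locations act as queries,
    transformer token embeddings act as keys/values, and the attended
    tokens are added back onto the feature map (assumed design)."""

    def __init__(self, cnn_dim: int, token_dim: int, proj_dim: int):
        super().__init__()
        self.q = nn.Linear(cnn_dim, proj_dim)    # queries from CNN features
        self.k = nn.Linear(token_dim, proj_dim)  # keys from token embeddings
        self.v = nn.Linear(token_dim, cnn_dim)   # values mapped back to CNN channels
        self.scale = proj_dim ** -0.5

    def forward(self, feat: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) convolutional feature map
        # tokens: (B, N, D) transformer token embeddings
        b, c, h, w = feat.shape
        q = self.q(feat.flatten(2).transpose(1, 2))   # (B, H*W, proj_dim)
        k = self.k(tokens)                            # (B, N, proj_dim)
        v = self.v(tokens)                            # (B, N, C)
        attn = torch.softmax((q @ k.transpose(1, 2)) * self.scale, dim=-1)
        fused = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return feat + fused  # residual merge back into the feature map

# Usage: fuse a ResNet-stage output with ViT-style tokens
feat = torch.randn(2, 256, 32, 32)
tokens = torch.randn(2, 196, 768)
out = TokenFeatureMerge(256, 768, 64)(feat, tokens)
print(out.shape)  # torch.Size([2, 256, 32, 32])
```

Because the token embeddings are projected back to the CNN channel dimension, the fused output keeps the feature map's shape and could, under these assumptions, drop into an existing region-proposal backbone unchanged.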
Related papers
- Self-Supervised Pre-Training for Table Structure Recognition Transformer [25.04573593082671]
We propose a self-supervised pre-training (SSP) method for table structure recognition transformers.
We discover that the performance gap between the linear projection transformer and the hybrid CNN-transformer can be mitigated by SSP of the visual encoder in the TSR model.
arXiv Detail & Related papers (2024-02-23T19:34:06Z)
- Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
- SSformer: A Lightweight Transformer for Semantic Segmentation [7.787950060560868]
Swin Transformer set a new record in various vision tasks by using hierarchical architecture and shifted windows.
We design a lightweight yet effective transformer model, called SSformer.
Experimental results show the proposed SSformer yields comparable mIoU performance with state-of-the-art models.
arXiv Detail & Related papers (2022-08-03T12:57:00Z)
- Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- nnFormer: Interleaved Transformer for Volumetric Segmentation [50.10441845967601]
We introduce nnFormer, a powerful segmentation model with an interleaved architecture based on an empirical combination of self-attention and convolution.
nnFormer achieves tremendous improvements over previous transformer-based methods on two commonly used datasets, Synapse and ACDC.
arXiv Detail & Related papers (2021-09-07T17:08:24Z)
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, the Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
- Toward Transformer-Based Object Detection [12.704056181392415]
Vision Transformers can be used as a backbone with a common detection task head to produce competitive COCO results.
ViT-FRCNN demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance.
We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.
arXiv Detail & Related papers (2020-12-17T22:33:14Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.