Deep Laparoscopic Stereo Matching with Transformers
- URL: http://arxiv.org/abs/2207.12152v1
- Date: Mon, 25 Jul 2022 12:54:32 GMT
- Title: Deep Laparoscopic Stereo Matching with Transformers
- Authors: Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Tom Drummond, Zhiyong
Wang, and Zongyuan Ge
- Abstract summary: The self-attention mechanism, successfully employed with the transformer structure, has shown promise in many computer vision tasks.
We propose a new hybrid deep stereo matching framework (HybridStereoNet) that combines the best of the CNN and the transformer in a unified design.
- Score: 46.18206008056612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The self-attention mechanism, successfully employed with the transformer
structure, has shown promise in many computer vision tasks, including image
recognition and object detection. Despite this surge, the use of the
transformer for the problem of stereo matching remains relatively unexplored.
In this paper, we comprehensively investigate the use of the transformer for
the problem of stereo matching, especially for laparoscopic videos, and propose
a new hybrid deep stereo matching framework (HybridStereoNet) that combines the
best of the CNN and the transformer in a unified design. To be specific, we
investigate several ways to introduce transformers to volumetric stereo
matching pipelines by analyzing the loss landscape of the designs and
in-domain/cross-domain accuracy. Our analysis suggests that employing
transformers for feature representation learning while using CNNs for cost
aggregation leads to faster convergence, higher accuracy, and better
generalization than other options. Our extensive experiments on Sceneflow,
SCARED2019 and dVPN datasets demonstrate the superior performance of our
HybridStereoNet.
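To make the abstract's main design choice concrete, the following is a minimal PyTorch sketch of a volumetric stereo pipeline that uses a transformer for feature representation learning and CNNs (3D convolutions) for cost aggregation, followed by soft-argmin disparity regression. It illustrates the general recipe only; every module name, layer count, and hyperparameter below is an illustrative assumption, not the authors' HybridStereoNet implementation.

```python
# A minimal sketch (not the authors' HybridStereoNet): transformer feature extraction
# followed by CNN (3D convolution) cost aggregation and soft-argmin disparity regression.
# All layer sizes, depths and names below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransformerFeatureExtractor(nn.Module):
    """Patchify a view and encode it with a small transformer (positional encoding omitted)."""

    def __init__(self, in_ch=3, dim=64, patch=4, depth=4, heads=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):                           # x: (B, 3, H, W)
        f = self.patch_embed(x)                     # (B, C, H/4, W/4)
        b, c, h, w = f.shape
        tokens = self.encoder(f.flatten(2).transpose(1, 2))   # (B, H*W/16, C)
        return tokens.transpose(1, 2).view(b, c, h, w)


def build_cost_volume(left_feat, right_feat, max_disp):
    """Concatenation cost volume over candidate disparities at feature resolution."""
    b, c, h, w = left_feat.shape
    volume = left_feat.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, :c, d] = left_feat
            volume[:, c:, d] = right_feat
        else:
            volume[:, :c, d, :, d:] = left_feat[..., d:]
            volume[:, c:, d, :, d:] = right_feat[..., :-d]
    return volume


class HybridStereoSketch(nn.Module):
    def __init__(self, max_disp=48, dim=64):
        super().__init__()
        self.max_disp = max_disp
        self.feature = TransformerFeatureExtractor(dim=dim)
        self.aggregate = nn.Sequential(              # CNN cost aggregation with 3D convs
            nn.Conv3d(2 * dim, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(32, 1, 3, padding=1),
        )

    def forward(self, left, right):
        fl, fr = self.feature(left), self.feature(right)
        cost = self.aggregate(build_cost_volume(fl, fr, self.max_disp)).squeeze(1)
        prob = F.softmax(-cost, dim=1)               # soft-argmin over disparity hypotheses
        disps = torch.arange(self.max_disp, device=cost.device, dtype=prob.dtype)
        disp = (prob * disps.view(1, -1, 1, 1)).sum(dim=1, keepdim=True)
        # disparities were estimated at 1/4 resolution, so upsample and rescale
        return F.interpolate(disp, size=left.shape[-2:], mode='bilinear',
                             align_corners=False) * 4
```

Calling the model on a rectified left/right pair of shape (B, 3, H, W), with H and W divisible by the patch size, returns a dense disparity map at input resolution; in the paper's framing, the transformer handles feature representation learning while the 3D convolutions handle cost aggregation.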
Related papers
- CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer [8.962657021133925]
Cross-scale transformer (CT) processes feature representations at different stages without additional computation.
We introduce an adaptive matching-aware transformer (AMT) that employs different interactive attention combinations at multiple scales.
We also present a dual-feature guided aggregation (DFGA) that embeds the coarse global semantic information into the finer cost volume construction.
arXiv Detail & Related papers (2023-12-14T01:33:18Z)
- On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition [18.557920268145818]
Video vision transformers have been shown to be competitive with convolution-based methods (CNNs) broadly across multiple vision tasks.
Our work empirically explores the low data regime for video classification and discovers that, surprisingly, transformers perform extremely well in the low-labeled video setting.
We even show that, using just the labeled data, transformers significantly outperform complex semi-supervised CNN methods that also leverage large-scale unlabeled data.
arXiv Detail & Related papers (2022-09-15T17:12:30Z)
- Deep Hyperspectral Unmixing using Transformer Network [7.3050653207383025]
We propose a novel deep unmixing model with transformers.
The proposed model is a combination of a convolutional autoencoder and a transformer.
The data are reconstructed using a convolutional decoder.
arXiv Detail & Related papers (2022-03-31T14:47:36Z)
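The unmixing summary above only says that the model couples a convolutional autoencoder with a transformer and reconstructs the data with a convolutional decoder, so the following is a generic, hedged reading of that idea rather than the paper's architecture: a convolutional encoder maps spectral bands to per-pixel features, a transformer adds global context, a softmax head produces abundance maps, and a 1x1 convolutional decoder (standing in for the endmember mixing step) reconstructs the hyperspectral cube. Band, endmember, and layer counts are placeholders.

```python
# Hedged sketch of an autoencoder-plus-transformer unmixing model (not the paper's code).
import torch
import torch.nn as nn


class UnmixingSketch(nn.Module):
    def __init__(self, bands=156, endmembers=5, dim=64, depth=2, heads=4):
        super().__init__()
        # Convolutional encoder: spectral bands -> latent features per pixel.
        self.encoder = nn.Sequential(
            nn.Conv2d(bands, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Transformer over pixel tokens to capture global context.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        # Abundance head: one channel per endmember; softmax enforces sum-to-one.
        self.abundance = nn.Conv2d(dim, endmembers, 1)
        # 1x1 convolutional decoder acting as a learned endmember mixing step.
        self.decoder = nn.Conv2d(endmembers, bands, 1, bias=False)

    def forward(self, x):                      # x: (B, bands, H, W)
        f = self.encoder(x)                    # (B, dim, H, W)
        b, c, h, w = f.shape
        tokens = self.transformer(f.flatten(2).transpose(1, 2))
        f = tokens.transpose(1, 2).view(b, c, h, w)
        abund = torch.softmax(self.abundance(f), dim=1)    # abundance maps
        recon = self.decoder(abund)                        # reconstructed cube
        return recon, abund
```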
- Blending Anti-Aliasing into Vision Transformer [57.88274087198552]
The discontinuous patch-wise tokenization process implicitly introduces jagged artifacts into attention maps.
The aliasing effect occurs when discrete patterns are used to produce high-frequency or continuous information, resulting in indistinguishable distortions.
We propose a plug-and-play Aliasing-Reduction Module (ARM) to alleviate the aforementioned issue.
arXiv Detail & Related papers (2021-10-28T14:30:02Z)
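The summary above does not describe how the Aliasing-Reduction Module is constructed, so the snippet below is only one plausible plug-and-play form, assumed here for illustration: a depthwise low-pass (binomial blur) filter applied to the 2D token grid before the tokens re-enter the transformer. It is not the ARM proposed in the paper.

```python
# Illustrative low-pass "aliasing reduction" block (an assumption, not the paper's ARM).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowPassTokenFilter(nn.Module):
    """Depthwise 3x3 binomial blur over the 2D token grid of a vision transformer."""

    def __init__(self, dim):
        super().__init__()
        k = torch.tensor([1., 2., 1.])
        kernel = (k[:, None] * k[None, :]) / 16.0            # 3x3 binomial kernel
        self.register_buffer("kernel", kernel.expand(dim, 1, 3, 3).clone())
        self.dim = dim

    def forward(self, tokens, h, w):                          # tokens: (B, h*w, dim)
        x = tokens.transpose(1, 2).reshape(-1, self.dim, h, w)
        x = F.conv2d(x, self.kernel, padding=1, groups=self.dim)  # depthwise blur
        return x.flatten(2).transpose(1, 2)                   # back to (B, h*w, dim)
```

Such a block could be dropped between the patch embedding and the first attention layer; h and w are the token-grid height and width.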
- The Nuts and Bolts of Adopting Transformer in GANs [124.30856952272913]
We investigate the properties of the Transformer in the generative adversarial network (GAN) framework for high-fidelity image synthesis.
Our study leads to a new alternative design of Transformers in GANs: a convolutional neural network (CNN)-free generator termed STrans-G.
arXiv Detail & Related papers (2021-10-25T17:01:29Z)
- ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architecture for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z)
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
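Since the VST summary above only states that the model consumes image patches and lets a transformer propagate global context among them, here is a bare-bones, hedged sketch of a patch-token transformer with a per-token saliency head, not a reproduction of VST itself (which also handles RGB-D inputs). Patch size, depth, and the simple upsampling head are assumptions.

```python
# Minimal patch-token transformer with a saliency head (a sketch, not the VST model).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SaliencyTransformerSketch(nn.Module):
    def __init__(self, in_ch=3, dim=96, patch=16, depth=6, heads=6):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 1)            # per-token saliency logit

    def forward(self, x):                        # x: (B, 3, H, W), H and W divisible by patch
        f = self.embed(x)                        # (B, dim, H/16, W/16)
        b, c, h, w = f.shape
        # propagate global context among patch tokens (positional encoding omitted)
        tokens = self.encoder(f.flatten(2).transpose(1, 2))
        logits = self.head(tokens).transpose(1, 2).view(b, 1, h, w)
        # upsample the coarse token-level prediction to a full-resolution saliency map
        return torch.sigmoid(F.interpolate(logits, size=x.shape[-2:],
                                           mode='bilinear', align_corners=False))
```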
- Vision Transformers for Dense Prediction [77.34726150561087]
We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks.
Our experiments show that this architecture yields substantial improvements on dense prediction tasks.
arXiv Detail & Related papers (2021-03-24T18:01:17Z)
- Toward Transformer-Based Object Detection [12.704056181392415]
Vision Transformers can be used as a backbone by a common detection task head to produce competitive COCO results.
ViT-FRCNN demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance.
We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.
arXiv Detail & Related papers (2020-12-17T22:33:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.