GraftNet: Towards Domain Generalized Stereo Matching with a
Broad-Spectrum and Task-Oriented Feature
- URL: http://arxiv.org/abs/2204.00179v1
- Date: Fri, 1 Apr 2022 03:10:04 GMT
- Title: GraftNet: Towards Domain Generalized Stereo Matching with a
Broad-Spectrum and Task-Oriented Feature
- Authors: Biyang Liu, Huimin Yu, Guodong Qi
- Abstract summary: We propose to leverage the feature of a model trained on large-scale datasets to deal with the domain shift.
With the cosine similarity based cost volume as a bridge, the feature will be grafted to an ordinary cost aggregation module.
Experiments show that the model generalization ability can be improved significantly with this broad-spectrum and task-oriented feature.
- Score: 2.610470075814367
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although supervised deep stereo matching networks have made impressive
achievements, the poor generalization ability caused by the domain gap prevents
them from being applied to real-life scenarios. In this paper, we propose to
leverage the feature of a model trained on large-scale datasets to deal with
the domain shift since it has seen various styles of images. With the cosine
similarity based cost volume as a bridge, the feature will be grafted to an
ordinary cost aggregation module. Despite the broad-spectrum representation,
such a low-level feature contains a large amount of general information that is not
aimed at stereo matching. To recover more task-specific information, the grafted
feature is further input into a shallow network to be transformed before
calculating the cost. Extensive experiments show that the model generalization
ability can be improved significantly with this broad-spectrum and
task-oriented feature. Specifically, built on two well-known architectures,
PSMNet and GANet, our methods outperform other robust algorithms when
transferring from SceneFlow to KITTI 2015, KITTI 2012, and Middlebury. Code is
available at https://github.com/SpadeLiu/Graft-PSMNet.
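For a concrete picture of the grafting step, below is a minimal PyTorch-style sketch of the ingredients described in the abstract: a frozen, broadly pretrained low-level feature, a shallow trainable head that makes it task-oriented, and a cosine-similarity cost volume that bridges the grafted feature to an ordinary aggregation module. It is an illustration under assumptions (VGG-16 as the pretrained backbone, particular layer cut-offs and channel widths), not the authors' implementation; the official code is in the linked repository.

```python
# Hypothetical sketch of the grafting idea, not the authors' implementation
# (official code: https://github.com/SpadeLiu/Graft-PSMNet). Backbone choice,
# layer cut-off, and channel sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class FrozenBroadSpectrumFeature(nn.Module):
    """Low-level feature from a model pretrained on large-scale data, kept frozen."""
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features
        self.stem = nn.Sequential(*list(vgg.children())[:9])   # through conv2_2 + ReLU
        for p in self.stem.parameters():
            p.requires_grad = False

    def forward(self, x):
        return self.stem(x)                                    # (B, 128, H/2, W/2)

class ShallowTaskHead(nn.Module):
    """Shallow trainable network that recovers stereo-specific information
    from the general-purpose (broad-spectrum) feature."""
    def __init__(self, in_ch=128, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

def cosine_cost_volume(feat_l, feat_r, max_disp):
    """Each entry is the cosine similarity between the left feature and the
    right feature shifted by a candidate disparity."""
    b, c, h, w = feat_l.shape
    fl = F.normalize(feat_l, dim=1)
    fr = F.normalize(feat_r, dim=1)
    volume = feat_l.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (fl * fr).sum(dim=1)
        else:
            volume[:, d, :, d:] = (fl[:, :, :, d:] * fr[:, :, :, :-d]).sum(dim=1)
    return volume    # handed to an ordinary cost aggregation module

# Toy usage: both views share the frozen extractor and the shallow head.
left, right = torch.randn(1, 3, 64, 128), torch.randn(1, 3, 64, 128)
backbone, head = FrozenBroadSpectrumFeature(), ShallowTaskHead()
fl, fr = head(backbone(left)), head(backbone(right))
cost = cosine_cost_volume(fl, fr, max_disp=48)      # shape (1, 48, 32, 64)
```

In a full model the resulting volume would be processed by PSMNet's or GANet's aggregation stack and the usual disparity regression; only the shallow head and the aggregation module would be trained, while the broad-spectrum extractor stays frozen.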
Related papers
- Full-scale Representation Guided Network for Retinal Vessel Segmentation [1.3024517678456733]
U-Net has remained state-of-the-art (SOTA) for retinal vessel segmentation over the past decade.
We introduce a Full Scale Guided Network (FSG-Net), in which a feature representation network with modernized convolution blocks extracts full-scale information.
We show that the proposed network demonstrates competitive results compared to current SOTA models on various public datasets.
arXiv Detail & Related papers (2025-01-31T06:52:57Z)
- TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic
Token Mixer for Visual Recognition [71.6546914957701]
We propose a lightweight Dual Dynamic Token Mixer (D-Mixer) that aggregates global information and local details in an input-dependent way.
We use D-Mixer as the basic building block to design TransXNet, a novel hybrid CNN-Transformer vision backbone network.
In the ImageNet-1K image classification task, TransXNet-T surpasses Swin-T by 0.3% in top-1 accuracy while requiring less than half of the computational cost.
arXiv Detail & Related papers (2023-10-30T09:35:56Z)
- RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Augmenting Convolutional networks with attention-based aggregation [55.97184767391253]
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
We pair this learned aggregation layer with a simple patch-based convolutional network parametrized by two parameters (width and depth).
It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption.
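Such attention-based aggregation is typically realized as a learned query that attends over all spatial positions of the convolutional feature map. The sketch below shows one such pooling layer; the exact layer design is an illustrative assumption, not a reproduction of the paper's module.

```python
# Illustrative attention-based aggregation: a single learned query attends over
# the spatial positions of a CNN feature map to produce a global descriptor,
# adding non-local reasoning on top of any convolutional trunk.
import torch
import torch.nn as nn

class AttentionAggregation(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, feat):                      # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, C) patch tokens
        q = self.query.expand(b, -1, -1)          # one learned query per image
        pooled, _ = self.attn(q, tokens, tokens)  # (B, 1, C)
        return pooled.squeeze(1)                  # (B, C) global descriptor

layer = AttentionAggregation(dim=256)
print(layer(torch.randn(2, 256, 7, 7)).shape)     # torch.Size([2, 256])
```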
arXiv Detail & Related papers (2021-12-27T14:05:41Z)
- Learning to Aggregate Multi-Scale Context for Instance Segmentation in
Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely a dense feature pyramid network (DenseFPN), a spatial context pyramid (SCP), and a hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z)
- Multi-View Stereo Network with attention thin volume [0.0]
We propose an efficient multi-view stereo (MVS) network for inferring depth values from multiple RGB images.
We introduce the self-attention mechanism to fully aggregate the dominant information from input images.
We also introduce the group-wise correlation to feature aggregation, which greatly reduces the memory and calculation burden.
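Group-wise correlation is only named in the summary; a common formulation (in the style used by group-wise stereo and MVS cost volumes, taken here as an illustrative assumption rather than this paper's exact design) splits the channels into groups and averages each group's inner product, so the correlation stores one value per group instead of one per channel:

```python
# Illustrative group-wise correlation between a reference feature map and an
# already warped/shifted source feature map; grouping trades a full C-channel
# concatenation volume for a compact G-channel one.
import torch

def groupwise_correlation(ref_feat: torch.Tensor, src_feat: torch.Tensor,
                          num_groups: int) -> torch.Tensor:
    """ref_feat, src_feat: (B, C, H, W) with C divisible by num_groups.
    Returns a (B, num_groups, H, W) correlation map."""
    b, c, h, w = ref_feat.shape
    assert c % num_groups == 0, "channels must split evenly into groups"
    corr = (ref_feat * src_feat).view(b, num_groups, c // num_groups, h, w)
    return corr.mean(dim=2)

# Example: 64 channels in 8 groups -> 8 correlation channels per depth hypothesis.
a, b_ = torch.randn(2, 64, 32, 40), torch.randn(2, 64, 32, 40)
print(groupwise_correlation(a, b_, num_groups=8).shape)   # torch.Size([2, 8, 32, 40])
```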
arXiv Detail & Related papers (2021-10-16T11:51:23Z)
- MixStyle Neural Networks for Domain Generalization and Adaptation [122.36901703868321]
MixStyle is a plug-and-play module that can improve domain generalization performance without the need to collect more data or increase model capacity.
Our experiments show that MixStyle can significantly boost out-of-distribution generalization performance across a wide range of tasks including image recognition, instance retrieval and reinforcement learning.
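Concretely, MixStyle perturbs each instance's channel-wise feature statistics by mixing them with those of another instance in the batch. The minimal sketch below follows that published formulation; the hyperparameter values are illustrative defaults rather than the settings used in the experiments.

```python
# Minimal MixStyle-style module: during training it replaces each instance's
# channel-wise mean/std with a convex mix of its own and another instance's,
# simulating new "styles" (domains) without extra data or model capacity.
import torch
import torch.nn as nn

class MixStyle(nn.Module):
    def __init__(self, p=0.5, alpha=0.1, eps=1e-6):
        super().__init__()
        self.p, self.eps = p, eps
        self.beta = torch.distributions.Beta(alpha, alpha)

    def forward(self, x):                                   # x: (B, C, H, W)
        if not self.training or torch.rand(1).item() > self.p:
            return x
        mu = x.mean(dim=[2, 3], keepdim=True).detach()
        sig = (x.var(dim=[2, 3], keepdim=True) + self.eps).sqrt().detach()
        x_norm = (x - mu) / sig                             # strip the instance's style
        lam = self.beta.sample((x.size(0), 1, 1, 1)).to(x.device)
        perm = torch.randperm(x.size(0), device=x.device)
        mu_mix = lam * mu + (1 - lam) * mu[perm]            # mixed style statistics
        sig_mix = lam * sig + (1 - lam) * sig[perm]
        return x_norm * sig_mix + mu_mix
```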
arXiv Detail & Related papers (2021-07-05T14:29:19Z)
- Multi-scale Attention U-Net (MsAUNet): A Modified U-Net Architecture for
Scene Segmentation [1.713291434132985]
We propose a novel multi-scale attention network for scene segmentation by using contextual information from an image.
This network can map local features with their global counterparts with improved accuracy and emphasize discriminative image regions.
We have evaluated our model on two standard datasets named PascalVOC2012 and ADE20k.
arXiv Detail & Related papers (2020-09-15T08:03:41Z)
- Joint Self-Attention and Scale-Aggregation for Self-Calibrated Deraining
Network [13.628218953897946]
In this paper, we propose an effective algorithm, called JDNet, to solve the single image deraining problem.
By carefully designing the Scale-Aggregation and Self-Attention modules with Self-Calibrated convolutions, the proposed model achieves better deraining results.
arXiv Detail & Related papers (2020-08-06T17:04:34Z)
- Bifurcated backbone strategy for RGB-D salient object detection [168.19708737906618]
We leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel cascaded refinement network.
Our architecture, named Bifurcated Backbone Strategy Network (BBS-Net), is simple, efficient, and backbone-independent.
arXiv Detail & Related papers (2020-07-06T13:01:30Z)
- Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting.
HSRNet models rich contextual dependencies and recalibrates multiple scale-associated information.
Our approach can selectively ignore various types of noise and automatically focus on appropriate crowd scales.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)