Global Occlusion-Aware Transformer for Robust Stereo Matching
- URL: http://arxiv.org/abs/2312.14650v1
- Date: Fri, 22 Dec 2023 12:34:58 GMT
- Title: Global Occlusion-Aware Transformer for Robust Stereo Matching
- Authors: Zihua Liu, Yizhou Li and Masatoshi Okutomi
- Abstract summary: This paper introduces a novel attention-based stereo-matching network called Global Occlusion-Aware Transformer (GOAT).
GOAT exploits long-range dependencies and occlusion-aware global context for disparity estimation.
The proposed GOAT demonstrates outstanding performance across all benchmarks, particularly in occluded regions.
- Score: 11.655465312241699
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the remarkable progress facilitated by learning-based stereo-matching
algorithms, performance in ill-conditioned regions, such as occluded regions,
remains a bottleneck. Due to their limited receptive fields, existing CNN-based
methods struggle to handle these ill-conditioned regions effectively. To address
this issue, this paper introduces a novel attention-based stereo-matching network
called Global Occlusion-Aware Transformer (GOAT) to exploit long-range dependencies
and occlusion-aware global context for disparity estimation. In the GOAT
architecture, a parallel disparity and occlusion estimation module (PDO) is
proposed to estimate the initial disparity map and the occlusion mask using a
parallel attention mechanism. To further enhance the disparity estimates in the
occluded regions, an occlusion-aware global aggregation module (OGA) is proposed.
This module aims to refine the disparity in the occluded regions by leveraging
restricted global correlation within the focus scope of the occluded areas.
Extensive experiments were conducted on several public benchmark datasets,
including SceneFlow, KITTI 2015, and Middlebury. The results show that the
proposed GOAT demonstrates outstanding performance across all benchmarks,
particularly in the occluded regions.
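The idea behind OGA, refining occluded pixels with globally gathered but restricted correlation, can be illustrated with a minimal, hypothetical sketch in plain Python. It assumes that each occluded pixel attends to every non-occluded pixel in the image (one possible reading of "restricted global correlation") and replaces its disparity with the attention-weighted mean of their disparities. The function names, the scaled dot-product scoring, and the `temperature` parameter are illustrative assumptions, not GOAT's published design. The features, initial disparities, and occlusion mask, which PDO would produce in GOAT, are simply given as inputs.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the maximum score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def occlusion_aware_aggregation(
    features: Sequence[Sequence[float]],
    disparity: Sequence[float],
    occluded: Sequence[bool],
    temperature: float = 1.0,
) -> List[float]:
    """Refine disparities at occluded pixels with masked global attention.

    Hypothetical simplification (not GOAT's actual OGA): every occluded pixel
    queries all non-occluded pixels, and its disparity becomes the
    attention-weighted mean of their disparities.
    """
    n = len(features)
    if not (len(disparity) == len(occluded) == n):
        raise ValueError("features, disparity and occluded must have equal length")
    dim = len(features[0]) if n else 0
    scale = math.sqrt(dim) * temperature if dim else 1.0
    visible = [j for j in range(n) if not occluded[j]]
    refined = list(disparity)
    if not visible:
        return refined  # no reliable (non-occluded) pixels to aggregate from
    for i in range(n):
        if not occluded[i]:
            continue  # non-occluded pixels keep their initial estimate
        # Scaled dot-product correlation between the occluded query and each visible key.
        scores = [
            sum(a * b for a, b in zip(features[i], features[j])) / scale
            for j in visible
        ]
        weights = softmax(scores)
        refined[i] = sum(w * disparity[j] for w, j in zip(weights, visible))
    return refined


if __name__ == "__main__":
    feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    disp = [10.0, 10.0, 30.0, 0.0]  # pixel 3 is occluded; its initial value is unreliable
    occ = [False, False, False, True]
    print(occlusion_aware_aggregation(feats, disp, occ))
```

In the usage example, the occluded pixel shares features with pixels 0 and 1, so its refined disparity is pulled mostly toward their value of 10 rather than toward pixel 2. Masking the keys to visible pixels is the design choice that keeps unreliable occluded estimates from feeding back into each other.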
Related papers
- Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization [81.32266996009575]
In federated learning (FL), the multi-step update and data heterogeneity among clients often lead to a loss landscape with sharper minima.
We propose FedLESAM, a novel algorithm that locally estimates the direction of global perturbation on client side.
Date: 2024-05-29T08:46:21Z
- CMU-Flownet: Exploring Point Cloud Scene Flow Estimation in Occluded Scenario [10.852258389804984]
Occlusions hinder point cloud frame alignment in LiDAR data, a challenge inadequately addressed by scene flow models.
We introduce the Correlation Matrix Upsampling Flownet (CMU-Flownet), incorporating an occlusion estimation module within its cost volume layer.
CMU-Flownet achieves state-of-the-art performance on the occluded FlyingThings3D and KITTI datasets.
Date: 2024-04-16T13:47:21Z
- Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning [50.88504784466931]
Multi-task dense prediction involves semantic segmentation, depth estimation, and surface normal estimation.
Existing solutions typically rely on learning global image representations for global cross-task image matching.
Our proposal models region-wise representations using Gaussian distributions.
Date: 2024-03-15T12:41:30Z
- Digging Into Normal Incorporated Stereo Matching [18.849192633442453]
We propose a normal-incorporated joint learning framework consisting of two specific modules named non-local disparity propagation (NDP) and affinity-aware residual learning (ARL).
At the time this work was completed, our approach ranked 1st for stereo matching across foreground pixels on the KITTI 2015 dataset and 3rd on the Scene Flow dataset among all published works.
Date: 2024-02-28T09:01:50Z
- Progressive Feature Self-reinforcement for Weakly Supervised Semantic Segmentation [55.69128107473125]
We propose a single-stage approach for weakly supervised semantic segmentation (WSSS) with image-level labels.
We adaptively partition the image content into deterministic regions (e.g., confident foreground and background) and uncertain regions (e.g., object boundaries and misclassified categories) for separate processing.
Building upon this, we introduce a complementary self-enhancement method that constrains the semantic consistency between these confident regions and an augmented image with the same class labels.
Date: 2023-12-14T13:21:52Z
- Region Generation and Assessment Network for Occluded Person Re-Identification [43.49129366128688]
Person re-identification (ReID) has played an increasingly crucial role in recent years, with a wide range of applications.
Most methods tackle the challenges posed by occlusion by utilizing external tools to locate body parts or by exploiting matching strategies.
We propose a Region Generation and Assessment Network (RGANet) to effectively and efficiently detect the human body regions.
Date: 2023-09-07T08:41:47Z
- Coupling Global Context and Local Contents for Weakly-Supervised Semantic Segmentation [54.419401869108846]
We propose a single-stage weakly supervised semantic segmentation (WSSS) model that uses only image-level class labels as supervision.
A flexible context aggregation module is proposed to capture the global object context in different granular spaces.
A semantically consistent feature fusion module is proposed in a bottom-up parameter-learnable fashion to aggregate the fine-grained local contents.
Date: 2023-04-18T15:29:23Z
- Error-Aware Spatial Ensembles for Video Frame Interpolation [50.63021118973639]
Video frame interpolation (VFI) algorithms have improved considerably in recent years due to unprecedented progress in both data-driven algorithms and their implementations.
Recent research has introduced advanced motion estimation or novel warping methods as the means to address challenging VFI scenarios.
This work introduces a solution for such scenarios. By closely examining the correlation between optical flow and interpolation error (IE), the paper proposes novel error prediction metrics that partition the middle frame into distinct regions corresponding to different IE levels.
Date: 2022-07-25T16:15:38Z
- Realtime Global Attention Network for Semantic Segmentation [4.061739586881057]
We propose an integrated global attention neural network (RGANet) for semantic segmentation.
Integrating these global attention modules into a hierarchy of transformations yields improved performance on the evaluation metrics.
Date: 2021-12-24T04:24:18Z
- Local-Global Associative Frame Assemble in Video Re-ID [57.7470971197962]
Noisy and unrepresentative frames in automatically generated object bounding boxes from video sequences make it challenging to learn discriminative representations in video re-identification (Re-ID).
Most existing methods tackle this problem by assessing the importance of video frames according to either their local part alignments or their global appearance correlations separately.
In this work, we jointly explore both local alignments and global correlations, with further consideration of their mutual promotion/reinforcement.
Date: 2021-10-22T19:07:39Z
- Region attention and graph embedding network for occlusion objective class-based micro-expression recognition [26.5638344747854]
Micro-expression recognition (MER) has attracted considerable research attention over the past decade.
This paper investigates an interesting but unexplored challenge in MER, i.e., occlusion MER.
A Region-inspired Relation Reasoning Network (RRRN) is proposed to model relations between various facial regions.
Date: 2021-07-13T08:04:03Z