Multiview Detection with Shadow Transformer (and View-Coherent Data
Augmentation)
- URL: http://arxiv.org/abs/2108.05888v1
- Date: Thu, 12 Aug 2021 17:59:02 GMT
- Title: Multiview Detection with Shadow Transformer (and View-Coherent Data
Augmentation)
- Authors: Yunzhong Hou and Liang Zheng
- Abstract summary: We propose a novel multiview detector, MVDeTr, that adopts a shadow transformer to aggregate multiview information.
Unlike convolutions, the shadow transformer attends differently at different positions and cameras to deal with various shadow-like distortions.
We report new state-of-the-art accuracy with the proposed system.
- Score: 25.598840284457548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiview detection incorporates multiple camera views to deal with
occlusions, and its central problem is multiview aggregation. Given feature map
projections from multiple views onto a common ground plane, the
state-of-the-art method addresses this problem via convolution, which applies
the same calculation regardless of object locations. However, such
translation-invariant behaviors might not be the best choice, as object
features undergo various projection distortions according to their positions
and cameras. In this paper, we propose a novel multiview detector, MVDeTr, that
adopts a newly introduced shadow transformer to aggregate multiview
information. Unlike convolutions, the shadow transformer attends differently at
different positions and cameras to deal with various shadow-like distortions.
We propose an effective training scheme that includes a new view-coherent data
augmentation method, which applies random augmentations while maintaining
multiview consistency. On two multiview detection benchmarks, we report new
state-of-the-art accuracy with the proposed system. Code is available at
https://github.com/hou-yz/MVDeTr.
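The aggregation pipeline described in the abstract first warps each camera's feature map onto a common ground plane via a per-camera homography; the view-coherent augmentation then keeps all views consistent by sharing a single random ground-plane transform across cameras. The sketch below illustrates these two ideas only; the function names, the nearest-neighbour warp, and the toy affine jitter are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def warp_to_ground(feat, H, out_hw):
    """Warp a per-view feature map (C, h, w) onto the ground plane.
    H maps a ground-plane pixel (x, y, 1) to an image pixel; sampling
    is nearest-neighbour for simplicity."""
    Hh, Ww = out_hw
    out = np.zeros((feat.shape[0], Hh, Ww), dtype=feat.dtype)
    ys, xs = np.mgrid[0:Hh, 0:Ww]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(Hh * Ww)])  # homogeneous coords
    src = H @ pts
    src = src[:2] / src[2]                          # perspective divide
    u = np.round(src[0]).astype(int)                # source column in the image
    v = np.round(src[1]).astype(int)                # source row in the image
    ok = (u >= 0) & (u < feat.shape[2]) & (v >= 0) & (v < feat.shape[1])
    out[:, ys.ravel()[ok], xs.ravel()[ok]] = feat[:, v[ok], u[ok]]
    return out

def coherent_homographies(Hs, rng):
    """Sample ONE random ground-plane affine and compose it into every
    camera's homography, so the augmentation stays consistent across views
    (a toy stand-in for view-coherent data augmentation)."""
    theta = rng.uniform(-0.1, 0.1)                  # small random rotation
    c, s = np.cos(theta), np.sin(theta)
    T = np.array([[c, -s, rng.uniform(-2, 2)],      # shared rotation + shift
                  [s,  c, rng.uniform(-2, 2)],
                  [0,  0, 1.0]])
    return [H @ T for H in Hs]
```

Because the random transform `T` is drawn once and composed into every camera's homography, object footprints move identically on the ground plane in all views, which is the consistency property the augmentation needs.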
Related papers
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that removes the need for a depth-range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- DVANet: Disentangling View and Action Features for Multi-View Action Recognition [56.283944756315066]
We present a novel approach to multi-view action recognition where we guide learned action representations to be separated from view-relevant information in a video.
Our model and method of training significantly outperforms all other uni-modal models on four multi-view action recognition datasets.
arXiv Detail & Related papers (2023-12-10T01:19:48Z)
- DealMVC: Dual Contrastive Calibration for Multi-view Clustering [78.54355167448614]
We propose a novel Dual contrastive calibration network for Multi-View Clustering (DealMVC).
We first design a fusion mechanism to obtain a global cross-view feature. Then, a global contrastive calibration loss is proposed by aligning the view feature similarity graph and the high-confidence pseudo-label graph.
During the training procedure, the interacted cross-view feature is jointly optimized at both local and global levels.
arXiv Detail & Related papers (2023-08-17T14:14:28Z)
- Long-Range Grouping Transformer for Multi-View 3D Reconstruction [9.2709012704338]
Long-range grouping attention (LGA) based on the divide-and-conquer principle is proposed.
An effective and efficient encoder can be established which connects inter-view features.
A novel progressive upsampling decoder is also designed for voxel generation with relatively high resolution.
arXiv Detail & Related papers (2023-08-17T01:34:59Z)
- MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal viewpoints for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z)
- Two-level Data Augmentation for Calibrated Multi-view Detection [51.5746691103591]
We introduce a new multi-view data augmentation pipeline that preserves alignment among views.
We also propose a second level of augmentation applied directly at the scene level.
When combined with our simple multi-view detection model, our two-level augmentation pipeline outperforms all existing baselines.
arXiv Detail & Related papers (2022-10-19T17:55:13Z)
- Voxelized 3D Feature Aggregation for Multiview Detection [15.465855460519446]
We propose VFA, voxelized 3D feature aggregation, for feature transformation and aggregation in multi-view detection.
Specifically, we voxelize the 3D space, project the voxels onto each camera view, and associate 2D features with these projected voxels.
This allows us to identify and then aggregate 2D features along the same vertical line, alleviating projection distortions to a large extent.
arXiv Detail & Related papers (2021-12-07T03:38:50Z)
- Multiview Detection with Feature Perspective Transformation [59.34619548026885]
We propose a novel multiview detection system, MVDet.
We take an anchor-free approach to aggregate multiview information by projecting feature maps onto the ground plane.
Our entire model is end-to-end learnable and achieves 88.2% MODA on the standard Wildtrack dataset.
arXiv Detail & Related papers (2020-07-14T17:58:30Z)
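The VFA entry above describes a three-step pipeline: voxelize the 3D space, project the voxels onto each camera view, and aggregate the 2D features that fall along the same vertical line. A minimal sketch of that pipeline follows; the function name, the pinhole projection matrices `Ps`, nearest-neighbour sampling, and mean pooling are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def voxel_aggregate(feats, Ps, grid, voxel=1.0):
    """Project each voxel centre into every camera, gather the 2D feature
    at that pixel, then collapse each vertical column of voxels onto the
    ground plane.

    feats: list of per-view feature maps, each (C, h, w)
    Ps:    list of 3x4 projection matrices, one per camera
    grid:  (X, Y, Z) voxel counts; z is the vertical axis
    """
    X, Y, Z = grid
    C = feats[0].shape[0]
    vol = np.zeros((C, X, Y, Z))     # accumulated per-voxel features
    cnt = np.zeros((X, Y, Z))        # how many views saw each voxel
    for feat, P in zip(feats, Ps):
        _, h, w = feat.shape
        for ix in range(X):
            for iy in range(Y):
                for iz in range(Z):
                    p = P @ np.array([ix * voxel, iy * voxel, iz * voxel, 1.0])
                    if p[2] <= 0:                       # behind the camera
                        continue
                    u, v = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
                    if 0 <= u < w and 0 <= v < h:
                        vol[:, ix, iy, iz] += feat[:, v, u]
                        cnt[ix, iy, iz] += 1
    vol /= np.maximum(cnt, 1)        # average over the views that saw each voxel
    # collapse the vertical (z) axis: features along the same vertical line
    # land in one ground-plane cell, which is the key idea behind VFA
    return vol.mean(axis=3)          # (C, X, Y) ground-plane feature map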
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.