Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation
- URL: http://arxiv.org/abs/2407.15282v1
- Date: Sun, 21 Jul 2024 22:08:52 GMT
- Title: Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation
- Authors: Xiaoyang Wu, Xiang Xu, Lingdong Kong, Liang Pan, Ziwei Liu, Tong He, Wanli Ouyang, Hengshuang Zhao
- Abstract summary: In this technical report, we detail our first-place solution for the 2024 Waymo Open Dataset Challenge's semantic segmentation track.
We significantly enhanced the performance of Point Transformer V3 on the benchmark by implementing cutting-edge, plug-and-play training and inference technologies.
This approach secured us the top position on the Waymo Open Dataset semantic segmentation leaderboard, markedly outperforming other entries.
- Score: 98.11452697097539
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this technical report, we detail our first-place solution for the 2024 Waymo Open Dataset Challenge's semantic segmentation track. We significantly enhanced the performance of Point Transformer V3 on the Waymo benchmark by implementing cutting-edge, plug-and-play training and inference technologies. Notably, our advanced version, Point Transformer V3 Extreme, leverages multi-frame training and a no-clipping-point policy, achieving substantial gains over the original PTv3 performance. Additionally, employing a straightforward model ensemble strategy further boosted our results. This approach secured us the top position on the Waymo Open Dataset semantic segmentation leaderboard, markedly outperforming other entries.
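The two ingredients highlighted in the abstract, multi-frame training and a straightforward model ensemble, can be pictured with the minimal sketch below. The report does not include reference code, so the array layouts, the `merge_frames`/`ensemble_probs` helper names, the pose convention, and the equal-weight averaging are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def merge_frames(frames, poses):
    # Concatenate consecutive LiDAR sweeps into one denser point cloud.
    # `frames`: list of (N_i, 3) xyz arrays in sensor coordinates (assumption).
    # `poses`: matching list of (4, 4) sensor-to-world transforms (assumption).
    world_points = []
    for xyz, pose in zip(frames, poses):
        homogeneous = np.concatenate([xyz, np.ones((len(xyz), 1))], axis=1)  # (N_i, 4)
        world_points.append((homogeneous @ pose.T)[:, :3])  # move into a shared frame
    return np.concatenate(world_points, axis=0)

def ensemble_probs(per_model_probs):
    # Average per-point class probabilities from several trained models;
    # the report only calls the ensemble "straightforward", so equal weights
    # are an assumption here.
    return np.mean(np.stack(per_model_probs, axis=0), axis=0)  # (N, num_classes)

# Tiny usage example with random data.
rng = np.random.default_rng(0)
frames = [rng.normal(size=(5, 3)) for _ in range(3)]
poses = [np.eye(4) for _ in range(3)]
merged = merge_frames(frames, poses)           # (15, 3) merged cloud
probs = [rng.random((15, 4)) for _ in range(2)]
labels = ensemble_probs(probs).argmax(axis=1)  # (15,) fused per-point predictions
```

The "no-clipping-point policy" mentioned in the abstract would simply mean keeping all points at training and inference time instead of cropping the cloud to a fixed range or count; it is omitted from the sketch because the exact policy details are not given in this summary.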
Related papers
- First Place Solution to the ECCV 2024 BRAVO Challenge: Evaluating Robustness of Vision Foundation Models for Semantic Segmentation [1.8570591025615457]
We present the first place solution to the ECCV 2024 BRAVO Challenge.
A model is trained on Cityscapes and its robustness is evaluated on several out-of-distribution datasets.
This approach outperforms more complex existing approaches, and achieves first place in the challenge.
arXiv Detail & Related papers (2024-09-25T16:15:06Z)
- 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation [81.50620771207329]
We investigate the effectiveness of static-dominant data and frame sampling on referring video object segmentation (RVOS).
Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeViS track of the PVUW Challenge.
arXiv Detail & Related papers (2024-06-11T08:05:26Z)
- Solution for CVPR 2024 UG2+ Challenge Track on All Weather Semantic Segmentation [9.322345758563886]
We present our solution for semantic segmentation in adverse weather for the UG2+ Challenge at CVPR 2024.
We initialize the InternImage-H backbone with pre-trained weights from the large-scale joint dataset and enhance it with the state-of-the-art UPerNet segmentation method.
Our proposed solution demonstrates advanced performance on the test set and achieves 3rd position in this challenge.
arXiv Detail & Related papers (2024-06-09T15:56:35Z)
- FSD V2: Improving Fully Sparse 3D Object Detection with Virtual Voxels [57.05834683261658]
We present FSDv2, an evolution that aims to simplify the previous FSDv1 while eliminating the inductive bias introduced by its handcrafted instance-level representation.
We develop a suite of components to complement the virtual voxel concept, including a virtual voxel encoder, a virtual voxel mixer, and a virtual voxel assignment strategy.
arXiv Detail & Related papers (2023-08-07T17:59:48Z)
- 3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic Segmentation on VSPW [68.56017675820897]
In this paper, we introduce the 3rd place solution for the PVUW2023 VSS track.
We have explored various image-level visual backbones and segmentation heads to tackle the problem of video semantic segmentation.
arXiv Detail & Related papers (2023-06-04T07:50:38Z)
- MTR-A: 1st Place Solution for 2022 Waymo Open Dataset Challenge -- Motion Prediction [103.75625476231401]
We propose a novel Motion Transformer framework for multimodal motion prediction, which introduces a small set of novel motion query pairs.
A simple model ensemble strategy with non-maximum-suppression is adopted to further boost the final performance.
Our approach achieves the 1st place on the motion prediction leaderboard of the 2022 Waymo Open Dataset Challenges, outperforming other methods by remarkable margins.
arXiv Detail & Related papers (2022-09-20T23:03:22Z)
- Stratified Transformer for 3D Point Cloud Segmentation [89.9698499437732]
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z)
- Semantic Segmentation on VSPW Dataset through Aggregation of Transformer Models [10.478712332545854]
This report introduces the solutions of team 'BetterThing' for the ICCV 2021 Video Scene Parsing in the Wild Challenge.
A Transformer is used as the backbone for extracting video frame features, and the final result is the aggregation of the outputs of two Transformer models, Swin and VOLO.
This solution achieves 57.3% mIoU, ranking 3rd in the Video Scene Parsing in the Wild Challenge.
arXiv Detail & Related papers (2021-09-03T05:20:08Z)