PSFormer: Point Transformer for 3D Salient Object Detection
- URL: http://arxiv.org/abs/2210.15933v1
- Date: Fri, 28 Oct 2022 06:34:28 GMT
- Title: PSFormer: Point Transformer for 3D Salient Object Detection
- Authors: Baian Chen, Lipeng Gu, Xin Zhuang, Yiyang Shen, Weiming Wang,
Mingqiang Wei
- Abstract summary: PSFormer is an encoder-decoder network that takes full advantage of transformers to model contextual information.
In the encoder, we develop a Point Context Transformer (PCT) module to capture region contextual features at the point level.
In the decoder, we develop a Scene Context Transformer (SCT) module to learn context representations at the scene level.
- Score: 8.621996554264275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose PSFormer, an effective point transformer model for 3D salient
object detection. PSFormer is an encoder-decoder network that takes full
advantage of transformers to model the contextual information in both
multi-scale point- and scene-wise manners. In the encoder, we develop a Point
Context Transformer (PCT) module to capture region contextual features at the
point level; PCT contains two different transformers to excavate the
relationship among points. In the decoder, we develop a Scene Context
Transformer (SCT) module to learn context representations at the scene level;
SCT contains both Upsampling-and-Transformer blocks and Multi-context
Aggregation units to integrate the global semantic and multi-level features
from the encoder into the global scene context. Experiments show clear
improvements of PSFormer over its competitors and validate that PSFormer is
more robust to challenging cases such as small objects, multiple objects, and
objects with complex structures.
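The abstract describes transformers that model relationships among points. As a rough illustration only, here is a minimal sketch of generic scaled dot-product self-attention over per-point feature vectors. It is not the PCT or SCT module from the paper: the function names `softmax` and `point_self_attention` are hypothetical, and the learned query/key/value projections of a real transformer are replaced by identity mappings for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def point_self_attention(features):
    """Generic self-attention over N per-point feature vectors of length d.

    Assumption: queries, keys, and values are the raw features
    (identity projections); a trained model would learn these projections.
    Each output vector is a weighted average of all input vectors.
    """
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in features:
        # Similarity of this point to every point (including itself).
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in features]
        weights = softmax(scores)
        # Aggregate all point features using the attention weights.
        out.append(
            [sum(w * v[j] for w, v in zip(weights, features)) for j in range(d)]
        )
    return out


# Three toy points with 2-D features (illustrative values only).
points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = point_self_attention(points)
```

Because the attention weights for each point sum to one, every output feature is a convex combination of the input features, which is how each point's representation comes to reflect its context.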
Related papers
- Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with a lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z)
- PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers [71.72888202522644]
We propose a new end-to-end multi-person 3D Pose and Shape estimation framework with progressive Video Transformer.
In PSVT, a-temporal encoder (PGA) captures the global feature dependencies among spatial objects.
To handle the variances of objects as time proceeds, a novel scheme of progressive decoding is used.
arXiv Detail & Related papers (2023-03-16T09:55:43Z)
- Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding [62.502694656615496]
We present Progressive Point Patch Embedding and a new point cloud Transformer model, namely PViT.
PViT shares the same backbone as Transformer but is shown to be less hungry for data, enabling Transformer to achieve performance comparable to the state-of-the-art.
We formulate a simple yet effective pipeline dubbed "Pix4Point" that allows harnessing Transformers pretrained in the image domain to enhance downstream point cloud understanding.
arXiv Detail & Related papers (2022-08-25T17:59:29Z)
- EDTER: Edge Detection with Transformer [71.83960813880843]
We propose a novel transformer-based edge detector, Edge Detection TransformER (EDTER), to extract clear and crisp object boundaries and meaningful edges.
EDTER exploits the full image context information and detailed local cues simultaneously.
Experiments on BSDS500, NYUDv2, and Multicue demonstrate the superiority of EDTER in comparison with state-of-the-art methods.
arXiv Detail & Related papers (2022-03-16T11:55:55Z)
- 6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-based Instance Representation Learning [0.0]
6D-ViT is a transformer-based instance representation learning network.
It is suitable for highly accurate category-level object pose estimation on RGB-D images.
arXiv Detail & Related papers (2021-10-10T13:34:16Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object result.
The recent transformer-based image recognition model ViT also shows a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- Fully Transformer Networks for Semantic Image Segmentation [26.037770622551882]
We explore a novel framework for semantic image segmentation: encoder-decoder based Fully Transformer Networks (FTN).
We propose a Pyramid Group Transformer (PGT) as the encoder for progressively learning hierarchical features, while reducing the computational complexity of the standard visual transformer (ViT).
Then, we propose a Feature Pyramid Transformer (FPT) to fuse semantic-level and spatial-level information from multiple levels of the PGT encoder for semantic image segmentation.
arXiv Detail & Related papers (2021-06-08T05:15:28Z)
- Point Cloud Learning with Transformer [2.3204178451683264]
We introduce a novel framework, called Multi-level Multi-scale Point Transformer (MLMSPT).
Specifically, a point pyramid transformer is investigated to model features with diverse resolutions or scales.
A multi-level transformer module is designed to aggregate contextual information from different levels of each scale and enhance their interactions.
arXiv Detail & Related papers (2021-04-28T08:39:21Z)
- 3D Object Detection with Pointformer [29.935891419574602]
We propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively.
A Local Transformer module is employed to model interactions among points in a local region, which learns context-dependent region features at an object level.
A Global Transformer is designed to learn context-aware representations at the scene level.
arXiv Detail & Related papers (2020-12-21T15:12:54Z)
- Feature Pyramid Transformer [121.50066435635118]
We propose a fully active feature interaction across both space and scales, called Feature Pyramid Transformer (FPT).
FPT transforms any feature pyramid into another feature pyramid of the same size but with richer contexts.
We conduct extensive experiments in both instance-level (i.e., object detection and instance segmentation) and pixel-level segmentation tasks.
arXiv Detail & Related papers (2020-07-18T15:16:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.