Feature Pyramid Transformer
- URL: http://arxiv.org/abs/2007.09451v1
- Date: Sat, 18 Jul 2020 15:16:32 GMT
- Title: Feature Pyramid Transformer
- Authors: Dong Zhang, Hanwang Zhang, Jinhui Tang, Meng Wang, Xiansheng Hua and Qianru Sun
- Abstract summary: We propose a fully active feature interaction across both space and scales, called Feature Pyramid Transformer (FPT).
FPT transforms any feature pyramid into another feature pyramid of the same size but with richer contexts.
We conduct extensive experiments in both instance-level (i.e., object detection and instance segmentation) and pixel-level segmentation tasks.
- Score: 121.50066435635118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature interactions across space and scales underpin modern visual
recognition systems because they introduce beneficial visual contexts.
Conventionally, spatial contexts are passively hidden in the CNN's increasing
receptive fields or actively encoded by non-local convolution. Yet, the
non-local spatial interactions are not across scales, and thus they fail to
capture the non-local contexts of objects (or parts) residing in different
scales. To this end, we propose a fully active feature interaction across both
space and scales, called Feature Pyramid Transformer (FPT). It transforms any
feature pyramid into another feature pyramid of the same size but with richer
contexts, by using three specially designed transformers in self-level,
top-down, and bottom-up interaction fashion. FPT serves as a generic visual
backbone with fair computational overhead. We conduct extensive experiments in
both instance-level (i.e., object detection and instance segmentation) and
pixel-level segmentation tasks, using various backbones and head networks, and
observe consistent improvement over all the baselines and the state-of-the-art
methods.
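The abstract describes three interaction directions: a self-level (non-local) interaction within each pyramid level, a top-down interaction in which a finer level queries a coarser one, and a bottom-up interaction in the opposite direction. Below is a minimal PyTorch sketch of that scheme; the class names, the 1x1-convolution fusion, and the use of nn.MultiheadAttention are illustrative assumptions, not the authors' exact transformer designs.

```python
# A minimal sketch of FPT-style feature-pyramid interaction, assuming PyTorch.
# CrossScaleAttention and FPTBlock are hypothetical names; FPT's actual
# self-level, top-down, and bottom-up transformers are more specialized.
from typing import List

import torch
import torch.nn as nn


class CrossScaleAttention(nn.Module):
    """One pyramid level (queries) attends to another level (keys/values)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, q_map: torch.Tensor, kv_map: torch.Tensor) -> torch.Tensor:
        b, c, hq, wq = q_map.shape
        q = q_map.flatten(2).transpose(1, 2)    # (B, Hq*Wq, C)
        kv = kv_map.flatten(2).transpose(1, 2)  # (B, Hk*Wk, C)
        out, _ = self.attn(q, kv, kv)           # non-local, across scales
        return out.transpose(1, 2).reshape(b, c, hq, wq)


class FPTBlock(nn.Module):
    """Enrich each level with self-level, top-down, and bottom-up contexts,
    then fuse them back to the original pyramid shape."""

    def __init__(self, dim: int):
        super().__init__()
        self.self_level = CrossScaleAttention(dim)  # within one scale
        self.top_down = CrossScaleAttention(dim)    # fine queries coarse
        self.bottom_up = CrossScaleAttention(dim)   # coarse queries fine
        self.fuse = nn.Conv2d(3 * dim, dim, kernel_size=1)

    def forward(self, feats: List[torch.Tensor]) -> List[torch.Tensor]:
        # feats is ordered fine -> coarse, e.g. [P3, P4, P5].
        out = []
        for i, f in enumerate(feats):
            ctx = [self.self_level(f, f)]
            # Top-down context from the next coarser level, if any.
            ctx.append(self.top_down(f, feats[i + 1]) if i + 1 < len(feats) else f)
            # Bottom-up context from the next finer level, if any.
            ctx.append(self.bottom_up(f, feats[i - 1]) if i > 0 else f)
            out.append(self.fuse(torch.cat(ctx, dim=1)))
        return out  # same sizes as the input pyramid, richer contexts
```

For feats = [torch.randn(1, 64, 64, 64), torch.randn(1, 64, 32, 32), torch.randn(1, 64, 16, 16)], FPTBlock(64) returns maps of identical shapes, matching the abstract's claim that FPT maps a pyramid to one of the same size but with richer contexts.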
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers well-trained on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer [29.95553680263075]
We propose Feature Matching with Reconciliatory Transformer (FMRT), a detector-free method that reconciles different features with multiple receptive fields adaptively.
FMRT yields extraordinary performance on multiple benchmarks, including pose estimation, visual localization, homography estimation, and image matching.
arXiv Detail & Related papers (2023-10-20T15:54:18Z)
- Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation [22.587913528540465]
In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations.
Local feature learning is performed through relative position encoding and attentive feature pooling.
We demonstrate its superiority experimentally on classification and segmentation tasks.
arXiv Detail & Related papers (2023-04-27T12:17:35Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- Centralized Feature Pyramid for Object Detection [53.501796194901964]
Visual feature pyramid has shown its superiority in both effectiveness and efficiency in a wide range of applications.
In this paper, we propose a Centralized Feature Pyramid (CFP) for object detection, which is based on a globally explicit centralized feature regulation.
arXiv Detail & Related papers (2022-10-05T08:32:54Z)
- ScaleFormer: Revisiting the Transformer-based Backbones from a Scale-wise Perspective for Medical Image Segmentation [16.995195979992015]
We propose a new vision transformer-based backbone, called ScaleFormer, for medical image segmentation.
A scale-wise intra-scale transformer is designed to couple the CNN-based local features with the transformer-based global cues in each scale (a rough sketch of this coupling follows the list below).
A simple and effective spatial-aware inter-scale transformer is designed to interact among consensual regions in multiple scales.
arXiv Detail & Related papers (2022-07-29T08:55:00Z)
- RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval [66.2075707179047]
We propose a novel mixture-of-expert transformer RoME that disentangles the text and the video into three levels.
We utilize a transformer-based attention mechanism to fully exploit visual and text embeddings at both global and local levels.
Our method outperforms the state-of-the-art methods on the YouCook2 and MSR-VTT datasets.
arXiv Detail & Related papers (2022-06-26T11:12:49Z)
- SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
arXiv Detail & Related papers (2022-05-26T17:00:23Z)
- Point Cloud Learning with Transformer [2.3204178451683264]
We introduce a novel framework, called Multi-level Multi-scale Point Transformer (MLMSPT).
Specifically, a point pyramid transformer is investigated to model features with diverse resolutions or scales.
A multi-level transformer module is designed to aggregate contextual information from different levels of each scale and enhance their interactions.
arXiv Detail & Related papers (2021-04-28T08:39:21Z)
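As a companion to the ScaleFormer entry above, here is a rough sketch of coupling CNN-based local features with transformer-based global cues inside one scale, assuming PyTorch. The module name, the 3x3-convolution local branch, and the fusion layer are assumptions for illustration, not the paper's implementation.

```python
# A sketch of intra-scale coupling in the spirit of ScaleFormer.
# IntraScaleCoupling is a hypothetical name; the paper's module differs in detail.
import torch
import torch.nn as nn


class IntraScaleCoupling(nn.Module):
    """Fuse a convolutional (local) branch with a self-attention (global)
    branch computed over the same feature map."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        self.norm = nn.LayerNorm(dim)
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)                          # CNN-based local features
        seq = self.norm(x.flatten(2).transpose(1, 2))  # (B, H*W, C)
        glob, _ = self.global_attn(seq, seq, seq)      # transformer-based global cues
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, glob], dim=1))
```

Applying one such module per pyramid level gives the scale-wise intra-scale behavior described in the summary; the inter-scale interaction would then attend across levels, similar in spirit to the FPT sketch above.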