Foreground Guidance and Multi-Layer Feature Fusion for Unsupervised
Object Discovery with Transformers
- URL: http://arxiv.org/abs/2210.13053v1
- Date: Mon, 24 Oct 2022 09:19:09 GMT
- Title: Foreground Guidance and Multi-Layer Feature Fusion for Unsupervised
Object Discovery with Transformers
- Authors: Zhiwei Lin, Zengyu Yang and Yongtao Wang
- Abstract summary: We propose FOReground guidance and MUlti-LAyer feature fusion for unsupervised object discovery, dubbed FORMULA.
We present a foreground guidance strategy with an off-the-shelf UOD detector to highlight the foreground regions on the feature maps and then refine object locations in an iterative fashion.
To solve the scale variation issues in object detection, we design a multi-layer feature fusion module that aggregates features responding to objects at different scales.
- Score: 8.88037278008401
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Unsupervised object discovery (UOD) has recently shown encouraging progress
with the adoption of pre-trained Transformer features. However, current methods
based on Transformers mainly focus on designing the localization head (e.g.,
seed selection-expansion and normalized cut) and overlook the importance of
improving Transformer features. In this work, we handle UOD task from the
perspective of feature enhancement and propose FOReground guidance and
MUlti-LAyer feature fusion for unsupervised object discovery, dubbed FORMULA.
Firstly, we present a foreground guidance strategy with an off-the-shelf UOD
detector to highlight the foreground regions on the feature maps and then
refine object locations in an iterative fashion. Moreover, to solve the scale
variation issues in object detection, we design a multi-layer feature fusion
module that aggregates features responding to objects at different scales. The
experiments on VOC07, VOC12, and COCO 20k show that the proposed FORMULA
achieves new state-of-the-art results on unsupervised object discovery. The
code will be released at https://github.com/VDIGPKU/FORMULA.
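The multi-layer fusion idea described in the abstract can be sketched, under assumptions, as upsampling feature maps from several transformer layers to a common resolution and averaging them. The layer choice, nearest-neighbor resizing, and uniform weighting below are hypothetical illustration choices, not FORMULA's actual module:

```python
import numpy as np

def fuse_multi_layer(features, out_hw):
    """Fuse feature maps from several layers by nearest-neighbor
    upsampling each to a common (H, W) grid and averaging.
    A hypothetical stand-in for a learned fusion module."""
    H, W = out_hw
    fused = np.zeros((H, W, features[0].shape[-1]))
    for f in features:
        h, w, _ = f.shape
        # nearest-neighbor resize to (H, W) via integer index maps
        rows = np.arange(H) * h // H
        cols = np.arange(W) * w // W
        fused += f[rows][:, cols]
    return fused / len(features)

# three layers at different spatial scales, same channel dimension
layers = [np.ones((s, s, 4)) for s in (7, 14, 28)]
out = fuse_multi_layer(layers, (28, 28))
print(out.shape)  # (28, 28, 4)
```

Averaging is the simplest aggregation; a real module would typically learn per-layer weights or a small convolution over the stacked maps.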
Related papers
- PointOBB: Learning Oriented Object Detection via Single Point Supervision [55.88982271340328]

This paper proposes PointOBB, the first single Point-based OBB generation method, for oriented object detection.
PointOBB operates through the collaborative utilization of three distinctive views: an original view, a resized view, and a rotated/flipped (rot/flp) view.
Experimental results on the DIOR-R and DOTA-v1.0 datasets demonstrate that PointOBB achieves promising performance.
arXiv Detail & Related papers (2023-11-23T15:51:50Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
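PCA-based localization of this kind is commonly done by projecting patch features onto their first principal component and thresholding. The sketch below is a generic version of that technique, not the cited paper's exact procedure; the sign-flip heuristic (assuming the foreground occupies the minority of patches) is an assumption:

```python
import numpy as np

def pca_localize(patch_feats, grid_hw):
    """Produce a binary foreground mask by projecting patch features
    onto their first principal component and thresholding at zero.
    Generic PCA-saliency sketch; not the paper's exact method."""
    h, w = grid_hw
    x = patch_feats - patch_feats.mean(axis=0)
    # first right-singular vector = first principal direction
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    proj = x @ vt[0]
    # flip sign so the (assumed minority) foreground side is positive
    if (proj > 0).sum() > proj.size / 2:
        proj = -proj
    return (proj > 0).reshape(h, w)

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))
feats[:4] += 5.0  # make the first four patches stand out
mask = pca_localize(feats, (4, 4))
print(mask.shape)  # (4, 4)
```

Because centered projections sum to zero, the threshold always splits the patches into two non-empty groups; the heuristic simply labels the smaller group as foreground.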
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: Semi-Supervised Video Object Segmentation [62.98078087018469]
We introduce MSDeAOT, a variant of the AOT framework that incorporates transformers at multiple feature scales.
MSDeAOT efficiently propagates object masks from previous frames to the current frame using a feature scale with a stride of 16.
We also employ GPM in a more refined feature scale with a stride of 8, leading to improved accuracy in detecting and tracking small objects.
arXiv Detail & Related papers (2023-07-05T03:43:15Z)
- Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers [34.42710399235461]
Vision transformers have recently shown strong global context modeling capabilities in camouflaged object detection.
They suffer from two major limitations: less effective locality modeling and insufficient feature aggregation in decoders.
We propose a novel transformer-based Feature Shrinkage Pyramid Network (FSPNet), which aims to hierarchically decode locality-enhanced neighboring transformer features.
arXiv Detail & Related papers (2023-03-26T20:50:58Z)
- Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z)
- Learning Object-level Point Augmentor for Semi-supervised 3D Object Detection [85.170578641966]
We propose an object-level point augmentor (OPA) that performs local transformations for semi-supervised 3D object detection.
In this way, the resultant augmentor is derived to emphasize object instances rather than irrelevant backgrounds.
Experiments on the ScanNet and SUN RGB-D datasets show that the proposed OPA performs favorably against the state-of-the-art methods.
arXiv Detail & Related papers (2022-12-19T06:56:14Z)
- An Extendable, Efficient and Effective Transformer-based Object Detector [95.06044204961009]
We integrate Vision and Detection Transformers (ViDT) to construct an effective and efficient object detector.
ViDT introduces a reconfigured attention module to extend the recent Swin Transformer to be a standalone object detector.
We extend it to ViDT+ to support joint-task learning for object detection and instance segmentation.
arXiv Detail & Related papers (2022-04-17T09:27:45Z)
- Task Specific Attention is one more thing you need for object detection [0.0]
In this paper, we propose that combining several attention modules with our new Task Specific Split Transformer (TSST) is sufficient to produce the best COCO results.
arXiv Detail & Related papers (2022-02-18T07:09:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.