Time-rEversed diffusioN tEnsor Transformer: A new TENET of Few-Shot
Object Detection
- URL: http://arxiv.org/abs/2210.16897v1
- Date: Sun, 30 Oct 2022 17:40:12 GMT
- Title: Time-rEversed diffusioN tEnsor Transformer: A new TENET of Few-Shot
Object Detection
- Authors: Shan Zhang and Naila Murray and Lei Wang and Piotr Koniusz
- Abstract summary: We propose a Time-rEversed diffusioN tEnsor Transformer (TENET) that captures multi-way feature occurrences that are highly discriminative.
We also propose a Transformer Relation Head (TRH) equipped with higher-order representations, which encodes correlations between query regions and the entire support set.
Our model achieves state-of-the-art results on PASCAL VOC, FSOD, and COCO.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we tackle the challenging problem of Few-shot Object
Detection. Existing FSOD pipelines (i) use average-pooled representations that
result in information loss; and/or (ii) discard position information that can
help detect object instances. Consequently, such pipelines are sensitive to
large intra-class appearance and geometric variations between support and query
images. To address these drawbacks, we propose a Time-rEversed diffusioN tEnsor
Transformer (TENET), which i) forms high-order tensor representations that
capture multi-way feature occurrences that are highly discriminative, and ii)
uses a transformer that dynamically extracts correlations between the query
image and the entire support set, instead of a single average-pooled support
embedding. We also propose a Transformer Relation Head (TRH), equipped with
higher-order representations, which encodes correlations between query regions
and the entire support set, while being sensitive to the positional variability
of object instances. Our model achieves state-of-the-art results on PASCAL VOC,
FSOD, and COCO.
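As a rough illustration of the "high-order tensor representations that capture multi-way feature occurrences" described above, the sketch below averages r-fold outer products of per-location feature vectors. This is a generic higher-order pooling operator, not necessarily TENET's exact construction; the function name, feature shapes, and tensor order are assumptions for illustration.

```python
import numpy as np

def high_order_tensor(features: np.ndarray, order: int = 3) -> np.ndarray:
    """Aggregate per-location feature vectors (N, D) into an order-r
    co-occurrence tensor of shape (D,) * order by averaging r-fold outer
    products. Entry T[i, j, k] is the mean of f_i * f_j * f_k over all
    locations, i.e. a multi-way feature co-occurrence statistic."""
    n, d = features.shape
    tensor = np.zeros((d,) * order)
    for f in features:
        outer = f
        for _ in range(order - 1):
            outer = np.multiply.outer(outer, f)  # build the r-fold outer product
        tensor += outer
    return tensor / n

# Toy example: 5 spatial locations, 4-dimensional features.
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 4))
T = high_order_tensor(feats, order=3)
print(T.shape)  # (4, 4, 4)
```

Because the tensor is built from symmetric outer products, it is invariant to permuting its modes, which is one reason such statistics are compact yet discriminative.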
Related papers
- Investigating the Robustness and Properties of Detection Transformers
(DETR) Toward Difficult Images [1.5727605363545245]
Transformer-based object detectors (DETR) have shown strong performance across machine-vision tasks.
The critical issue to be addressed is how this model architecture handles different image nuisances.
We study this issue by measuring the performance of DETR in a range of experiments and benchmarking the network.
arXiv Detail & Related papers (2023-10-12T23:38:52Z)
- Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
- Adaptive Rotated Convolution for Rotated Object Detection [96.94590550217718]
We present Adaptive Rotated Convolution (ARC) module to handle rotated object detection problem.
In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images.
The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
arXiv Detail & Related papers (2023-03-14T11:53:12Z)
- Transformation-Invariant Network for Few-Shot Object Detection in Remote Sensing Images [15.251042369061024]
Conventional object detection relies on a large amount of labeled data for training; few-shot object detection (FSOD) aims to detect novel classes from only a few annotated examples.
Scale and orientation variations of objects in remote sensing images pose significant challenges to existing FSOD methods.
We propose integrating a feature pyramid network and utilizing prototype features to enhance query features.
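The prototype-based query enhancement mentioned above can be sketched generically: average the support features of a class into a prototype vector and fuse it into the query features. This is a minimal illustration of the common prototype technique, not this paper's exact fusion or its feature-pyramid details; the function name and additive fusion are assumptions.

```python
import numpy as np

def enhance_query(query_feats: np.ndarray, support_feats: np.ndarray) -> np.ndarray:
    """Enhance query features with a class prototype.

    query_feats:   (N, D) per-location query features
    support_feats: (K, D) features from K support examples of one class

    The prototype is the mean support feature; it is broadcast-added to
    every query location so query features are conditioned on the class.
    """
    proto = support_feats.mean(axis=0)   # (D,) class prototype
    return query_feats + proto           # broadcast add per query location
```

In practice the fusion could equally be concatenation or attention; the additive form is just the simplest choice.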
arXiv Detail & Related papers (2023-03-13T02:21:38Z)
- Semantic-aligned Fusion Transformer for One-shot Object Detection [18.58772037047498]
One-shot object detection aims at detecting novel objects according to merely one given instance.
Current approaches explore various feature fusions to obtain directly transferable meta-knowledge.
We propose a simple but effective architecture named Semantic-aligned Fusion Transformer (SaFT) to resolve these issues.
arXiv Detail & Related papers (2022-03-17T05:38:47Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered solving vision tasks with transformers: it directly translates the image feature map into the object detection result.
Recent transformer-based image recognition models show a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- Guiding Query Position and Performing Similar Attention for Transformer-Based Detection Heads [20.759022922347697]
We propose the Guided Query Position (GQPos) method, which iteratively embeds the latest location information of object queries into the query position.
Besides fusing feature maps, SiA also fuses attention weight maps to accelerate the learning of high-resolution attention weight maps.
Our experiments show that the proposed GQPos improves the performance of a series of models, including DETR, SMCA, YoloS, and HoiTransformer.
arXiv Detail & Related papers (2021-08-22T11:32:34Z)
- Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers [141.70707071815653]
We propose a novel Sequence Feature Alignment (SFA) method that is specially designed for the adaptation of detection transformers.
SFA consists of a domain query-based feature alignment (DQFA) module and a token-wise feature alignment (TDA) module.
Experiments on three challenging benchmarks show that SFA outperforms state-of-the-art domain adaptive object detection methods.
arXiv Detail & Related papers (2021-07-27T07:17:12Z) - Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
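The "image patches as inputs" step above can be sketched with ViT-style tokenization: split the image into non-overlapping patches and flatten each into a token vector, which the transformer then processes. This is a generic sketch of the patch-input step, not VST's exact embedding; the function name and patch size are assumptions.

```python
import numpy as np

def patchify(img: np.ndarray, p: int) -> np.ndarray:
    """Split a (C, H, W) image into non-overlapping p x p patches and
    flatten each into a token vector of length C * p * p, giving an
    (num_patches, C * p * p) token matrix."""
    c, h, w = img.shape
    assert h % p == 0 and w % p == 0, "image dims must be divisible by p"
    # (C, H/p, p, W/p, p) -> (H/p, W/p, C, p, p) -> (num_patches, C*p*p)
    patches = img.reshape(c, h // p, p, w // p, p)
    return patches.transpose(1, 3, 0, 2, 4).reshape(-1, c * p * p)

# Toy example: a 2-channel 8x8 image with 4x4 patches -> 4 tokens of dim 32.
img = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
tokens = patchify(img, 4)
print(tokens.shape)  # (4, 32)
```

Each token would then be linearly projected and fed to self-attention, which is what lets the model propagate global context among patches.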
arXiv Detail & Related papers (2021-04-25T08:24:06Z) - DA-DETR: Domain Adaptive Detection Transformer with Information Fusion [53.25930448542148]
DA-DETR is a domain adaptive object detection transformer that introduces information fusion for effective transfer from a labeled source domain to an unlabeled target domain.
We introduce a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains.
CTBlender employs the Transformer features to modulate the CNN features across multiple scales where the high-level semantic information and the low-level spatial information are fused for accurate object identification and localization.
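One common way transformer features "modulate" CNN features is FiLM-style conditioning: the transformer feature predicts a per-channel scale and shift applied to the CNN feature map. The sketch below shows that generic mechanism, not DA-DETR's exact CTBlender; `modulate`, `w_gamma`, and `w_beta` are hypothetical names for learned projections.

```python
import numpy as np

def modulate(cnn_feat: np.ndarray, trf_feat: np.ndarray,
             w_gamma: np.ndarray, w_beta: np.ndarray) -> np.ndarray:
    """FiLM-style modulation of CNN features by a transformer feature.

    cnn_feat: (C, H, W) CNN feature map
    trf_feat: (D,)      transformer (global) feature
    w_gamma:  (C, D)    projection predicting per-channel scales
    w_beta:   (C, D)    projection predicting per-channel shifts
    """
    gamma = w_gamma @ trf_feat  # (C,) per-channel scale
    beta = w_beta @ trf_feat    # (C,) per-channel shift
    return gamma[:, None, None] * cnn_feat + beta[:, None, None]
```

Applying this at multiple pyramid scales is one way high-level semantic information can steer low-level spatial features, in the spirit of the fusion described above.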
arXiv Detail & Related papers (2021-03-31T13:55:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.