CATs++: Boosting Cost Aggregation with Convolutions and Transformers
- URL: http://arxiv.org/abs/2202.06817v1
- Date: Mon, 14 Feb 2022 15:54:58 GMT
- Title: CATs++: Boosting Cost Aggregation with Convolutions and Transformers
- Authors: Seokju Cho, Sunghwan Hong, Seungryong Kim
- Abstract summary: We introduce Cost Aggregation with Transformers (CATs) to tackle this by exploring global consensus among the initial correlation maps.
Also, to alleviate some of the limitations that CATs may face, i.e., the high computational cost induced by the use of a standard transformer, we propose CATs++.
Our proposed methods outperform the previous state-of-the-art methods by large margins, setting a new state-of-the-art for all the benchmarks.
- Score: 31.22435282922934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cost aggregation is a highly important process in image matching
tasks, which aims to disambiguate noisy matching scores. Existing methods
generally tackle this with hand-crafted or CNN-based techniques, which either
lack robustness to severe deformations or inherit the limitations of CNNs,
failing to discriminate incorrect matches due to limited receptive fields and
a lack of adaptability. In this paper, we introduce Cost Aggregation with
Transformers (CATs) to tackle this by exploring global consensus among the
initial correlation maps, aided by architectural designs that allow us to
fully exploit the global receptive fields of the self-attention mechanism.
Also, to alleviate a key limitation of CATs, namely the high computational
cost of a standard transformer, whose complexity grows with the spatial and
feature dimensions and thus restricts its applicability to limited
resolutions and yields rather limited performance, we propose CATs++, an
extension of CATs. Our proposed methods outperform the previous
state-of-the-art methods by large margins, setting a new state of the art on
all the benchmarks, including PF-WILLOW, PF-PASCAL, and SPair-71k. We further
provide extensive ablation studies and analyses.
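To make the idea concrete, here is a minimal sketch (not the authors' implementation; class and parameter names are illustrative) of the core mechanism: flattening a raw correlation map into a token sequence and refining it with a standard transformer encoder, so every matching score can attend to all others.

```python
import torch
import torch.nn as nn

class CostAggregationSketch(nn.Module):
    """Refine a raw correlation map with a standard transformer encoder."""
    def __init__(self, spatial: int = 16, dim: int = 128, layers: int = 2):
        super().__init__()
        # Each source position carries its correlation with every target
        # position; project that vector (length spatial**2) to an embedding.
        self.embed = nn.Linear(spatial * spatial, dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers)
        self.out = nn.Linear(dim, spatial * spatial)

    def forward(self, corr: torch.Tensor) -> torch.Tensor:
        # corr: (B, HW_src, HW_trg) raw matching scores
        tokens = self.embed(corr)         # (B, HW_src, dim)
        tokens = self.encoder(tokens)     # global consensus via self-attention
        return corr + self.out(tokens)    # residual refinement of the scores

corr = torch.randn(2, 256, 256)              # e.g. two 16x16 feature maps
print(CostAggregationSketch()(corr).shape)   # torch.Size([2, 256, 256])
```

The quadratic cost of attention over all token pairs in this sketch is exactly the limitation that motivates CATs++.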
Related papers
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, relying on a minimal set of late pre-trained layers alleviates the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- On the Power of Convolution Augmented Transformer [30.46405043231576]
We study the benefits of Convolution-Augmented Transformer (CAT) for recall, copying, and length generalization tasks.
CAT incorporates convolutional filters into the K/Q/V embeddings of an attention layer.
We show that the locality of the convolution synergizes with the global view of the attention.
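As a hypothetical illustration of that mechanism (not the paper's code; all names are ours), a depthwise convolution can be applied along the token axis of the Q/K/V projections before the attention product:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAugmentedAttention(nn.Module):
    """Self-attention whose Q/K/V are smoothed by a depthwise convolution."""
    def __init__(self, dim: int = 64, kernel: int = 3):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        # Depthwise conv: one filter per channel, mixing nearby tokens only.
        self.conv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)            # each (B, N, dim)
        # Convolve each projection along the sequence (token) dimension.
        q, k, v = (self.conv(t.transpose(1, 2)).transpose(1, 2)
                   for t in (q, k, v))
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v                                    # (B, N, dim)

x = torch.randn(2, 32, 64)                # (batch, tokens, dim)
print(ConvAugmentedAttention()(x).shape)  # torch.Size([2, 32, 64])
```

The convolution injects local inductive bias while the softmax attention retains its global view, which is the synergy the paper studies.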
arXiv Detail & Related papers (2024-07-08T04:08:35Z)
- CAT: Contrastive Adapter Training for Personalized Image Generation [4.093428697109545]
We present Contrastive Adapter Training (CAT) to enhance adapter training through the application of CAT loss.
Our approach helps preserve the base model's original knowledge when adapters are introduced.
arXiv Detail & Related papers (2024-04-11T08:36:13Z)
- Quantified Task Misalignment to Inform PEFT: An Exploration of Domain Generalization and Catastrophic Forgetting in CLIP [7.550566004119157]
We analyze the relation between task difficulty in the CLIP model and the performance of several simple parameter-efficient fine-tuning methods.
A method that trains only a subset of attention weights, which we call A-CLIP, strikes a balance between domain generalization and resistance to catastrophic forgetting.
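A minimal sketch of that recipe, assuming a generic transformer stands in for CLIP and that "attention weights" means the attention projection matrices (the paper's exact subset may differ):

```python
import torch.nn as nn

def freeze_all_but_attention(model: nn.Module) -> None:
    """Leave only attention projection weights trainable; freeze the rest."""
    for name, param in model.named_parameters():
        param.requires_grad = "attn" in name

# Toy transformer encoder standing in for CLIP's image tower.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=2)
freeze_all_but_attention(model)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable}/{total}")
```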
arXiv Detail & Related papers (2024-02-14T23:01:03Z)
- ALF: Adaptive Label Finetuning for Scene Graph Generation [116.59868289196157]
Scene Graph Generation (SGG) endeavors to predict the relationships between subjects and objects in a given image.
The long-tail distribution of relations often leads to biased predictions on coarse labels, presenting a substantial hurdle in SGG.
We introduce a one-stage data transfer pipeline for SGG, termed Adaptive Label Finetuning (ALF), which eliminates the need for extra retraining sessions.
ALF achieves a 16% improvement in mR@100 compared to the typical SGG method Motif, with only a 6% increase in calculation costs compared to the state-of-the-art method IETrans.
arXiv Detail & Related papers (2023-12-29T01:37:27Z)
- PIPE: Parallelized Inference Through Post-Training Quantization Ensembling of Residual Expansions [23.1120983784623]
PIPE is a quantization method that leverages residual error expansion, along with group sparsity and an ensemble approximation for better parallelization.
It achieves superior performance on every benchmarked application (from vision to NLP tasks), architecture (ConvNets, transformers) and bit-width.
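The following toy sketch shows the residual-expansion idea in isolation (uniform quantization, illustrative names; it omits PIPE's group sparsity and ensembling details): each expansion order quantizes the error left by the previous orders, and the resulting low-bit terms can be evaluated in parallel and summed.

```python
import torch

def quantize(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Uniform symmetric quantization to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max() / qmax
    return (x / scale).round().clamp(-qmax, qmax) * scale

def residual_expansion(w: torch.Tensor, orders: int = 3, bits: int = 4):
    """Return low-bit terms whose sum approximates w increasingly well."""
    terms, residual = [], w
    for _ in range(orders):
        q = quantize(residual, bits)
        terms.append(q)           # each term is an independent low-bit tensor
        residual = residual - q   # next order expands on what was missed
    return terms

w = torch.randn(256, 256)
approx = sum(residual_expansion(w))   # terms could be evaluated in parallel
print(f"relative error: {(w - approx).norm() / w.norm():.4f}")
```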
arXiv Detail & Related papers (2023-11-27T13:29:34Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification [97.28167655721766]
We propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization problems.
We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
arXiv Detail & Related papers (2022-08-11T22:27:22Z)
- AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks.
Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo.
Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z)
- Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling and multi-level aggregation to disambiguate the initial correlation maps.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
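For context, a minimal sketch of the initial correlation map such a model aggregates, computed as cosine similarity between all pairs of locations in two feature maps (shapes and names are illustrative):

```python
import torch
import torch.nn.functional as F

def correlation_map(feat_src: torch.Tensor,
                    feat_trg: torch.Tensor) -> torch.Tensor:
    """feat_*: (B, C, H, W) dense features from a shared backbone."""
    src = F.normalize(feat_src.flatten(2), dim=1)   # (B, C, H*W)
    trg = F.normalize(feat_trg.flatten(2), dim=1)   # (B, C, H*W)
    # Every source location scored against every target location.
    return torch.einsum('bci,bcj->bij', src, trg)   # (B, H*W, H*W)

corr = correlation_map(torch.randn(1, 64, 16, 16),
                       torch.randn(1, 64, 16, 16))
print(corr.shape)   # torch.Size([1, 256, 256])
```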
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.