CATs++: Boosting Cost Aggregation with Convolutions and Transformers
- URL: http://arxiv.org/abs/2202.06817v1
- Date: Mon, 14 Feb 2022 15:54:58 GMT
- Title: CATs++: Boosting Cost Aggregation with Convolutions and Transformers
- Authors: Seokju Cho, Sunghwan Hong, Seungryong Kim
- Abstract summary: We introduce Cost Aggregation with Transformers (CATs) to tackle this by exploring global consensus among the initial correlation maps.
Also, to alleviate some of the limitations that CATs may face, i.e., high computational costs induced by the use of a standard transformer, we propose CATs++.
Our proposed methods outperform the previous state-of-the-art methods by large margins, setting a new state-of-the-art for all the benchmarks.
- Score: 31.22435282922934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cost aggregation is a highly important process in image matching tasks, which
aims to disambiguate noisy matching scores. Existing methods generally tackle
this with hand-crafted or CNN-based approaches, which either lack robustness
to severe deformations or inherit the limitations of CNNs, failing to
discriminate incorrect matches due to limited receptive fields and
inadaptability. In this paper, we introduce Cost Aggregation with Transformers
(CATs) to tackle this by exploring global consensus among the initial
correlation maps, aided by architectural designs that allow us to fully
exploit the global receptive fields of the self-attention mechanism. Also, to
alleviate a limitation of CATs, namely the high computational cost of a
standard transformer, whose complexity grows with the spatial and feature
dimensions, restricting its applicability to limited resolutions and resulting
in rather limited performance, we propose CATs++, an extension of CATs. Our
proposed methods outperform the previous state-of-the-art methods by large
margins, setting a new state of the art on all benchmarks, including
PF-WILLOW, PF-PASCAL, and SPair-71k. We further provide extensive ablation
studies and analyses.
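As a rough illustration of the idea described above, the following is a minimal sketch (not the authors' implementation) of transformer-based cost aggregation: each source position's row of matching scores, concatenated with an appearance embedding, is treated as one token and refined by self-attention, so every candidate match can consult all others. The class name, tensor shapes, single-layer encoder, and the concatenation scheme are assumptions for illustration only.

```python
# Minimal sketch of transformer-based cost aggregation over a correlation map.
# Assumed shapes: corr (B, N_src, N_tgt) matching scores, src_feat (B, N_src, D)
# appearance features; N_tgt + D should be divisible by num_heads.
import torch
import torch.nn as nn

class CostAggregationSketch(nn.Module):
    def __init__(self, num_target_tokens: int, feat_dim: int, num_heads: int = 4):
        super().__init__()
        # Each source position becomes one token: its matching scores to all
        # target positions concatenated with an appearance embedding.
        token_dim = num_target_tokens + feat_dim
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=token_dim, nhead=num_heads, batch_first=True
        )
        self.aggregator = nn.TransformerEncoder(encoder_layer, num_layers=1)
        self.proj = nn.Linear(token_dim, num_target_tokens)

    def forward(self, corr: torch.Tensor, src_feat: torch.Tensor) -> torch.Tensor:
        tokens = torch.cat([corr, src_feat], dim=-1)  # (B, N_src, N_tgt + D)
        refined = self.aggregator(tokens)             # global self-attention
        return self.proj(refined)                     # refined correlation map

# Toy usage: 16x16 source/target grids, 64-dim appearance features.
B, N_src, N_tgt, D = 1, 16 * 16, 16 * 16, 64
corr = torch.randn(B, N_src, N_tgt)
feat = torch.randn(B, N_src, D)
refined = CostAggregationSketch(N_tgt, D)(corr, feat)
print(refined.shape)  # torch.Size([1, 256, 256])
```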
Related papers
- OP-LoRA: The Blessing of Dimensionality [93.08208871549557]
Low-rank adapters enable fine-tuning of large models with only a small number of parameters.
They often pose optimization challenges, with poor convergence.
We introduce an over-parameterized approach that accelerates training without increasing inference costs.
We achieve improvements in vision-language tasks and especially notable increases in image generation.
arXiv Detail & Related papers (2024-12-13T18:55:19Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, a minimal number of late pre-trained layers is used to alleviate the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - On the Power of Convolution Augmented Transformer [30.46405043231576]
We study the benefits of Convolution-Augmented Transformer (CAT) for recall, copying, and length generalization tasks.
CAT incorporates convolutional filters in the K/Q/V embeddings of an attention layer (see the sketch after this list).
We show that the locality of the convolution synergizes with the global view of the attention.
arXiv Detail & Related papers (2024-07-08T04:08:35Z) - CAT: Contrastive Adapter Training for Personalized Image Generation [4.093428697109545]
We present Contrastive Adapter Training (CAT) to enhance adapter training through the application of CAT loss.
Our approach facilitates preservation of the base model's original knowledge when the model initializes adapters.
arXiv Detail & Related papers (2024-04-11T08:36:13Z) - Quantified Task Misalignment to Inform PEFT: An Exploration of Domain Generalization and Catastrophic Forgetting in CLIP [7.550566004119157]
We analyze the relation between task difficulty in the CLIP model and the performance of several simple parameter-efficient fine-tuning methods.
A method that trains only a subset of attention weights, which we call A-CLIP, yields a balance between domain generalization and catastrophic forgetting.
arXiv Detail & Related papers (2024-02-14T23:01:03Z) - PIPE: Parallelized Inference Through Post-Training Quantization Ensembling of Residual Expansions [23.1120983784623]
PIPE is a quantization method that leverages residual error expansion, along with group sparsity and an ensemble approximation for better parallelization.
It achieves superior performance on every benchmarked application (from vision to NLP tasks), architecture (ConvNets, transformers) and bit-width.
arXiv Detail & Related papers (2023-11-27T13:29:34Z) - Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification [97.28167655721766]
We propose a novel doubly accelerated gradient descent (ADSGD) method for sparsity regularized loss minimization problems.
We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
arXiv Detail & Related papers (2022-08-11T22:27:22Z) - AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks.
Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo.
Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z) - Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling to disambiguate the initial correlation maps, along with multi-level aggregation.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
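The sketch referenced in the "On the Power of Convolution Augmented Transformer" entry above follows. It is a minimal, hedged illustration (not that paper's code) of convolution-augmented attention: depthwise 1D convolutions are applied to the Q/K/V projections so that local context complements the global view of attention. The class name, kernel size, and depthwise design are assumptions for illustration.

```python
# Sketch of attention whose Q/K/V embeddings pass through depthwise convolutions,
# injecting locality before the usual global attention step.
import torch
import torch.nn as nn

class ConvAugmentedAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, kernel_size: int = 3):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        # Depthwise convolutions along the sequence dimension for Q, K and V.
        self.conv_q = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.conv_k = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.conv_v = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, L, dim) token embeddings
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Conv1d expects (B, dim, L); convolve along the sequence, then restore layout.
        q = self.conv_q(q.transpose(1, 2)).transpose(1, 2)
        k = self.conv_k(k.transpose(1, 2)).transpose(1, 2)
        v = self.conv_v(v.transpose(1, 2)).transpose(1, 2)
        out, _ = self.attn(q, k, v)
        return out

x = torch.randn(2, 128, 64)                 # batch of 2, 128 tokens, 64-dim
print(ConvAugmentedAttention(64)(x).shape)  # torch.Size([2, 128, 64])
```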
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.