CATs++: Boosting Cost Aggregation with Convolutions and Transformers
- URL: http://arxiv.org/abs/2202.06817v1
- Date: Mon, 14 Feb 2022 15:54:58 GMT
- Title: CATs++: Boosting Cost Aggregation with Convolutions and Transformers
- Authors: Seokju Cho, Sunghwan Hong, Seungryong Kim
- Abstract summary: We introduce Cost Aggregation with Transformers (CATs) to tackle this by exploring global consensus among the initial correlation maps.
Also, to alleviate some of the limitations that CATs may face, i.e., the high computational cost induced by the use of a standard transformer, we propose CATs++.
Our proposed methods outperform the previous state-of-the-art methods by large margins, setting a new state-of-the-art for all the benchmarks.
- Score: 31.22435282922934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cost aggregation is a highly important process in image matching
tasks, which aims to disambiguate noisy matching scores. Existing methods
generally tackle this with hand-crafted or CNN-based techniques, which either
lack robustness to severe deformations or inherit the limitations of CNNs,
failing to discriminate incorrect matches due to limited receptive fields and
a lack of adaptability. In this paper, we introduce Cost Aggregation with
Transformers (CATs) to tackle this by exploring global consensus among the
initial correlation maps, aided by architectural designs that allow us to
fully exploit the global receptive fields of the self-attention mechanism.
Also, to alleviate a key limitation of CATs, namely the high computational
cost of a standard transformer, whose complexity grows with the spatial and
feature dimensions and thus restricts its applicability to limited
resolutions and yields rather limited performance, we propose CATs++, an
extension of CATs. Our proposed methods outperform the previous
state-of-the-art methods by large margins, setting a new state of the art on
all the benchmarks, including PF-WILLOW, PF-PASCAL, and SPair-71k. We further
provide extensive ablation studies and analyses.
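To make the idea concrete, here is a minimal sketch (not the authors' implementation; class and parameter names are illustrative) of the core mechanism: flattening a raw correlation map into a token sequence and refining it with a standard transformer encoder, so every matching score can attend to all others.

```python
import torch
import torch.nn as nn

class CostAggregationSketch(nn.Module):
    """Refine a raw correlation map with a standard transformer encoder."""
    def __init__(self, spatial: int = 16, dim: int = 128, layers: int = 2):
        super().__init__()
        # Each source position carries its correlation with every target
        # position; project that vector (length spatial**2) to an embedding.
        self.embed = nn.Linear(spatial * spatial, dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers)
        self.out = nn.Linear(dim, spatial * spatial)

    def forward(self, corr: torch.Tensor) -> torch.Tensor:
        # corr: (B, HW_src, HW_trg) raw matching scores
        tokens = self.embed(corr)         # (B, HW_src, dim)
        tokens = self.encoder(tokens)     # global consensus via self-attention
        return corr + self.out(tokens)    # residual refinement of the scores

corr = torch.randn(2, 256, 256)              # e.g. two 16x16 feature maps
print(CostAggregationSketch()(corr).shape)   # torch.Size([2, 256, 256])
```

The quadratic cost of attention over all token pairs in this sketch is exactly the limitation that motivates CATs++.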
Related papers
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, relying on a minimal set of late pre-trained layers alleviates the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- On the Power of Convolution Augmented Transformer [30.46405043231576]
We study the benefits of Convolution-Augmented Transformer (CAT) for recall, copying, and length generalization tasks.
CAT incorporates convolutional filters into the K/Q/V embeddings of an attention layer.
We show that the locality of the convolution synergizes with the global view of the attention.
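As a hypothetical illustration of that mechanism (not the paper's code; all names are ours), a depthwise convolution can be applied along the token axis of the Q/K/V projections before the attention product:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAugmentedAttention(nn.Module):
    """Self-attention whose Q/K/V are smoothed by a depthwise convolution."""
    def __init__(self, dim: int = 64, kernel: int = 3):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        # Depthwise conv: one filter per channel, mixing nearby tokens only.
        self.conv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)            # each (B, N, dim)
        # Convolve each projection along the sequence (token) dimension.
        q, k, v = (self.conv(t.transpose(1, 2)).transpose(1, 2)
                   for t in (q, k, v))
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v                                    # (B, N, dim)

x = torch.randn(2, 32, 64)                # (batch, tokens, dim)
print(ConvAugmentedAttention()(x).shape)  # torch.Size([2, 32, 64])
```

The convolution injects local inductive bias while the softmax attention retains its global view, which is the synergy the paper studies.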
arXiv Detail & Related papers (2024-07-08T04:08:35Z)
- CAT: Contrastive Adapter Training for Personalized Image Generation [4.093428697109545]
We present Contrastive Adapter Training (CAT) to enhance adapter training through the application of CAT loss.
Our approach helps preserve the base model's original knowledge when adapters are introduced.
arXiv Detail & Related papers (2024-04-11T08:36:13Z)
- Quantified Task Misalignment to Inform PEFT: An Exploration of Domain Generalization and Catastrophic Forgetting in CLIP [7.550566004119157]
We analyze the relation between task difficulty in the CLIP model and the performance of several simple parameter-efficient fine-tuning methods.
A method that trains only a subset of attention weights, which we call A-CLIP, strikes a balance between domain generalization and resistance to catastrophic forgetting.
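A minimal sketch of that recipe, assuming a generic transformer stands in for CLIP and that "attention weights" means the attention projection matrices (the paper's exact subset may differ):

```python
import torch.nn as nn

def freeze_all_but_attention(model: nn.Module) -> None:
    """Leave only attention projection weights trainable; freeze the rest."""
    for name, param in model.named_parameters():
        param.requires_grad = "attn" in name

# Toy transformer encoder standing in for CLIP's image tower.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=2)
freeze_all_but_attention(model)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable}/{total}")
```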
arXiv Detail & Related papers (2024-02-14T23:01:03Z)
- ALF: Adaptive Label Finetuning for Scene Graph Generation [116.59868289196157]
Scene Graph Generation (SGG) endeavors to predict the relationships between subjects and objects in a given image.
The long-tail distribution of relations often leads to biased predictions on coarse labels, presenting a substantial hurdle in SGG.
We introduce a one-stage data transfer pipeline for SGG, termed Adaptive Label Finetuning (ALF), which eliminates the need for extra retraining sessions.
ALF achieves a 16% improvement in mR@100 compared to the typical SGG method Motif, with only a 6% increase in calculation costs compared to the state-of-the-art method IETrans.
arXiv Detail & Related papers (2023-12-29T01:37:27Z)
- PIPE: Parallelized Inference Through Post-Training Quantization Ensembling of Residual Expansions [23.1120983784623]
PIPE is a quantization method that leverages residual error expansion, along with group sparsity and an ensemble approximation for better parallelization.
It achieves superior performance on every benchmarked application (from vision to NLP tasks), architecture (ConvNets, transformers) and bit-width.
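The following toy sketch shows the residual-expansion idea in isolation (uniform quantization, illustrative names; it omits PIPE's group sparsity and ensembling details): each expansion order quantizes the error left by the previous orders, and the resulting low-bit terms can be evaluated in parallel and summed.

```python
import torch

def quantize(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Uniform symmetric quantization to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max() / qmax
    return (x / scale).round().clamp(-qmax, qmax) * scale

def residual_expansion(w: torch.Tensor, orders: int = 3, bits: int = 4):
    """Return low-bit terms whose sum approximates w increasingly well."""
    terms, residual = [], w
    for _ in range(orders):
        q = quantize(residual, bits)
        terms.append(q)           # each term is an independent low-bit tensor
        residual = residual - q   # next order expands on what was missed
    return terms

w = torch.randn(256, 256)
approx = sum(residual_expansion(w))   # terms could be evaluated in parallel
print(f"relative error: {(w - approx).norm() / w.norm():.4f}")
```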
arXiv Detail & Related papers (2023-11-27T13:29:34Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification [97.28167655721766]
We propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization problems.
We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
arXiv Detail & Related papers (2022-08-11T22:27:22Z)
- AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks.
Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo.
Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z)
- Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling and multi-level aggregation to disambiguate the initial correlation maps.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
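For context, a minimal sketch of the initial correlation map such a model aggregates, computed as cosine similarity between all pairs of locations in two feature maps (shapes and names are illustrative):

```python
import torch
import torch.nn.functional as F

def correlation_map(feat_src: torch.Tensor,
                    feat_trg: torch.Tensor) -> torch.Tensor:
    """feat_*: (B, C, H, W) dense features from a shared backbone."""
    src = F.normalize(feat_src.flatten(2), dim=1)   # (B, C, H*W)
    trg = F.normalize(feat_trg.flatten(2), dim=1)   # (B, C, H*W)
    # Every source location scored against every target location.
    return torch.einsum('bci,bcj->bij', src, trg)   # (B, H*W, H*W)

corr = correlation_map(torch.randn(1, 64, 16, 16),
                       torch.randn(1, 64, 16, 16))
print(corr.shape)   # torch.Size([1, 256, 256])
```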
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.