Which Tokens to Use? Investigating Token Reduction in Vision
Transformers
- URL: http://arxiv.org/abs/2308.04657v1
- Date: Wed, 9 Aug 2023 01:51:07 GMT
- Title: Which Tokens to Use? Investigating Token Reduction in Vision
Transformers
- Authors: Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B.
Moeslund
- Abstract summary: We study the reduction patterns of 10 different token reduction methods using four image classification datasets.
We find that the Top-K pruning method is a surprisingly strong baseline.
The similarity of reduction patterns is a moderate-to-strong proxy for model performance.
- Score: 64.99704164972513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since the introduction of the Vision Transformer (ViT), researchers have
sought to make ViTs more efficient by removing redundant information in the
processed tokens. While different methods have been explored to achieve this
goal, we still lack understanding of the resulting reduction patterns and how
those patterns differ across token reduction methods and datasets. To close
this gap, we set out to understand the reduction patterns of 10 different token
reduction methods using four image classification datasets. By systematically
comparing these methods on the different classification tasks, we find that the
Top-K pruning method is a surprisingly strong baseline. Through in-depth
analysis of the different methods, we determine that: the reduction patterns
are generally not consistent when varying the capacity of the backbone model,
the reduction patterns of pruning-based methods significantly differ from fixed
radial patterns, and the reduction patterns of pruning-based methods are
correlated across classification datasets. Finally, we report that the
similarity of reduction patterns is a moderate-to-strong proxy for model
performance. Project page at https://vap.aau.dk/tokens.
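To make the Top-K baseline concrete, here is a minimal sketch of attention-based Top-K token pruning in PyTorch. This is an illustrative reimplementation, not the authors' code: scoring patch tokens by the CLS token's head-averaged attention, and the choice of keep rate, are assumptions. A small `pattern_iou` helper, an assumed stand-in for the reduction-pattern similarity the abstract refers to, is included.

```python
import torch

def topk_prune(tokens: torch.Tensor, cls_attn: torch.Tensor, keep: int) -> torch.Tensor:
    """Keep the `keep` highest-scoring patch tokens (the CLS token is always kept).

    tokens:   (B, 1 + N, D) -- CLS token followed by N patch tokens.
    cls_attn: (B, N)        -- CLS-to-patch attention, averaged over heads
                               (one common scoring choice, assumed here).
    """
    idx = cls_attn.topk(keep, dim=1).indices                 # (B, keep)
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))  # (B, keep, D)
    kept = tokens[:, 1:, :].gather(1, idx)                   # gather kept patches
    return torch.cat([tokens[:, :1, :], kept], dim=1)        # (B, 1 + keep, D)

def pattern_iou(idx_a: torch.Tensor, idx_b: torch.Tensor, n_tokens: int) -> torch.Tensor:
    """IoU of two kept-token index sets for one image -- an illustrative
    stand-in for the reduction-pattern similarity studied in the paper."""
    a = torch.zeros(n_tokens, dtype=torch.bool)
    b = torch.zeros(n_tokens, dtype=torch.bool)
    a[idx_a] = True
    b[idx_b] = True
    return (a & b).sum() / (a | b).sum()
```

For a ViT-B/16 on 224x224 inputs (N = 196 patch tokens), calling `topk_prune` with keep = 98 halves the token count after the chosen block; comparing the kept indices of two methods with `pattern_iou` yields the kind of similarity score the abstract relates to downstream performance.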
Related papers
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies these structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
- Learning to Rank Patches for Unbiased Image Redundancy Reduction [80.93989115541966]
Images suffer from heavy spatial redundancy because pixels in neighboring regions are spatially correlated.
Existing approaches strive to overcome this limitation by reducing less meaningful image regions.
We propose a self-supervised framework for image redundancy reduction called Learning to Rank Patches.
arXiv Detail & Related papers (2024-03-31T13:12:41Z)
- Decoupled Prototype Learning for Reliable Test-Time Adaptation [50.779896759106784]
Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference.
One popular approach involves fine-tuning the model with a cross-entropy loss according to estimated pseudo-labels.
This study reveals that minimizing the classification error of each sample makes the cross-entropy loss vulnerable to label noise.
We propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation.
arXiv Detail & Related papers (2024-01-15T03:33:39Z)
- Simplified Concrete Dropout -- Improving the Generation of Attribution Masks for Fine-grained Classification [8.330791157878137]
Fine-grained classification models are often deployed to determine animal species or individuals in automated animal monitoring systems.
Attention- or gradient-based methods are commonly used to identify the image regions that contribute most to the classification decision.
This paper circumvents the computational instabilities of Concrete Dropout (CD)-based attribution masks by simplifying the CD sampling and reducing reliance on large mini-batch sizes.
arXiv Detail & Related papers (2023-07-27T13:01:49Z)
- CEnt: An Entropy-based Model-agnostic Explainability Framework to Contrast Classifiers' Decisions [2.543865489517869]
We present a novel approach to locally contrast the prediction of any classifier.
Our Contrastive Entropy-based explanation method, CEnt, approximates a model locally by a decision tree to compute entropy information of different feature splits.
CEnt is the first non-gradient-based contrastive method to generate diverse counterfactuals that need not exist in the training data, while satisfying immutability (e.g., race) and semi-immutability (e.g., age can only increase) constraints.
arXiv Detail & Related papers (2023-01-19T08:23:34Z)
- TiCo: Transformation Invariance and Covariance Contrast for Self-Supervised Visual Representation Learning [9.507070656654632]
We present Transformation Invariance and Covariance Contrast (TiCo) for self-supervised visual representation learning.
Our method is based on maximizing the agreement among embeddings of different distorted versions of the same image.
We show that TiCo can be viewed as a variant of MoCo with an implicit memory bank of unlimited size at no extra memory cost.
arXiv Detail & Related papers (2022-06-21T19:44:01Z)
- Deblurring via Stochastic Refinement [85.42730934561101]
We present an alternative framework for blind deblurring based on conditional diffusion models.
Our method is competitive in terms of distortion metrics such as PSNR.
arXiv Detail & Related papers (2021-12-05T04:36:09Z)
- Learning explanations that are hard to vary [75.30552491694066]
We show that averaging gradients across examples can favor memorization and 'patchwork' solutions that sew together different strategies.
We then propose and experimentally validate a simple alternative algorithm based on a logical AND (a minimal sketch follows this list).
arXiv Detail & Related papers (2020-09-01T10:17:48Z)
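As an illustration of the logical-AND idea in the last entry, here is a minimal sketch (an assumed reimplementation, not the authors' code): a component of the averaged gradient is kept only where its sign agrees across all environments.

```python
import torch

def and_mask_grad(env_grads: torch.Tensor) -> torch.Tensor:
    """Combine per-environment gradients with a logical-AND consistency mask.

    env_grads: (E, P) -- one flattened gradient vector per environment.
    A component survives only if its sign is unanimous across all E
    environments; this strict all-agree rule is a simplification, since the
    paper also allows an agreement threshold.
    """
    signs = torch.sign(env_grads)           # (E, P) component-wise signs
    agree = (signs == signs[0]).all(dim=0)  # (P,) unanimous-sign mask
    return env_grads.mean(dim=0) * agree    # zero out disagreeing components
```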