Learning to Adapt to Position Bias in Vision Transformer Classifiers
- URL: http://arxiv.org/abs/2505.13137v1
- Date: Mon, 19 May 2025 14:07:36 GMT
- Title: Learning to Adapt to Position Bias in Vision Transformer Classifiers
- Authors: Robert-Jan Bruintjes, Jan van Gemert
- Abstract summary: We show that position bias plays a crucial role in the performance of Vision Transformer image classifiers. We show various levels of position bias in different datasets, and find that the optimal choice of position embedding depends on the position bias apparent in the dataset.
- Score: 10.210145452318041
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How discriminative position information is for image classification depends on the data. On the one hand, the camera position is arbitrary and objects can appear anywhere in the image, arguing for translation invariance. At the same time, position information is key for exploiting capture/center bias and scene layout, e.g., the sky is up. We show that position bias, the degree to which a dataset is more easily solved when positional information on input features is used, plays a crucial role in the performance of Vision Transformer image classifiers. To investigate, we propose Position-SHAP, a direct measure of position bias obtained by extending SHAP to work with position embeddings. We show various levels of position bias in different datasets, and find that the optimal choice of position embedding depends on the position bias apparent in the dataset. We therefore propose Auto-PE, a single-parameter position embedding extension, which allows the position embedding to modulate its norm, enabling the unlearning of position information. Auto-PE combines with existing PEs to match or improve accuracy on classification datasets.
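The abstract describes Auto-PE only at a high level: a single extra learnable parameter that modulates the norm of the position embedding, so the network can scale the PE toward zero and effectively unlearn position information when it hurts. A minimal PyTorch-style sketch of that idea follows; the gate parameterization and initialization here are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class AutoPE(nn.Module):
    """Wrap a learnable absolute position embedding with one scalar gate
    that modulates its norm; a sketch of the abstract's description,
    not the paper's exact formulation."""

    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        # Standard learnable absolute PE, as in ViT (assumed base PE).
        self.pos_embed = nn.Parameter(torch.randn(1, num_tokens, dim) * 0.02)
        # The single extra parameter; alpha -> 0 "unlearns" position.
        self.alpha = nn.Parameter(torch.ones(1))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim) patch embeddings.
        return tokens + self.alpha * self.pos_embed
```

Because alpha is trained jointly with the rest of the network, datasets with low position bias can drive it toward zero, while datasets with high position bias can keep or grow it.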
Related papers
- Eliminating Position Bias of Language Models: A Mechanistic Approach [119.34143323054143]
Position bias has proven to be a prevalent issue of modern language models (LMs).
Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings.
By eliminating position bias, models achieve better performance and reliability in downstream tasks, including LM-as-a-judge, retrieval-augmented QA, molecule generation, and math reasoning.
arXiv Detail & Related papers (2024-07-01T09:06:57Z)
- Mitigate Position Bias in Large Language Models via Scaling a Single Dimension [47.792435921037274]
This paper first explores the micro-level manifestations of position bias, concluding that attention weights are a micro-level expression of position bias.
It further identifies that, in addition to position embeddings, causal attention mask also contributes to position bias by creating position-specific hidden states.
Based on these insights, we propose a method to mitigate position bias by scaling these positional hidden states.
arXiv Detail & Related papers (2024-06-04T17:55:38Z)
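The summary above suggests the mitigation is a direct intervention on activations: locate the hidden-state dimension(s) that carry positional information and rescale them. A hedged sketch of such an intervention; the dimension index and scale factor below are placeholders, not values from the paper.

```python
import torch

def scale_positional_dimension(hidden: torch.Tensor,
                               pos_dim: int,
                               scale: float = 0.5) -> torch.Tensor:
    """Rescale one hidden-state dimension presumed to encode position.

    hidden: (batch, seq_len, d_model) activations from some layer.
    pos_dim and scale are illustrative; the paper identifies the
    dimension empirically and tunes the scaling.
    """
    out = hidden.clone()
    out[..., pos_dim] = out[..., pos_dim] * scale
    return out

# Example: dampen dimension 123 of a layer's activations by half.
h = torch.randn(2, 16, 768)
h_mitigated = scale_positional_dimension(h, pos_dim=123, scale=0.5)
```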
- Click-Conversion Multi-Task Model with Position Bias Mitigation for Sponsored Search in eCommerce [51.211924408864355]
We propose two position-bias-free prediction models: Position-Aware Click-Conversion (PACC) and PACC via Position Embedding (PACC-PE).
Experiments on the E-commerce sponsored product search dataset show that our proposed models have better ranking effectiveness and can greatly alleviate position bias in both CTR and CVR prediction.
arXiv Detail & Related papers (2023-07-29T19:41:16Z)
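The summary does not spell out the architectures, but "PACC via Position Embedding" suggests treating the display position as a trained embedding that is dropped (or fixed) at serving time so ranking scores are position-unbiased. A hedged sketch of that general pattern; the shared tower, layer sizes, and serving convention are assumptions.

```python
import torch
import torch.nn as nn

class PACCPESketch(nn.Module):
    """Multi-task CTR/CVR model with a position embedding, in the spirit
    of PACC-PE as summarized above; architectural details are assumed."""

    def __init__(self, feat_dim: int, num_positions: int = 50, hidden: int = 64):
        super().__init__()
        self.pos_embed = nn.Embedding(num_positions, hidden)
        self.tower = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.ctr_head = nn.Linear(hidden, 1)  # click-through rate
        self.cvr_head = nn.Linear(hidden, 1)  # conversion rate

    def forward(self, feats, position=None):
        h = self.tower(feats)
        if position is not None:          # training: absorb position bias
            h = h + self.pos_embed(position)
        return torch.sigmoid(self.ctr_head(h)), torch.sigmoid(self.cvr_head(h))

# Serving: call model(feats) with position=None so that scores used for
# ranking do not depend on where an item happened to be displayed.
```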
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
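As a rough illustration of the pretext task summarized above, the sketch below trains a head to recover a query patch's grid coordinates from its own features plus a pooled, partially masked set of reference features. This is a simplified variant: the encoder, masking ratio, pooling, and coordinate target are all stand-ins rather than the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

grid, dim = 7, 128
encoder = nn.Linear(16 * 16 * 3, dim)        # toy patch encoder
loc_head = nn.Linear(2 * dim, 2)             # predicts (row, col)

patches = torch.randn(grid * grid, 16 * 16 * 3)  # one image as raw patches
feats = encoder(patches)                          # (49, dim)

# Mask a random subset of reference features to control task difficulty.
mask = torch.rand(grid * grid) < 0.5
ref = feats.masked_fill(mask.unsqueeze(1), 0.0).mean(dim=0)

q = torch.randint(grid * grid, (1,)).item()       # pick a query patch
pred = loc_head(torch.cat([feats[q], ref]))       # predicted (row, col)
target = torch.tensor([q // grid, q % grid], dtype=torch.float)
F.mse_loss(pred, target).backward()
```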
- The Curious Case of Absolute Position Embeddings [65.13827063579728]
Transformer language models encode the notion of word order using positional information.
In natural language, it is not absolute position that matters, but relative position, and the extent to which absolute position embeddings (APEs) can capture this information has not been investigated.
We observe that models trained with APEs over-rely on positional information, to the point that they break down when subjected to sentences with shifted position information.
arXiv Detail & Related papers (2022-10-23T00:00:04Z)
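The break-down reported above can be probed with a simple stress test: present the same tokens with all position indices shifted by a constant and measure how much the model's outputs move. A toy sketch of that probe with an untrained model; sizes and the readout are illustrative only.

```python
import torch
import torch.nn as nn

vocab, max_pos, dim = 100, 512, 64
tok_emb = nn.Embedding(vocab, dim)
pos_emb = nn.Embedding(max_pos, dim)   # absolute position embedding (APE)
readout = nn.Linear(dim, vocab)

tokens = torch.randint(vocab, (1, 16))

def logits_with_shift(shift: int) -> torch.Tensor:
    positions = torch.arange(tokens.size(1)) + shift
    return readout(tok_emb(tokens) + pos_emb(positions))

# A shift-invariant model would show zero drift; APE models do not.
drift = (logits_with_shift(0) - logits_with_shift(100)).abs().mean()
print(f"mean logit drift under a 100-position shift: {drift:.4f}")
```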
- Unveiling The Mask of Position-Information Pattern Through the Mist of Image Features [75.62755703738696]
Recent studies show that paddings in convolutional neural networks encode absolute position information.
Existing metrics for quantifying the strength of positional information remain unreliable.
We propose novel metrics for measuring (and visualizing) the encoded positional information.
arXiv Detail & Related papers (2022-06-02T17:59:57Z)
- CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings [33.87449556591022]
We propose an augmentation-based approach (CAPE) for absolute positional embeddings.
CAPE keeps the advantages of both absolute position embeddings (simplicity and speed) and relative position embeddings (better generalization).
arXiv Detail & Related papers (2021-06-06T14:54:55Z)
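As the summary indicates, CAPE keeps absolute (continuous) positions but augments them during training, e.g. with random global shifts and scalings, so the model cannot latch onto exact absolute locations. A sketch of that augmentation step; the hyperparameter ranges are placeholders rather than the paper's values.

```python
import torch

def cape_style_augment(positions: torch.Tensor,
                       max_global_shift: float = 5.0,
                       max_local_shift: float = 0.5,
                       max_log_scale: float = 0.2) -> torch.Tensor:
    """Augment continuous positions before they are encoded, in the
    spirit of CAPE; ranges here are illustrative placeholders.

    positions: (batch, seq_len) continuous token positions.
    """
    b, n = positions.shape
    global_shift = (torch.rand(b, 1) * 2 - 1) * max_global_shift
    local_shift = (torch.rand(b, n) * 2 - 1) * max_local_shift
    log_scale = (torch.rand(b, 1) * 2 - 1) * max_log_scale
    return (positions + global_shift + local_shift) * torch.exp(log_scale)

# Train time: encode augmented positions; test time: use positions as-is.
pos = torch.arange(10, dtype=torch.float).unsqueeze(0)
print(cape_style_augment(pos))
```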
- Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching [95.64702426906466]
Cross-view geo-localization is the problem of localizing a ground-level query image against a large-scale database of geo-tagged aerial images.
Knowing orientation between ground and aerial images can significantly reduce matching ambiguity between these two views.
We design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization.
arXiv Detail & Related papers (2020-05-08T05:21:16Z)
- How Can CNNs Use Image Position for Segmentation? [23.98839374194848]
A recent study shows that the zero-padding employed in convolutional layers of CNNs provides position information to the CNNs.
However, there is a technical issue with the design of that study's experiments, and thus the correctness of the claim is yet to be verified.
arXiv Detail & Related papers (2020-05-07T13:38:13Z)
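The padding claim debated in the last two entries is easy to probe directly: a zero-padded convolution applied to a constant image produces border-dependent activations, which is precisely the positional signal in question. A minimal demonstration with untrained weights; it illustrates the mechanism, not either paper's full experiment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
constant_image = torch.ones(1, 1, 8, 8)  # carries no position information

# With zero padding, border outputs differ from interior outputs because
# padded zeros enter the receptive field only near the edges.
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
out = conv(constant_image).squeeze()
print("corner:", out[0, 0].item(), "center:", out[4, 4].item())

# Without padding, every output position sees the same constant input,
# so all activations are identical and no position leaks in.
conv_np = nn.Conv2d(1, 1, kernel_size=3, padding=0, bias=False)
out_np = conv_np(constant_image).squeeze()
print("no-padding outputs all equal:",
      bool(torch.allclose(out_np, out_np[0, 0].expand_as(out_np))))
```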