Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
- URL: http://arxiv.org/abs/2511.22977v1
- Date: Fri, 28 Nov 2025 08:32:49 GMT
- Title: Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
- Authors: Sumit Mamtani, Abhijeet Bhure
- Abstract summary: BERT embeddings combined with logistic regression outperform neural baselines on LIAR dataset splits. This work positions attention-based token encoders as robust, architecture-centric foundations for veracity tasks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates fake news detection as a downstream evaluation of Transformer representations, benchmarking encoder-only and decoder-only pre-trained models (BERT, GPT-2, Transformer-XL) as frozen embedders paired with lightweight classifiers. Through controlled preprocessing comparing pooling versus padding and neural versus linear heads, results demonstrate that contextual self-attention encodings consistently transfer effectively. BERT embeddings combined with logistic regression outperform neural baselines on LIAR dataset splits, while analyses of sequence length and aggregation reveal robustness to truncation and advantages from simple max or average pooling. This work positions attention-based token encoders as robust, architecture-centric foundations for veracity tasks, isolating Transformer contributions from classifier complexity.
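The pipeline described in the abstract (frozen pretrained Transformer token embeddings, simple mean or max pooling, and a lightweight classifier such as logistic regression) can be sketched compactly. The following is a minimal, illustrative sketch rather than the authors' released code, assuming the Hugging Face `transformers` and scikit-learn libraries; the model checkpoint, sequence length, and toy LIAR-style statements are placeholder assumptions.

```python
# Minimal sketch: frozen BERT as an embedder, mean/max pooling over tokens,
# and a logistic-regression head. Hyperparameters and data are illustrative.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()  # frozen embedder: no gradient updates to the Transformer

@torch.no_grad()
def embed(texts, pooling="mean", max_length=128):
    """Encode a batch of texts into fixed-size vectors by pooling token embeddings."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state           # (batch, seq_len, hidden)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # zero out padding positions
    if pooling == "mean":
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    else:  # "max": exclude padding by masking with -inf before the max
        pooled = hidden.masked_fill(mask == 0, float("-inf")).max(dim=1).values
    return pooled.numpy()

# Hypothetical LIAR-style data: short political statements with binary veracity labels.
train_texts = ["Statement A ...", "Statement B ..."]
train_labels = [0, 1]

clf = LogisticRegression(max_iter=1000)
clf.fit(embed(train_texts, pooling="mean"), train_labels)
preds = clf.predict(embed(["Unseen statement ..."], pooling="mean"))
```

Keeping the encoder frozen isolates the contribution of the pretrained representations from classifier capacity, mirroring the paper's controlled comparison of pooling strategies and linear versus neural heads.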
Related papers
- Residual Connections and the Causal Shift: Uncovering a Structural Misalignment in Transformers [9.617245548268437]
Large Language Models (LLMs) are trained with next-token prediction, implemented in autoregressive Transformers. This creates a subtle misalignment: residual connections tie activations to the current token, while supervision targets the next token. We propose a lightweight residual-path mitigation based on residual attenuation, implemented either as a fixed-layer intervention or as a learnable gating mechanism.
arXiv Detail & Related papers (2026-02-16T14:04:42Z) - A Transformer Inspired AI-based MIMO receiver [0.5039813366558306]
The AttDet design combines model-based interpretability with data-driven flexibility. Through link-level simulations under 5G channel models with high-order, mixed QAM modulation and coding schemes, we demonstrate that AttDet can approach near-optimal BER/BLER performance while maintaining predictable, realistic complexity.
arXiv Detail & Related papers (2025-10-23T09:05:10Z) - Knowledge-Informed Neural Network for Complex-Valued SAR Image Recognition [51.03674130115878]
We introduce the Knowledge-Informed Neural Network (KINN), a lightweight framework built upon a novel "compression-aggregation-compression" architecture. KINN establishes a state-of-the-art in parameter-efficient recognition, offering exceptional generalization in data-scarce and out-of-distribution scenarios.
arXiv Detail & Related papers (2025-10-23T07:12:26Z) - Influence-Guided Concolic Testing of Transformer Robustness [2.78712872004245]
Concolic testing for deep neural networks alternates concrete execution with constraint solving to search for inputs that flip decisions. We present an influence-guided concolic tester for Transformer classifiers that ranks path predicates by SHAP-based estimates of their impact on the model output.
arXiv Detail & Related papers (2025-09-28T11:09:15Z) - Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs [0.0]
This work considers problems where a subset of the input vectors contains requisite information for a downstream task (signal) while the rest are distractors (noise). Standard methods used to aggregate transformer outputs (AvgPool, MaxPool, and ClsToken) are vulnerable to performance collapse as the signal-to-noise ratio (SNR) of inputs fluctuates. We show that an attention-based adaptive pooling method can approximate the signal-optimal vector quantizer within derived error bounds for any SNR (a minimal sketch of attention-based pooling appears after this list).
arXiv Detail & Related papers (2025-06-10T20:18:32Z) - TPN: Transferable Proto-Learning Network towards Few-shot Document-Level Relation Extraction [9.4094500796859]
Few-shot document-level relation extraction suffers from poor performance due to the limited cross-domain transferability of the NOTA relation representation.
We introduce a Transferable Proto-Learning Network (TPN) to address the challenging issue.
TPN comprises three core components; its Hybrid Encoder hierarchically encodes the semantic content of the input text, combined with attention information, to enhance the relation representations.
arXiv Detail & Related papers (2024-10-01T05:37:31Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms, such as low-rank computation, achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - RAT: Retrieval-Augmented Transformer for Click-Through Rate Prediction [68.34355552090103]
This paper develops a Retrieval-Augmented Transformer (RAT), aiming to acquire fine-grained feature interactions within and across samples.
We then build Transformer layers with cascaded attention to capture both intra- and cross-sample feature interactions.
Experiments on real-world datasets substantiate the effectiveness of RAT and suggest its advantage in long-tail scenarios.
arXiv Detail & Related papers (2024-04-02T19:14:23Z) - SRFormer: Text Detection Transformer with Incorporated Segmentation and Regression [6.74412860849373]
We propose SRFormer, a unified DETR-based model that amalgamates segmentation and regression.
Our empirical analysis indicates that favorable segmentation predictions can be obtained at the initial decoder layers.
Our method exhibits exceptional robustness, superior training and data efficiency, and state-of-the-art performance.
arXiv Detail & Related papers (2023-08-21T07:34:31Z) - Remote Sensing Change Detection With Transformers Trained from Scratch [62.96911491252686]
Transformer-based change detection (CD) approaches either employ a model pre-trained on the large-scale ImageNet image-classification dataset or rely on first pre-training on another CD dataset and then fine-tuning on the target benchmark.
We develop an end-to-end CD approach with transformers that is trained from scratch and yet achieves state-of-the-art performance on four public benchmarks.
arXiv Detail & Related papers (2023-04-13T17:57:54Z) - Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection [78.2325219839805]
imTED improves the state-of-the-art of few-shot object detection by up to 7.6% AP.
Experiments on the MS COCO dataset demonstrate that imTED consistently outperforms its counterparts by 2.8%.
arXiv Detail & Related papers (2022-05-19T15:11:20Z) - Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self-consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
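As noted in the "Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs" entry above, attention-based pooling is an alternative to AvgPool/MaxPool/ClsToken aggregation. The sketch below shows what such a pooling layer commonly looks like (a learned query attending over token outputs); it is an illustrative assumption, not that paper's implementation, and all names and dimensions are placeholders.

```python
# Minimal sketch of attention-based pooling: a learned query scores each token
# output, and the pooled vector is the attention-weighted sum of token outputs.
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Pool (batch, seq_len, dim) token outputs into (batch, dim) with learned attention."""
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))  # learned pooling query

    def forward(self, hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, dim); mask: (batch, seq_len), 1 for real tokens
        scores = hidden @ self.query / hidden.size(-1) ** 0.5  # (batch, seq_len)
        scores = scores.masked_fill(mask == 0, float("-inf"))  # ignore padding
        weights = torch.softmax(scores, dim=-1)                # attention over tokens
        return (weights.unsqueeze(-1) * hidden).sum(dim=1)     # weighted sum

# Example: pool a dummy batch of Transformer outputs.
pool = AttentionPool(dim=768)
pooled = pool(torch.randn(2, 16, 768), torch.ones(2, 16))  # shape: (2, 768)
```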