Zero-Training Temporal Drift Detection for Transformer Sentiment Models: A Comprehensive Analysis on Authentic Social Media Streams
- URL: http://arxiv.org/abs/2512.20631v1
- Date: Sun, 30 Nov 2025 13:08:59 GMT
- Title: Zero-Training Temporal Drift Detection for Transformer Sentiment Models: A Comprehensive Analysis on Authentic Social Media Streams
- Authors: Aayam Bansal, Ishaan Gangwani
- Abstract summary: We present a comprehensive zero-training temporal drift analysis of transformer-based sentiment models validated on authentic social media data from major real-world events. We demonstrate significant model instability, with accuracy drops reaching 23.4% during event-driven periods. This zero-training methodology enables immediate deployment for real-time sentiment monitoring systems and provides new insights into transformer model behavior during dynamic content periods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a comprehensive zero-training temporal drift analysis of transformer-based sentiment models validated on authentic social media data from major real-world events. Through systematic evaluation across three transformer architectures and rigorous statistical validation on 12,279 authentic social media posts, we demonstrate significant model instability with accuracy drops reaching 23.4% during event-driven periods. Our analysis reveals maximum confidence drops of 13.0% (Bootstrap 95% CI: [9.1%, 16.5%]) with strong correlation to actual performance degradation. We introduce four novel drift metrics that outperform embedding-based baselines while maintaining computational efficiency suitable for production deployment. Statistical validation across multiple events confirms robust detection capabilities with practical significance exceeding industry monitoring thresholds. This zero-training methodology enables immediate deployment for real-time sentiment monitoring systems and provides new insights into transformer model behavior during dynamic content periods.
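The abstract's headline numbers (a confidence drop with a bootstrap 95% CI) can be reproduced in spirit without any retraining. The paper's four drift metrics are not specified here, so the windowed mean-confidence drop and all names below are illustrative assumptions, a minimal sketch rather than the authors' implementation:

```python
import random
from statistics import mean

def confidence_drop(baseline_conf, current_conf):
    """Drop in mean softmax confidence between a stable baseline
    window and the current evaluation window (no labels needed)."""
    return mean(baseline_conf) - mean(current_conf)

def bootstrap_ci(baseline_conf, current_conf, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) CI for the confidence drop,
    mirroring the kind of Bootstrap CI reported in the abstract."""
    rng = random.Random(seed)
    drops = []
    for _ in range(n_boot):
        b = [rng.choice(baseline_conf) for _ in baseline_conf]
        c = [rng.choice(current_conf) for _ in current_conf]
        drops.append(confidence_drop(b, c))
    drops.sort()
    return drops[int(alpha / 2 * n_boot)], drops[int((1 - alpha / 2) * n_boot) - 1]

# Hypothetical per-post max-softmax confidences for two time windows.
baseline = [0.92, 0.88, 0.95, 0.90, 0.91, 0.89, 0.93, 0.94]
event    = [0.74, 0.81, 0.70, 0.78, 0.76, 0.72, 0.79, 0.75]
drop = confidence_drop(baseline, event)
lo, hi = bootstrap_ci(baseline, event)
```

Because only model confidences are consumed, a monitor like this runs on unlabeled streaming posts, which is what makes the zero-training framing deployable.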
Related papers
- The GT-Score: A Robust Objective Function for Reducing Overfitting in Data-Driven Trading Strategies [51.56484100374058]
GT-Score is a composite objective function that integrates performance, statistical significance, consistency, and downside risk. In walk-forward validation, GT-Score improves the generalization ratio by 98% relative to baseline objective functions. These results suggest that embedding an anti-overfitting structure into the objective can improve the reliability of backtests in quantitative research.
arXiv Detail & Related papers (2026-01-22T05:16:47Z)
- Flexible Gravitational-Wave Parameter Estimation with Transformers [73.44614054040267]
We introduce a flexible transformer-based architecture paired with a training strategy that enables adaptation to diverse analysis settings at inference time. We demonstrate that a single flexible model, called Dingo-T1, can analyze 48 gravitational-wave events from the third LIGO-Virgo-KAGRA Observing Run.
arXiv Detail & Related papers (2025-12-02T17:49:08Z)
- Revisiting Multivariate Time Series Forecasting with Missing Values [65.30332997607141]
Missing values are common in real-world time series. Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data. This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy. We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv Detail & Related papers (2025-09-27T20:57:48Z)
- A Realistic Evaluation of Cross-Frequency Transfer Learning and Foundation Forecasting Models [32.56983347493999]
Cross-frequency transfer learning (CFTL) has emerged as a popular framework for curating large-scale time series datasets to pre-train foundation forecasting models (FFMs). Although CFTL has shown promise, current benchmarking practices fall short of accurately assessing its performance. This shortcoming stems from many factors: an over-reliance on small-scale evaluation datasets; inadequate treatment of sample size when computing summary statistics; reporting of suboptimal statistical models; and failing to account for non-negligible risks of overlap between pre-training and test datasets.
arXiv Detail & Related papers (2025-09-23T18:19:50Z)
- Segmented Confidence Sequences and Multi-Scale Adaptive Confidence Segments for Anomaly Detection in Nonstationary Time Series [0.0]
We introduce and empirically evaluate two novel adaptive thresholding frameworks: Segmented Confidence Sequences (SCS) and Multi-Scale Adaptive Confidence Segments (MACS). Our experiments across Wafer Manufacturing benchmark datasets show significant F1-score improvement compared to traditional percentile and rolling quantile approaches.
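The rolling-quantile baseline these frameworks are compared against can be sketched in a few lines; the window size, quantile, and function name below are illustrative assumptions, not the paper's configuration:

```python
from collections import deque

def rolling_quantile_flags(series, window=20, q=0.95):
    """Flag points exceeding the q-quantile of the trailing window,
    a simple adaptive threshold for nonstationary series."""
    buf = deque(maxlen=window)
    flags = []
    for x in series:
        if len(buf) == window:
            ordered = sorted(buf)
            thresh = ordered[int(q * (window - 1))]
            flags.append(x > thresh)
        else:
            flags.append(False)  # warm-up period: no decision yet
        buf.append(x)
    return flags

# A flat series followed by one spike: only the spike is flagged.
flags = rolling_quantile_flags([1.0] * 25 + [5.0])
```

Because the threshold tracks only the recent window, it adapts to slow drift but, as the paper notes, leaves room for improvement when the series shifts regime abruptly.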
arXiv Detail & Related papers (2025-08-08T18:34:54Z)
- MAWIFlow Benchmark: Realistic Flow-Based Evaluation for Network Intrusion Detection [47.86433139298671]
This paper introduces MAWIFlow, a flow-based benchmark derived from the MAWILAB v1.1 dataset. The resulting datasets comprise temporally distinct samples from January 2011, 2016, and 2021, drawn from trans-Pacific backbone traffic. Traditional machine learning methods, including Decision Trees, Random Forests, XGBoost, and Logistic Regression, are compared to a deep learning model based on a CNN-BiLSTM architecture.
arXiv Detail & Related papers (2025-06-20T14:51:35Z)
- Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting [50.298817606660826]
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay. Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks. Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
arXiv Detail & Related papers (2025-02-10T04:42:11Z)
- Predictive Crash Analytics for Traffic Safety using Deep Learning [0.0]
This research presents an innovative approach to traffic safety analysis through the integration of ensemble learning methods and multi-modal data fusion. Our primary contribution lies in developing a hierarchical severity classification system that combines spatial-temporal crash patterns with environmental conditions. We introduce a novel feature engineering technique that integrates crash location data with incident reports and weather conditions, achieving 92.4% accuracy in risk prediction and 89.7% precision in hotspot identification.
arXiv Detail & Related papers (2025-02-09T05:00:46Z)
- FOVAL: Calibration-Free and Subject-Invariant Fixation Depth Estimation Across Diverse Eye-Tracking Datasets [0.0]
We introduce FOVAL, a robust calibration-free approach to depth estimation. Compared to Transformers, Temporal Convolutional Networks (TCNs), and CNNs, FOVAL achieves superior performance. Evaluations across three benchmark datasets using Leave-One-Out Cross-Validation (LOOCV) and cross-dataset validation show a mean absolute error (MAE) of 9.1 cm and strong generalisation without calibration.
arXiv Detail & Related papers (2024-08-07T07:09:14Z)
- Prediction of SLAM ATE Using an Ensemble Learning Regression Model and 1-D Global Pooling of Data Characterization [3.4399698738841553]
We introduce a novel method for predicting SLAM localization error based on the characterization of raw sensor inputs.
The proposed method relies on using a random forest regression model trained on 1-D global pooled features that are generated from characterized raw sensor data.
The paper also studies the impact of 12 different 1-D global pooling functions on regression quality, quantitatively demonstrating the superiority of 1-D global averaging.
arXiv Detail & Related papers (2023-03-01T16:12:47Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
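As summarized above, ATC fits a confidence threshold on labeled source data and reads target accuracy off unlabeled target scores. A minimal sketch, choosing the threshold so that the fraction of source scores above it matches source accuracy (the function and variable names are illustrative, not the paper's code):

```python
def atc_predict_accuracy(source_scores, source_correct, target_scores):
    """Average Thresholded Confidence (ATC) sketch: learn a confidence
    threshold on labeled source data, then predict target accuracy as
    the fraction of unlabeled target scores at or above it."""
    src_acc = sum(source_correct) / len(source_correct)
    # Pick t so the share of source scores >= t equals source accuracy.
    ordered = sorted(source_scores, reverse=True)
    k = round(src_acc * len(ordered))
    t = ordered[k - 1] if k > 0 else float("inf")
    return sum(s >= t for s in target_scores) / len(target_scores)

# Toy example: 75% source accuracy gives t = 0.7, so two of the four
# target scores clear the threshold and predicted accuracy is 0.5.
pred = atc_predict_accuracy([0.9, 0.8, 0.7, 0.6], [1, 1, 1, 0],
                            [0.95, 0.75, 0.65, 0.5])
```

This estimator needs no target labels, which is the same property the zero-training drift metrics of the main paper exploit.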
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.