Data Augmentation as Free Lunch: Exploring the Test-Time Augmentation for Sequential Recommendation
- URL: http://arxiv.org/abs/2504.04843v3
- Date: Wed, 30 Apr 2025 07:43:41 GMT
- Title: Data Augmentation as Free Lunch: Exploring the Test-Time Augmentation for Sequential Recommendation
- Authors: Yizhou Dang, Yuting Liu, Enneng Yang, Minhan Huang, Guibing Guo, Jianzhe Zhao, Xingwei Wang,
- Abstract summary: Existing methods generate new yet effective data during model training to improve performance. However, deploying them requires retraining, architecture modification, or introducing additional learnable parameters. We explore test-time augmentation (TTA) for sequential recommendation, which augments the inputs during model inference and then aggregates the model's predictions for the augmented data to improve final accuracy.
- Score: 9.55994486328914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data augmentation has become a promising method for mitigating data sparsity in sequential recommendation. Existing methods generate new yet effective data during model training to improve performance. However, deploying them requires retraining, architecture modification, or introducing additional learnable parameters. These steps are time-consuming and costly for well-trained models, especially as the model scale grows. In this work, we explore test-time augmentation (TTA) for sequential recommendation, which augments the inputs during model inference and then aggregates the model's predictions for the augmented data to improve final accuracy. It avoids the significant time and cost overhead of loss calculation and backward propagation. We first experimentally disclose the potential of existing augmentation operators for TTA and find that Mask and Substitute consistently achieve better performance. Further analysis reveals that these two operators are effective because they retain the original sequential pattern while adding appropriate perturbations. Meanwhile, we argue that these two operators still face time-consuming item selection or interference from mask tokens. Based on this analysis and these limitations, we present TNoise and TMask. The former injects uniform noise into the original representation, avoiding the computational overhead of item selection. The latter blocks the mask token from participating in model calculations or directly removes the interactions that would otherwise be replaced with mask tokens. Comprehensive experiments demonstrate the effectiveness, efficiency, and generalizability of our method. We provide an anonymous implementation at https://github.com/KingGugu/TTA4SR.
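To make the TTA idea concrete, below is a minimal sketch of inference-time aggregation with a TNoise-style perturbation. It is not the authors' implementation: `model.score`, the tensor shapes, the noise scale, and the number of views are all illustrative assumptions.

```python
import torch

@torch.no_grad()
def tta_predict(model, seq_repr, num_views=4, noise_scale=0.01):
    """Test-time augmentation sketch (TNoise-style perturbation).

    Assumes a hypothetical `model.score(seq_repr)` that maps a sequence
    representation of shape (batch, dim) to item scores of shape
    (batch, num_items); the noise scale is likewise an assumed value.
    """
    scores = model.score(seq_repr)  # prediction on the original input
    for _ in range(num_views):
        # Uniform noise in [-noise_scale, noise_scale], matching the representation shape.
        noise = (torch.rand_like(seq_repr) * 2.0 - 1.0) * noise_scale
        scores += model.score(seq_repr + noise)  # prediction on an augmented view
    return scores / (num_views + 1)  # aggregate by simple averaging
```

A TMask-style variant would instead drop a few interactions from the input sequence (or exclude mask tokens from the model's computation) before scoring; in both cases the extra cost is only a handful of additional forward passes, with no loss computation or backpropagation.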
Related papers
- Towards Pattern-aware Data Augmentation for Temporal Knowledge Graph Completion [18.51546761241817]
We introduce Booster, the first data augmentation strategy for temporal knowledge graphs. We propose a hierarchical scoring algorithm based on triadic closures within TKGs, as well as a two-stage training approach to identify samples that deviate from the model's preferred patterns.
arXiv Detail & Related papers (2024-12-31T03:47:19Z) - Hessian-Free Online Certified Unlearning [8.875278412741695]
We develop an online unlearning algorithm that achieves near-instantaneous data removal. We prove that our proposed method outperforms state-of-the-art methods in terms of unlearning and generalization guarantees.
arXiv Detail & Related papers (2024-04-02T07:54:18Z) - LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised
Time Series Anomaly Detection [49.52429991848581]
We propose a Light and Anti-overfitting Retraining Approach (LARA) for deep variational auto-encoder based time series anomaly detection methods (VAEs).
This work aims to make three novel contributions: 1) the retraining process is formulated as a convex problem and can converge at a fast rate as well as prevent overfitting; 2) designing a ruminate block, which leverages the historical data without the need to store them; and 3) mathematically proving that when fine-tuning the latent vector and reconstructed data, the linear formations can achieve the least adjusting errors between the ground truths and the fine-tuned ones.
arXiv Detail & Related papers (2023-10-09T12:36:16Z) - Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained
Vision-Language Models [89.07925369856139]
We design a new type of tuning method, termed regularized mask tuning, which masks the network parameters through a learnable selection.
Inspired by neural pathways, we argue that the knowledge required by a downstream task already exists in the pre-trained weights but just gets concealed in the upstream pre-training stage.
It is noteworthy that we manage to deliver 18.73% performance improvement compared to the zero-shot CLIP via masking an average of only 2.56% parameters.
arXiv Detail & Related papers (2023-07-27T17:56:05Z) - Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z) - Exploring Bayesian Surprise to Prevent Overfitting and to Predict Model
Performance in Non-Intrusive Load Monitoring [25.32973996508579]
Non-Intrusive Load Monitoring (NILM) is a field of research focused on segregating constituent electrical loads in a system based only on their aggregated signal.
We quantify the degree of surprise between the predictive distribution (termed postdictive surprise) and the transitional probabilities (termed transitional surprise).
This work provides clear evidence that a point of diminishing returns of model performance with respect to dataset size exists.
arXiv Detail & Related papers (2020-09-16T15:39:08Z) - Learning Transition Models with Time-delayed Causal Relations [17.494609199646813]
The proposed algorithm initially predicts observations with the Markov assumption.
The hidden variables are memory units that keep track of pertinent past events.
Experiments on simulated and real robotic tasks show that this method significantly improves over current techniques.
arXiv Detail & Related papers (2020-08-04T14:35:11Z) - Evaluating Prediction-Time Batch Normalization for Robustness under
Covariate Shift [81.74795324629712]
We call this prediction-time batch normalization; it significantly improves model accuracy and calibration under covariate shift (a minimal sketch appears after this list).
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the resulting enlarged dataset can significantly improve the ability of the learned FER model.
To reduce the computational burden of training on such an enlarged dataset, we further propose a dataset distillation strategy that compresses it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Masking as an Efficient Alternative to Finetuning for Pretrained
Language Models [49.64561153284428]
We learn selective binary masks for pretrained weights in lieu of modifying them through finetuning.
In intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks.
arXiv Detail & Related papers (2020-04-26T15:03:47Z)
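Referring back to the prediction-time batch normalization entry above: the core idea is to normalize with statistics computed from the current test batch rather than the running statistics accumulated during training. The following is a minimal PyTorch sketch under that assumption, not code taken from the cited paper.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def predict_with_batch_stats(model, x):
    """Prediction-time batch normalization sketch.

    Switches only the BatchNorm layers to training mode so they normalize
    with the current batch's statistics; everything else (e.g. dropout)
    stays in eval mode. Note: train-mode BatchNorm also updates its running
    statistics, so copy the model first if that side effect matters.
    """
    model.eval()
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            module.train()  # use per-batch statistics for normalization
    outputs = model(x)
    model.eval()  # restore standard inference behaviour
    return outputs
```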