Rolling with the Punches: Resilient Contrastive Pre-training under Non-Stationary Drift
- URL: http://arxiv.org/abs/2502.07620v2
- Date: Mon, 19 May 2025 13:59:05 GMT
- Title: Rolling with the Punches: Resilient Contrastive Pre-training under Non-Stationary Drift
- Authors: Xiaoyu Yang, Jie Lu, En Yu
- Abstract summary: A critical emerging challenge is the effective pre-training of models on dynamic data streams. We first reveal that conventional contrastive pre-training methods are notably vulnerable to concept drift. We propose Resilient Contrastive Pre-training (RCP), a novel method incorporating causal intervention.
- Score: 16.97188816362991
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The remarkable success of large-scale contrastive pre-training, fueled by vast and curated datasets, is encountering new frontiers as the scaling paradigm evolves. A critical emerging challenge is the effective pre-training of models on dynamic data streams characterized by concept drift, unpredictable changes in the underlying data distribution. This paper undertakes a foundational investigation of this issue. We first reveal that conventional contrastive pre-training methods are notably vulnerable to concept drift, leading to significant biases in the learned feature space of pre-trained models. To systematically analyze these effects, we construct a structural causal model that elucidates how drift acts as a confounder, distorting learned representations. Based on these causal insights, we propose Resilient Contrastive Pre-training (RCP), a novel method incorporating causal intervention. RCP introduces a causally-informed objective designed to mitigate drift-induced biases by leveraging targeted interventions. RCP is designed for simple and scalable implementation and exhibits notable adaptability, promoting robust pre-training on evolving data. Comprehensive experiments across diverse downstream tasks compellingly demonstrate that RCP effectively alleviates the detrimental impact of concept drift, yielding more resilient and generalizable representations.
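The abstract does not spell out RCP's objective, so the snippet below is only a minimal sketch of the general idea under stated assumptions: a standard InfoNCE contrastive loss augmented with a hypothetical drift-invariance penalty that treats time buckets of the stream as intervention environments. The `env` tensor, `lam` weight, and `drift_invariance_penalty` helper are illustrative placeholders, not the paper's method.

```python
# Minimal sketch (not the paper's actual objective): InfoNCE plus a
# hypothetical drift-invariance penalty over time-bucket "environments".
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.2):
    """Standard InfoNCE between two augmented views, each of shape (B, D)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # (B, B) similarities
    labels = torch.arange(z1.size(0), device=z1.device)   # positives on diagonal
    return F.cross_entropy(logits, labels)

def drift_invariance_penalty(z, env):
    """Crude intervention proxy: keep per-environment feature means close to
    the global mean so drift (the confounder) cannot shift the feature space."""
    means = [z[env == e].mean(dim=0) for e in env.unique()]
    global_mean = z.mean(dim=0)
    return sum(F.mse_loss(m, global_mean) for m in means) / len(means)

def drift_aware_loss(z1, z2, env, lam=0.1):
    """env: (B,) integer time-bucket id for each sample in the batch."""
    penalty = drift_invariance_penalty(z1, env) + drift_invariance_penalty(z2, env)
    return info_nce(z1, z2) + lam * penalty
```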
Related papers
- Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning [16.97188816362991]
This paper uncovers a critical yet overlooked phenomenon in multi-modal large language models (MLLMs). It establishes a theoretical bridge between concept drift theory and reinforcement fine-tuning (RFT) processes. It proposes a novel counterfact-aware RFT that systematically decouples beneficial distribution adaptation from harmful concept drift.
arXiv Detail & Related papers (2025-05-19T13:13:38Z) - Trajectory Entropy Reinforcement Learning for Predictable and Robust Control [12.289021814766539]
We introduce a novel inductive bias towards simple policies in reinforcement learning. The simplicity inductive bias is introduced by minimizing the entropy of entire action trajectories. We show that our learned policies produce more cyclical and consistent action trajectories.
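As a rough illustration of the idea (not the paper's exact estimator), one can penalize a Gaussian estimate of the differential entropy of each action trajectory; the `beta` weight in the usage note is hypothetical:

```python
import torch

def trajectory_entropy(actions, eps=1e-6):
    """Gaussian estimate of the differential entropy of an action trajectory
    of shape (T, A): 0.5 * log(2*pi*e*var), summed over action dimensions."""
    var = actions.var(dim=0) + eps     # per-dimension variance across time
    return (0.5 * torch.log(2 * torch.pi * torch.e * var)).sum()

# Used as a regularizer on top of the usual RL objective (beta illustrative):
#   loss = policy_loss + beta * trajectory_entropy(action_trajectory)
```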
arXiv Detail & Related papers (2025-05-07T07:41:29Z) - datadriftR: An R Package for Concept Drift Detection in Predictive Models [0.0]
This paper introduces datadriftR, an R package designed to detect concept drift. It proposes a novel method called Profile Drift Detection (PDD) that enables both drift detection and an enhanced understanding of the cause behind the drift.
arXiv Detail & Related papers (2024-12-15T20:59:49Z) - Physics-guided Active Sample Reweighting for Urban Flow Prediction [75.24539704456791]
Urban flow prediction is a spatio-temporal modeling task that estimates the throughput of transportation services like buses, taxis, and ride-sharing, typically with data-driven models.
Some recent prediction solutions bring remedies with the notion of physics-guided machine learning (PGML).
We develop a discretized physics-guided network (PN) and propose a data-aware framework, Physics-guided Active Sample Reweighting (P-GASR).
arXiv Detail & Related papers (2024-07-18T15:44:23Z) - Online Drift Detection with Maximum Concept Discrepancy [13.48123472458282]
We propose MCD-DD, a novel concept drift detection method based on maximum concept discrepancy.
Our method can adaptively identify varying forms of concept drift by contrastive learning of concept embeddings.
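The snippet below is a loose sketch of window-based drift detection over embeddings, assuming they come from a contrastively trained encoder; the mean-embedding distance and threshold are simplified placeholders rather than the paper's exact maximum concept discrepancy statistic.

```python
import torch

def concept_discrepancy(ref, new):
    """Distance between mean embeddings of two windows (each (N, D)) --
    a simplified stand-in for maximum concept discrepancy."""
    return torch.norm(ref.mean(dim=0) - new.mean(dim=0)).item()

def detect_drift(embeddings, window=256, threshold=0.5):
    """Slide over the stream's embeddings (a (T, D) tensor) and flag windows
    whose discrepancy from the previous window exceeds a calibrated
    threshold (both window size and threshold are placeholders)."""
    alarms = []
    for t in range(window, embeddings.size(0) - window + 1, window):
        score = concept_discrepancy(embeddings[t - window:t],
                                    embeddings[t:t + window])
        if score > threshold:
            alarms.append((t, score))
    return alarms
```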
arXiv Detail & Related papers (2024-07-07T13:57:50Z) - Liquid Neural Network-based Adaptive Learning vs. Incremental Learning for Link Load Prediction amid Concept Drift due to Network Failures [37.66676003679306]
Adapting to concept drift is a challenging task in machine learning.
In communication networks, such an issue emerges when performing traffic forecasting following a failure event.
We propose an approach that exploits adaptive learning algorithms, namely, liquid neural networks, which are capable of self-adaptation to abrupt changes in data patterns without requiring any retraining.
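For readers unfamiliar with liquid neural networks, here is a minimal Euler-integrated liquid time-constant cell in the spirit of Hasani et al.; it is illustrative only, and the paper's actual architecture and hyperparameters are not specified in this summary.

```python
import torch
import torch.nn as nn

class LTCCell(nn.Module):
    """Minimal liquid time-constant cell: the hidden state follows
    dh/dt = -h/tau + f(x, h) * (A - h), integrated with one Euler step,
    so the effective time constant adapts to the current input."""
    def __init__(self, input_size, hidden_size, tau=1.0, dt=0.1):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(input_size + hidden_size, hidden_size), nn.Sigmoid())
        self.A = nn.Parameter(torch.zeros(hidden_size))
        self.tau, self.dt = tau, dt

    def forward(self, x, h):
        gate = self.f(torch.cat([x, h], dim=-1))   # input-dependent dynamics
        dh = -h / self.tau + gate * (self.A - h)
        return h + self.dt * dh
```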
arXiv Detail & Related papers (2024-04-08T08:47:46Z) - Initialization Matters for Adversarial Transfer Learning [61.89451332757625]
We discover the necessity of an adversarially robust pretrained model.
We propose Robust Linear Initialization (RoLI) for adversarial finetuning, which initializes the linear head with the weights obtained by adversarial linear probing.
Across five different image classification datasets, we demonstrate the effectiveness of RoLI and achieve new state-of-the-art results.
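A condensed sketch of the RoLI recipe as we read it: run PGD-based adversarial training on the linear head alone (backbone frozen), then start full adversarial finetuning from that head. Hyperparameters and helper names below are illustrative, not the paper's settings, and the backbone is assumed to output flat feature vectors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD (illustrative hyperparameters)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()

def adversarial_linear_probe(backbone, head, loader, epochs=5):
    """RoLI-style initialization sketch: freeze the backbone, train only the
    linear head on adversarial examples, then use the resulting weights to
    start full adversarial finetuning."""
    backbone.requires_grad_(False)
    model = nn.Sequential(backbone, head)
    opt = torch.optim.SGD(head.parameters(), lr=0.1)
    for _ in range(epochs):
        for x, y in loader:
            x_adv = pgd_attack(model, x, y)
            opt.zero_grad()              # clear grads accumulated by PGD
            F.cross_entropy(model(x_adv), y).backward()
            opt.step()
    return head
```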
arXiv Detail & Related papers (2023-12-10T00:51:05Z) - MemDA: Forecasting Urban Time Series with Memory-based Drift Adaptation [24.284969264008733]
We propose a new urban time series prediction model for the concept drift problem, which encodes the drift by considering the periodicity in the data.
Our design significantly outperforms state-of-the-art methods and can be well generalized to existing prediction backbones.
arXiv Detail & Related papers (2023-09-25T15:22:28Z) - SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations [76.45009891152178]
The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone across various downstream datasets as well as tasks.
We show, for the first time, that general representation learning can be achieved through the task of occupancy prediction.
Our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
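To make the pretext task concrete, here is a toy sketch of turning a LiDAR point cloud into binary occupancy targets that an encoder-decoder could be pre-trained to predict; the grid size and spatial extent are arbitrary placeholder values.

```python
import torch

def occupancy_targets(points, grid=32, extent=50.0):
    """Voxelize a point cloud (N, 3), with coordinates in [-extent, extent]
    meters, into a binary (grid, grid, grid) occupancy volume."""
    idx = ((points / extent * 0.5 + 0.5) * grid).long().clamp(0, grid - 1)
    occ = torch.zeros(grid, grid, grid)
    occ[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return occ

# Pre-training step sketch: predict the volume from the raw points and
# supervise with binary cross-entropy on the voxel logits, e.g.
#   loss = F.binary_cross_entropy_with_logits(decoder(encoder(points)), occ)
```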
arXiv Detail & Related papers (2023-09-19T11:13:01Z) - Fine-tuning can cripple your foundation model; preserving features may be the solution [87.35911633187204]
A fine-tuned model's ability to recognize concepts on tasks other than the downstream one is reduced significantly compared to its pre-trained counterpart.
We propose a new fine-tuning method called $\textit{LDIFS}$ that, while learning new concepts related to the downstream task, allows a model to preserve its pre-trained knowledge as well.
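Our reading of the idea, as a hedged sketch: keep a frozen snapshot of the pre-trained model and add a feature-space distance penalty to the task loss, so fine-tuning cannot drift far from the original features. The `(features, logits)` interface and `lam` weight are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def feature_preserving_loss(model, frozen, x, y, lam=1.0):
    """Task loss plus an L2 penalty between current and pre-trained features.
    Both models are assumed to return a (features, logits) pair."""
    feats, logits = model(x)
    with torch.no_grad():
        feats0, _ = frozen(x)
    return F.cross_entropy(logits, y) + lam * F.mse_loss(feats, feats0)

# frozen = copy.deepcopy(model).eval()   # snapshot taken before fine-tuning
```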
arXiv Detail & Related papers (2023-08-25T11:49:51Z) - Alleviating the Effect of Data Imbalance on Adversarial Training [26.36714114672729]
We study adversarial training on datasets that follow a long-tailed distribution.
We propose a new adversarial training framework -- Re-balancing Adversarial Training (REAT)
arXiv Detail & Related papers (2023-07-14T07:01:48Z) - On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes with a balance on the intra-/inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z) - FLARE: Detection and Mitigation of Concept Drift for Federated Learning based IoT Deployments [2.7776688429637466]
FLARE is a lightweight dual-scheduler FL framework that conditionally transfers training data and deploys models between edge and sensor endpoints.
We show that FLARE can significantly reduce the amount of data exchanged between edge and sensor nodes compared to fixed-interval scheduling methods.
It can successfully detect concept drift reactively with at least a 16x reduction in latency.
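A toy sketch of the conditional (rather than fixed-interval) scheduling idea; the drift test, threshold, and `upload` callback are placeholders, not FLARE's actual components.

```python
from collections import deque

def drift_suspected(errors, threshold=0.15):
    """Fire when the recent mean error exceeds a calibrated threshold
    (a stand-in for a real drift detector; the threshold is a placeholder)."""
    return sum(errors) / len(errors) > threshold

def sensor_loop(error_stream, upload, window=100):
    """Only transfer training data from sensor to edge when drift is
    suspected, instead of on a fixed interval."""
    recent = deque(maxlen=window)
    for t, err in enumerate(error_stream):
        recent.append(err)
        if len(recent) == window and drift_suspected(recent):
            upload(t)          # conditional transfer to the edge node
            recent.clear()     # reset after triggering retraining
```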
arXiv Detail & Related papers (2023-05-15T10:09:07Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Unsupervised Unlearning of Concept Drift with Autoencoders [5.41354952642957]
Concept drift refers to a change in the data distribution affecting the data stream of future samples.
This paper proposes an unsupervised and model-agnostic concept drift adaptation method at the global level.
arXiv Detail & Related papers (2022-11-23T14:52:49Z) - Rethinking Importance Weighting for Transfer Learning [71.81262398144946]
A key assumption in supervised learning is that training and test data follow the same probability distribution.
As real-world machine learning tasks are becoming increasingly complex, novel approaches are explored to cope with such challenges.
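For context, the classical importance weighting this line of work builds on corrects train/test distribution shift by weighting each training loss by the density ratio w(x) = p_test(x) / p_train(x). A minimal sketch follows; the clipping constant is a common variance stabilizer, not something from this paper.

```python
import torch

def importance_weighted_loss(per_sample_losses, p_test, p_train, clip=10.0):
    """Reweight per-sample training losses by the density ratio
    p_test / p_train, clipped for variance control, then average."""
    w = (p_test / p_train).clamp(max=clip)
    return (w * per_sample_losses).sum() / w.sum()
```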
arXiv Detail & Related papers (2021-12-19T14:35:25Z) - Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z) - Pre-training also Transfers Non-Robustness [20.226917627173126]
In spite of its recognized contribution to generalization, pre-training also transfers non-robustness from the pre-trained model into the fine-tuned model.
Results validate the effectiveness of the proposed approach in alleviating non-robustness while preserving generalization.
arXiv Detail & Related papers (2021-06-21T11:16:13Z) - Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training [106.34722726264522]
A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise.
Pre-processing methods may suffer from the robustness degradation effect.
A potential cause of this negative effect is that adversarial training examples are static and independent of the pre-processing model.
We propose a method called Joint Adversarial Training based Pre-processing (JATP) defense.
arXiv Detail & Related papers (2021-06-10T01:45:32Z) - Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
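A condensed sketch of the idea, under our assumptions: craft a perturbation that maximizes a contrastive (NT-Xent-style) loss, then train the encoder to stay consistent across the augmented and adversarial views. PGD hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent-style contrastive loss between two views (each (B, D))."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def adversarial_view(encoder, x, x_aug, eps=8/255, alpha=2/255, steps=5):
    """Craft a perturbation maximizing the contrastive loss, so the encoder
    must stay consistent under attack as well as under augmentation.
    (Zero encoder grads before the optimizer step; backward touches them.)"""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = nt_xent(encoder(x + delta), encoder(x_aug))
        loss.backward()
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()

# Training step sketch:
#   adv = adversarial_view(encoder, x, x_aug)
#   loss = nt_xent(encoder(adv), encoder(x_aug))
```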
arXiv Detail & Related papers (2020-10-26T04:44:43Z) - Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.