Non-Parametric Temporal Adaptation for Social Media Topic Classification
- URL: http://arxiv.org/abs/2209.05706v2
- Date: Mon, 15 May 2023 22:17:39 GMT
- Title: Non-Parametric Temporal Adaptation for Social Media Topic Classification
- Authors: Fatemehsadat Mireshghallah, Nikolai Vogler, Junxian He, Omar Florez,
Ahmed El-Kishky, Taylor Berg-Kirkpatrick
- Abstract summary: We study temporal adaptation through the task of longitudinal hashtag prediction.
Our method improves by 64.12% over the best parametric baseline without any of its costly gradient-based updating.
Our dense retrieval approach is also well-suited to dynamically deleted user data in line with data privacy laws, with negligible computational cost and performance loss.
- Score: 41.52878699836363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: User-generated social media data is constantly changing as new trends
influence online discussion and personal information is deleted due to privacy
concerns. However, most current NLP models are static and rely on fixed
training data, which means they are unable to adapt to temporal change -- both
test distribution shift and deleted training data -- without frequent, costly
re-training. In this paper, we study temporal adaptation through the task of
longitudinal hashtag prediction and propose a non-parametric dense retrieval
technique, which does not require re-training, as a simple but effective
solution. In experiments on a newly collected, publicly available, year-long
Twitter dataset exhibiting temporal distribution shift, our method improves by
64.12% over the best parametric baseline without any of its costly
gradient-based updating. Our dense retrieval approach is also particularly
well-suited to dynamically deleted user data in line with data privacy laws,
with negligible computational cost and performance loss.
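As a rough illustration of the non-parametric dense-retrieval idea described above: embedded (tweet, hashtag) pairs live in a datastore, prediction is a majority vote over nearest neighbors, and a deletion request simply drops the affected rows. A minimal sketch, assuming embeddings from any fixed sentence encoder; the class and parameter names are illustrative, not the authors' released code.

```python
# Minimal sketch of a non-parametric dense-retrieval hashtag classifier.
# Embeddings are assumed to come from any fixed (frozen) sentence encoder;
# names and the choice of k are illustrative.
from collections import Counter

import numpy as np


class RetrievalHashtagClassifier:
    def __init__(self, dim: int):
        self.embs = np.empty((0, dim), dtype=np.float32)  # tweet embeddings
        self.tags: list[str] = []    # hashtag label per stored tweet
        self.users: list[str] = []   # owner of each tweet, for deletion requests

    def add(self, emb: np.ndarray, tag: str, user: str) -> None:
        """Index a new (tweet embedding, hashtag) pair -- no gradient updates."""
        emb = emb / np.linalg.norm(emb)
        self.embs = np.vstack([self.embs, emb[None, :].astype(np.float32)])
        self.tags.append(tag)
        self.users.append(user)

    def delete_user(self, user: str) -> None:
        """Honor a data-deletion request by dropping the user's rows."""
        keep = [i for i, u in enumerate(self.users) if u != user]
        self.embs = self.embs[keep]
        self.tags = [self.tags[i] for i in keep]
        self.users = [self.users[i] for i in keep]

    def predict(self, emb: np.ndarray, k: int = 16) -> str:
        """Majority vote over the k nearest stored tweets (cosine similarity)."""
        emb = emb / np.linalg.norm(emb)
        sims = self.embs @ emb
        nearest = np.argsort(-sims)[:k]
        return Counter(self.tags[i] for i in nearest).most_common(1)[0][0]
```

Because adaptation is just adding or removing datastore entries, both temporal distribution shift and privacy deletion are handled without any gradient-based updating.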
Related papers
- TCGU: Data-centric Graph Unlearning based on Transferable Condensation [36.670771080732486]
Transferable Condensation Graph Unlearning (TCGU) is a data-centric solution to zero-glance graph unlearning.
We show that TCGU achieves better model utility, unlearning efficiency, and unlearning efficacy than existing GU methods.
arXiv Detail & Related papers (2024-10-09T02:14:40Z)
- Data Selection for Transfer Unlearning [14.967546081883034]
We advocate for a relaxed definition of unlearning that does not address privacy applications.
We propose a new method that selects relevant examples from an auxiliary "static" dataset.
We find that our method outperforms the gold standard "exact unlearning" on several datasets (see the sketch after this entry).
arXiv Detail & Related papers (2024-05-16T20:09:41Z)
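One plausible reading of the selection mechanism, under heavy assumptions: rank auxiliary examples by embedding similarity to the data being unlearned and fine-tune on the top k. The cosine-to-centroid criterion below is a guess for illustration, not the paper's actual rule.

```python
# Hypothetical sketch of similarity-based selection from an auxiliary
# "static" dataset. The selection criterion (cosine similarity to the
# forget set's mean embedding) is an illustrative assumption.
import numpy as np


def select_transfer_examples(aux_embs: np.ndarray,
                             forget_embs: np.ndarray,
                             k: int) -> np.ndarray:
    """Return indices of the k auxiliary examples closest to the forget set."""
    centroid = forget_embs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    aux_norm = aux_embs / np.linalg.norm(aux_embs, axis=1, keepdims=True)
    sims = aux_norm @ centroid
    return np.argsort(-sims)[:k]  # fine-tune on these instead of retraining
```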
- Online Tensor Inference [0.0]
Traditional offline learning, involving the storage and utilization of all data in each computational iteration, becomes impractical for high-dimensional tensor data.
Existing low-rank tensor methods lack the capability for statistical inference in an online fashion.
Our approach employs Stochastic Gradient Descent (SGD) to enable efficient real-time data processing without extensive memory requirements (see the sketch after this entry).
arXiv Detail & Related papers (2023-12-28T16:37:48Z)
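To make the online-SGD idea concrete, a toy sketch of streaming updates for a rank-r CP (low-rank tensor) model: each observed entry updates only three factor rows, so memory stays constant no matter how much data has streamed past. Dimensions, step size, and initialization are illustrative assumptions, not the paper's algorithm.

```python
# Toy streaming SGD for a rank-r CP decomposition of a 3-way tensor.
# Each observation (i, j, k, value) triggers a constant-memory update.
import numpy as np

rng = np.random.default_rng(0)
I, J, K, r = 50, 40, 30, 5  # tensor dimensions and assumed rank
A, B, C = (0.1 * rng.standard_normal((n, r)) for n in (I, J, K))


def sgd_step(i: int, j: int, k: int, value: float, lr: float = 0.05) -> None:
    """One online update from a single observed tensor entry x[i, j, k]."""
    pred = np.sum(A[i] * B[j] * C[k])  # CP model: sum_r A[i,r] * B[j,r] * C[k,r]
    err = pred - value
    ga, gb, gc = err * B[j] * C[k], err * A[i] * C[k], err * A[i] * B[j]
    A[i] -= lr * ga  # update only the three factor rows this entry touches
    B[j] -= lr * gb
    C[k] -= lr * gc
```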
- Efficient Online Data Mixing For Language Model Pre-Training [101.45242332613944]
Existing data selection methods suffer from slow and computationally expensive processes.
Data mixing, on the other hand, reduces the complexity of data selection by grouping data points together.
We develop an efficient algorithm for Online Data Mixing (ODM) that combines elements from both data selection and data mixing (see the bandit-style sketch after this entry).
arXiv Detail & Related papers (2023-12-05T00:42:35Z)
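The combination of data selection and data mixing can be pictured as a bandit over data domains: sample each batch's domain from the current mixing weights and up-weight domains that yield high training loss. The EXP3-style reward and learning rate below are assumptions for illustration, not ODM's exact algorithm.

```python
# Bandit-style online data mixing sketch: domains are arms, the observed
# training loss is the (importance-weighted) reward.
import math
import random


class OnlineMixer:
    def __init__(self, domains, lr: float = 0.1):
        self.domains = list(domains)
        self.lr = lr
        self.logw = {d: 0.0 for d in self.domains}  # log-weights per domain

    def probs(self) -> dict:
        m = max(self.logw.values())  # subtract max for numerical stability
        exps = {d: math.exp(v - m) for d, v in self.logw.items()}
        z = sum(exps.values())
        return {d: e / z for d, e in exps.items()}

    def sample_domain(self):
        p = self.probs()
        return random.choices(self.domains, [p[d] for d in self.domains])[0]

    def update(self, domain, loss: float) -> None:
        """High-loss (informative) domains get up-weighted for future batches."""
        p = self.probs()[domain]
        self.logw[domain] += self.lr * loss / p  # importance-weighted reward
```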
- Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening [51.34904967046097]
We present Selective Synaptic Dampening (SSD), a novel two-step, post hoc, retrain-free approach to machine unlearning that is fast, performant, and does not require long-term storage of the training data (see the sketch after this entry).
arXiv Detail & Related papers (2023-08-15T11:30:45Z)
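A rough PyTorch paraphrase of the dampening idea: estimate diagonal, Fisher-style importances from squared gradients on the forget set and on the full training data, then shrink only the parameters that matter disproportionately to the forget set. The threshold alpha and dampening constant lam are placeholders, paraphrasing the SSD idea rather than reproducing the paper's code.

```python
# Fisher-style importance estimation plus selective dampening (paraphrase).
import torch


def importances(model, loader, loss_fn):
    """Accumulate squared gradients per parameter over a data loader."""
    imp = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                imp[n] += p.grad.detach() ** 2
    return imp


@torch.no_grad()
def dampen(model, imp_forget, imp_full, alpha=10.0, lam=1.0):
    """Shrink parameters far more important to the forget set than overall."""
    for n, p in model.named_parameters():
        mask = imp_forget[n] > alpha * imp_full[n]  # forget-specific weights
        scale = (lam * imp_full[n] / (imp_forget[n] + 1e-12)).clamp(max=1.0)
        p[mask] *= scale[mask]  # post hoc: no retraining, no stored train data
```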
- Transferable Unlearnable Examples [63.64357484690254]
Unlearnable strategies have been introduced to prevent third parties from training on data without permission.
They add perturbations to users' data before publishing, aiming to invalidate models trained on the published dataset.
We propose a novel unlearnable strategy based on Classwise Separability Discriminant (CSD), which aims to better transfer the unlearnable effects to other training settings and datasets.
arXiv Detail & Related papers (2022-10-18T19:23:52Z)
- Augmented Bilinear Network for Incremental Multi-Stock Time-Series Classification [83.23129279407271]
We propose a method to efficiently retain the knowledge available in a neural network pre-trained on a set of securities.
In our method, the prior knowledge encoded in a pre-trained neural network is maintained by keeping existing connections fixed.
This knowledge is adjusted for the new securities by a set of augmented connections, which are optimized using the new data (see the sketch after this entry).
arXiv Detail & Related papers (2022-07-23T18:54:10Z)
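The "keep existing connections fixed" recipe can be sketched as a frozen pre-trained layer plus a small trainable branch added in parallel; with the branch zero-initialized, the combined model starts out identical to the pre-trained one. Layer types and the additive combination are illustrative assumptions.

```python
# Frozen pre-trained layer + trainable augmented connections (sketch).
import torch
import torch.nn as nn


class AugmentedLayer(nn.Module):
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.frozen = pretrained
        for p in self.frozen.parameters():
            p.requires_grad = False  # keep existing connections fixed
        self.augment = nn.Linear(pretrained.in_features, pretrained.out_features)
        nn.init.zeros_(self.augment.weight)  # branch starts contributing nothing
        nn.init.zeros_(self.augment.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only self.augment receives gradient updates on the new securities.
        return self.frozen(x) + self.augment(x)
```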
- CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address distribution shift by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed Class-Aware Feature Alignment (CAFA), which encourages a model to learn target representations in a class-discriminative manner (see the sketch after this entry).
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
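A hedged sketch of a class-aware alignment loss in this spirit: pull each test-time feature toward the source-domain statistics of its pseudo-labeled class via a Mahalanobis-style distance, so adaptation stays class-discriminative. Precomputed source means and inverse covariances are assumptions about the setup, not necessarily CAFA's exact loss.

```python
# Class-aware feature alignment loss (hedged sketch).
import torch


def class_aware_alignment_loss(feats: torch.Tensor,
                               pseudo_labels: torch.Tensor,
                               src_means: torch.Tensor,
                               src_prec: torch.Tensor) -> torch.Tensor:
    """feats: (N, D); src_means: (C, D); src_prec: (C, D, D) inverse covariances."""
    mu = src_means[pseudo_labels]              # (N, D) class mean per sample
    diff = (feats - mu).unsqueeze(1)           # (N, 1, D)
    prec = src_prec[pseudo_labels]             # (N, D, D)
    maha = diff @ prec @ diff.transpose(1, 2)  # squared Mahalanobis distance
    return maha.mean()  # minimized w.r.t. the feature extractor at test time
```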
- Training for the Future: A Simple Gradient Interpolation Loss to Generalize Along Time [26.261277497201565]
In several real-world applications, machine learning models are deployed to make predictions on data whose distribution changes gradually over time.
We propose a simple method that starts with a model with time-sensitive parameters but regularizes its temporal complexity using a Gradient Interpolation (GI) loss (see the sketch after this entry).
arXiv Detail & Related papers (2021-08-15T11:20:10Z)
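A loose, simplified reading of the Gradient Interpolation idea, restricted to a scalar-output model for clarity: time t is an input to the model, and predictions at a nearby time t + delta are regularized toward their first-order Taylor extrapolation in t, penalizing excess temporal complexity. delta and the regularization weight are assumptions, not the paper's exact formulation.

```python
# Taylor-style temporal-smoothness regularizer in the spirit of a
# Gradient Interpolation (GI) loss, for per-example scalar predictions.
import torch
import torch.nn.functional as F


def gi_loss(model, x, y, t, delta: float = 0.1, reg_weight: float = 1.0):
    """model(x, t) -> (N,) predictions; t: (N,) timestamps."""
    t = t.clone().requires_grad_(True)
    pred = model(x, t)
    task = F.mse_loss(pred, y)

    # pred.sum() yields per-example d(pred_i)/d(t_i), since each prediction
    # depends only on its own timestamp.
    dpred_dt, = torch.autograd.grad(pred.sum(), t, create_graph=True)
    taylor = pred + delta * dpred_dt        # first-order extrapolation in time
    shifted = model(x, t.detach() + delta)  # actual prediction at t + delta
    reg = (shifted - taylor).pow(2).mean()  # keep temporal change near-linear
    return task + reg_weight * reg
```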