Mitigating Temporal-Drift: A Simple Approach to Keep NER Models Crisp
- URL: http://arxiv.org/abs/2104.09742v1
- Date: Tue, 20 Apr 2021 03:35:25 GMT
- Title: Mitigating Temporal-Drift: A Simple Approach to Keep NER Models Crisp
- Authors: Shuguang Chen, Leonardo Neves, and Thamar Solorio
- Abstract summary: Performance of neural models for named entity recognition degrades over time, becoming stale.
We propose an intuitive approach to measure the potential trendiness of tweets and use this metric to select the most informative instances to use for training.
Our approach shows larger increases in prediction accuracy with less training data than the alternatives, making it an attractive, practical solution.
- Score: 16.960138447997007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Performance of neural models for named entity recognition degrades over time,
becoming stale. This degradation is due to temporal drift, the change in our
target variables' statistical properties over time. This issue is especially
problematic for social media data, where topics change rapidly. In order to
mitigate the problem, data annotation and retraining of models are common.
Despite its usefulness, this process is expensive and time-consuming, which
motivates new research on efficient model updating. In this paper, we propose
an intuitive approach to measure the potential trendiness of tweets and use
this metric to select the most informative instances to use for training. We
conduct experiments on three state-of-the-art models on the Temporal Twitter
Dataset. Our approach shows larger increases in prediction accuracy with less
training data than the alternatives, making it an attractive, practical
solution.
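The selection idea in the abstract (score each new tweet for trendiness, then annotate and train on the highest-scoring ones) can be illustrated with a small sketch. This summary does not describe the paper's actual trendiness metric, so the score below is a hypothetical stand-in: the mean smoothed log-ratio of token frequencies in a recent period versus an older one. The function names `trendiness_scores` and `select_top_k` and the `smoothing` parameter are assumptions made for illustration only.

```python
import math
from collections import Counter


def trendiness_scores(old_texts, new_texts, smoothing=1.0):
    """Score each new text by how over-represented its tokens are
    in the new period relative to the old one.

    NOTE: hypothetical stand-in metric, not the one from the paper.
    """
    old_counts = Counter(tok for t in old_texts for tok in t.lower().split())
    new_counts = Counter(tok for t in new_texts for tok in t.lower().split())
    old_total = sum(old_counts.values())
    new_total = sum(new_counts.values())
    vocab_size = len(set(old_counts) | set(new_counts))

    def log_ratio(tok):
        # Add-k smoothing keeps unseen tokens from producing log(0).
        p_new = (new_counts[tok] + smoothing) / (new_total + smoothing * vocab_size)
        p_old = (old_counts[tok] + smoothing) / (old_total + smoothing * vocab_size)
        return math.log(p_new / p_old)

    scores = []
    for text in new_texts:
        toks = text.lower().split()
        # Average over tokens so long texts are not favored by length alone.
        scores.append(sum(log_ratio(t) for t in toks) / len(toks) if toks else 0.0)
    return scores


def select_top_k(new_texts, scores, k):
    """Return the k highest-scoring texts; ties keep their original order."""
    ranked = sorted(range(len(new_texts)), key=lambda i: (-scores[i], i))
    return [new_texts[i] for i in ranked[:k]]


old_tweets = ["the game was good", "the weather is good"]
new_tweets = ["the game was good", "zoomcall zoomcall lockdown", "the lockdown weather"]
scores = trendiness_scores(old_tweets, new_tweets)
to_annotate = select_top_k(new_tweets, scores, k=2)
```

Here the tweets built from tokens unseen in the older period rank first, so they would be sent for annotation ahead of the tweet that repeats old vocabulary. A real system would likely need proper tokenization and a metric validated against downstream NER accuracy.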
Related papers
- A Cost-Aware Approach to Adversarial Robustness in Neural Networks [1.622320874892682]
We propose using accelerated failure time models to measure the effect of hardware choice, batch size, number of epochs, and test-set accuracy.
We evaluate several GPU types and use the Tree Parzen Estimator to maximize model robustness and minimize model run-time simultaneously.
arXiv Detail & Related papers (2024-09-11T20:43:59Z) - Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z) - Addressing Concept Shift in Online Time Series Forecasting: Detect-then-Adapt [37.98336090671441]
We propose Concept Drift Detection anD Adaptation (D3A).
It first detects concept drift and then aggressively adapts the current model to the drifted concepts for rapid adaptation.
It helps mitigate the data distribution gap, a critical factor contributing to train-test performance inconsistency.
arXiv Detail & Related papers (2024-03-22T04:44:43Z) - Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z) - Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation [84.82153655786183]
We propose a novel framework called Informative Data Mining (IDM) to enable efficient one-shot domain adaptation for semantic segmentation.
IDM provides an uncertainty-based selection criterion to identify the most informative samples, which facilitates quick adaptation and reduces redundant training.
Our approach outperforms existing methods and achieves a new state-of-the-art one-shot performance of 56.7%/55.4% on the GTA5/SYNTHIA to Cityscapes adaptation tasks.
arXiv Detail & Related papers (2023-09-25T15:56:01Z) - Efficiently Robustify Pre-trained Models [18.392732966487582]
The robustness of large-scale models in real-world settings is still a less-explored topic.
We first benchmark the performance of these models under different perturbations and datasets.
We then discuss how existing robustification schemes based on full model fine-tuning might not be a scalable option for very large-scale networks.
arXiv Detail & Related papers (2023-09-14T08:07:49Z) - Towards Flexible Time-to-event Modeling: Optimizing Neural Networks via Rank Regression [17.684526928033065]
We introduce the Deep AFT Rank-regression model for Time-to-event prediction (DART)
This model uses an objective function based on Gehan's rank statistic, which is efficient and reliable for representation learning.
The proposed method is a semiparametric approach to AFT modeling that does not impose any distributional assumptions on the survival time distribution.
arXiv Detail & Related papers (2023-07-16T13:58:28Z) - Augmented Bilinear Network for Incremental Multi-Stock Time-Series Classification [83.23129279407271]
We propose a method to efficiently retain the knowledge available in a neural network pre-trained on a set of securities.
In our method, the prior knowledge encoded in a pre-trained neural network is maintained by keeping existing connections fixed.
This knowledge is adjusted for the new securities by a set of augmented connections, which are optimized using the new data.
arXiv Detail & Related papers (2022-07-23T18:54:10Z) - Continual Learning with Transformers for Image Classification [12.028617058465333]
In computer vision, neural network models struggle to continually learn new concepts without forgetting what has been learnt in the past.
We develop a solution called Adaptive Distillation of Adapters (ADA) to perform continual learning.
We empirically demonstrate on different classification tasks that this method maintains a good predictive performance without retraining the model.
arXiv Detail & Related papers (2022-06-28T15:30:10Z) - Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
A method called prediction-time batch normalization significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.