Towards Realistic Single-Task Continuous Learning Research for NER
- URL: http://arxiv.org/abs/2110.14694v1
- Date: Wed, 27 Oct 2021 18:23:31 GMT
- Title: Towards Realistic Single-Task Continuous Learning Research for NER
- Authors: Justin Payan, Yuval Merhav, He Xie, Satyapriya Krishna, Anil
Ramakrishna, Mukund Sridhar, Rahul Gupta
- Abstract summary: We discuss some of the unrealistic data characteristics of public datasets and study the challenges of realistic single-task continuous learning.
We construct a CL NER dataset from an existing publicly available dataset and release it along with the code to the research community.
- Score: 19.61159414320659
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is an increasing interest in continuous learning (CL), as data privacy
is becoming a priority for real-world machine learning applications. Meanwhile,
there is still a lack of academic NLP benchmarks that are applicable for
realistic CL settings, which is a major challenge for the advancement of the
field. In this paper, we discuss some of the unrealistic data characteristics
of public datasets and study the challenges of realistic single-task continuous
learning, as well as the effectiveness of data rehearsal as a way to mitigate
accuracy loss. We construct a CL NER dataset from an existing publicly
available dataset and release it along with the code to the research community.
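The abstract cites data rehearsal as a way to mitigate accuracy loss in single-task continual learning. The sketch below illustrates the general rehearsal idea, mixing a sampled fraction of earlier data slices into training on the newest slice. It is a minimal illustration under assumed names (build_rehearsal_batch, rehearsal_fraction, the toy BIO-tagged sentences), not the authors' released code.

```python
import random
from typing import List, Tuple

# A hypothetical sentence type: a list of (token, BIO-tag) pairs.
Sentence = List[Tuple[str, str]]

def build_rehearsal_batch(
    current_period: List[Sentence],
    previous_periods: List[List[Sentence]],
    rehearsal_fraction: float = 0.1,
    seed: int = 13,
) -> List[Sentence]:
    """Mix the current period's data with a sampled subset of earlier periods.

    Generic data-rehearsal sketch for single-task continual learning:
    rather than fine-tuning on the newest slice alone, a small fraction of
    each earlier slice is replayed to reduce forgetting.
    """
    rng = random.Random(seed)
    mixed: List[Sentence] = list(current_period)
    for old_period in previous_periods:
        # Sample at least one example from each non-empty earlier slice.
        k = max(1, int(len(old_period) * rehearsal_fraction)) if old_period else 0
        mixed.extend(rng.sample(old_period, k))
    rng.shuffle(mixed)
    return mixed

if __name__ == "__main__":
    # Toy example: three temporal slices of an NER corpus.
    p1 = [[("Alice", "B-PER"), ("visited", "O"), ("Paris", "B-LOC")]] * 50
    p2 = [[("Bob", "B-PER"), ("joined", "O"), ("Amazon", "B-ORG")]] * 50
    p3 = [[("Eve", "B-PER"), ("left", "O"), ("Berlin", "B-LOC")]] * 50
    train_data = build_rehearsal_batch(p3, [p1, p2], rehearsal_fraction=0.2)
    print(len(train_data))  # 50 current + 10 replayed from each earlier slice
```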
Related papers
- What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights [67.72413262980272]
Severe data imbalance naturally exists among web-scale vision-language datasets.
We find that CLIP pre-trained on such data exhibits notable robustness to the data imbalance compared to supervised learning.
The robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts.
arXiv Detail & Related papers (2024-05-31T17:57:24Z)
- Where is the Truth? The Risk of Getting Confounded in a Continual World [21.862370510786004]
A dataset is confounded if it is most easily solved via a spurious correlation, which fails to generalize to new data.
In a continual learning setting where confounders may vary in time across tasks, the challenge of mitigating the effect of confounders far exceeds the standard forgetting problem.
arXiv Detail & Related papers (2024-02-09T14:24:18Z)
- Analysis of Knowledge Tracing performance on synthesised student data [3.9227982854973438]
Knowledge Tracing aims to predict the future performance of students by tracking the development of their knowledge states.
Despite all the recent progress made in this field, the application of KT models in education systems is still restricted from the data perspective.
Our work shows that using only synthetic data for training can lead to similar performance as real data.
arXiv Detail & Related papers (2024-01-30T09:19:50Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation [128.00940554196976]
Vision-Language Continual Pretraining (VLCP) has shown impressive results on diverse downstream tasks by offline training on large-scale datasets.
To support the study of VLCP, we first contribute a comprehensive and unified benchmark dataset, P9D.
Treating the data from each industry as an independent task supports continual learning and conforms to the real-world long-tail distribution, simulating pretraining on web data.
arXiv Detail & Related papers (2023-08-14T13:53:18Z)
- On Handling Catastrophic Forgetting for Incremental Learning of Human Physical Activity on the Edge [1.4695979686066065]
PILOTE pushes the incremental learning process to the extreme edge, while providing reliable data privacy and practical utility.
We validate PILOTE with extensive experiments on human activity data collected from mobile sensors.
arXiv Detail & Related papers (2023-02-18T11:55:01Z)
- The CLEAR Benchmark: Continual LEArning on Real-World Imagery [77.98377088698984]
Continual learning (CL) is widely regarded as a crucial challenge for lifelong AI.
We introduce CLEAR, the first continual image classification benchmark dataset with a natural temporal evolution of visual concepts.
We find that a simple unsupervised pre-training step can already boost state-of-the-art CL algorithms.
arXiv Detail & Related papers (2022-01-17T09:09:09Z)
- Continual Learning for Recurrent Neural Networks: a Review and Empirical Evaluation [12.27992745065497]
Continual Learning with recurrent neural networks could pave the way to a large number of applications where incoming data is non-stationary.
We organize the literature on CL for sequential data processing by providing a categorization of the contributions and a review of the benchmarks.
We propose two new benchmarks for CL with sequential data based on existing datasets, whose characteristics resemble real-world applications.
arXiv Detail & Related papers (2021-03-12T19:25:28Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.