Methods for Generating Drift in Text Streams
- URL: http://arxiv.org/abs/2403.12328v1
- Date: Mon, 18 Mar 2024 23:48:33 GMT
- Title: Methods for Generating Drift in Text Streams
- Authors: Cristiano Mesquita Garcia, Alessandro Lameiras Koerich, Alceu de Souza Britto Jr, Jean Paul Barddal
- Abstract summary: Concept drift is a frequent phenomenon in real-world datasets and corresponds to changes in data distribution over time.
This paper provides four textual drift generation methods to ease the production of datasets with labeled drifts.
Results show that every classifier's performance degrades right after the drifts, and that the incremental SVM is the fastest to run and to recover its previous performance levels.
- Score: 49.3179290313959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Systems and individuals produce data continuously. On the Internet, people share their knowledge, sentiments, and opinions, provide reviews about services and products, and so on. Automatically learning from these textual data can provide insights to organizations and institutions, thus preventing financial impacts, for example. To learn from textual data over time, the machine learning system must account for concept drift. Concept drift is a frequent phenomenon in real-world datasets and corresponds to changes in data distribution over time. For instance, a concept drift occurs when sentiments change or a word's meaning is adjusted over time. Although concept drift is frequent in real-world applications, benchmark datasets with labeled drifts are rare in the literature. To bridge this gap, this paper provides four textual drift generation methods to ease the production of datasets with labeled drifts. These methods were applied to Yelp and Airbnb datasets and tested using incremental classifiers respecting the stream mining paradigm to evaluate their ability to recover from the drifts. Results show that all methods have their performance degraded right after the drifts, and the incremental SVM is the fastest to run and recover the previous performance levels regarding accuracy and Macro F1-Score.
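The abstract does not spell out the four drift generation methods. As an illustration only, one simple way to induce an abrupt, labeled drift in a text stream is a "class-swap": after a chosen position, the labels of two classes are exchanged. The function and example stream below are hypothetical, not the paper's actual methods:

```python
def class_swap_drift(stream, drift_index, class_a, class_b):
    """Yield (text, label) pairs, swapping two class labels after drift_index.

    This simulates an abrupt, labeled concept drift in a text stream:
    before drift_index the stream is unchanged; from drift_index on,
    every instance of class_a is relabeled class_b and vice versa.
    """
    swap = {class_a: class_b, class_b: class_a}
    for i, (text, label) in enumerate(stream):
        if i >= drift_index:
            label = swap.get(label, label)
        yield text, label

# Tiny demonstration stream: sentiment labels drift at position 2.
stream = [("great food", "pos"), ("terrible service", "neg"),
          ("loved it", "pos"), ("awful stay", "neg")]
drifted = list(class_swap_drift(stream, drift_index=2,
                                class_a="pos", class_b="neg"))
```

Because the drift position is chosen by the generator, the resulting dataset carries a ground-truth drift label, which is exactly what makes such datasets useful for benchmarking detectors.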
Related papers
- Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time [5.999777817331315]
Concept Drift is a phenomenon in which the underlying data distribution and statistical properties of a target domain change over time.
We propose DriftLens, an unsupervised real-time concept drift detection framework.
It works on unstructured data by exploiting the distribution distances of deep learning representations.
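DriftLens itself is not detailed in this summary. As a rough sketch of the general idea (distances between distributions of deep representations), one can compare a reference window of embeddings against the current window, here via the Euclidean distance between their mean vectors; the windowing and threshold are assumptions, not DriftLens's actual formulation:

```python
def mean_vector(embeddings):
    """Component-wise mean of a list of equal-length embedding vectors."""
    dim = len(embeddings[0])
    n = len(embeddings)
    return [sum(vec[d] for vec in embeddings) / n for d in range(dim)]

def drift_score(reference, current):
    """Euclidean distance between the mean embeddings of two windows."""
    mu_ref = mean_vector(reference)
    mu_cur = mean_vector(current)
    return sum((a - b) ** 2 for a, b in zip(mu_ref, mu_cur)) ** 0.5

def has_drift(reference, current, threshold):
    """Flag drift when the windows' distribution distance exceeds threshold."""
    return drift_score(reference, current) > threshold

reference = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2]]  # embeddings before drift
current = [[1.0, 1.0], [1.2, 1.0], [1.0, 1.2]]    # embeddings after drift
```

Because the check runs on representations rather than raw text, the same mechanism applies to any unstructured data for which an encoder is available.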
arXiv Detail & Related papers (2024-06-24T23:41:46Z)
- Improving Sampling Methods for Fine-tuning SentenceBERT in Text Streams [49.3179290313959]
This study explores the efficacy of seven text sampling methods designed to selectively fine-tune language models.
We precisely assess the impact of these methods on fine-tuning the SBERT model using four different loss functions.
Our findings indicate that Softmax loss and Batch All Triplets loss are particularly effective for text stream classification.
arXiv Detail & Related papers (2024-03-18T23:41:52Z)
- Explaining Drift using Shapley Values [0.0]
Machine learning models often deteriorate in performance when used to predict outcomes on data that differs from the data on which they were trained.
However, no framework exists to identify the drivers behind such drift in model performance.
We propose a novel framework - DBShap that uses principled Shapley values to identify the main contributors of the drift.
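DBShap's exact formulation is not given in this summary. As a generic sketch of the underlying idea, exact Shapley values attribute an overall performance drop to individual drifted features by averaging each feature's marginal contribution over all coalitions; the value function below is a hypothetical stand-in for "performance drop explained by rolling this feature subset forward to its drifted distribution":

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: each player's marginal contribution to
    value(), averaged over all coalitions with the standard weights.
    Exponential in len(players); fine for a handful of features."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for coal in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(coal) | {p}) - value(set(coal)))
        phi[p] = total
    return phi

# Hypothetical value function: drop in accuracy explained when a subset
# of features is shifted to its drifted distribution (additive here).
drop = {"sentiment_shift": 0.08, "vocab_shift": 0.02}

def explained_drop(subset):
    return sum(drop[f] for f in subset)

phi = shapley_values(list(drop), explained_drop)
```

For an additive value function the Shapley value of each feature simply equals its individual drop, which makes the toy example easy to verify by hand.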
arXiv Detail & Related papers (2024-01-18T07:07:42Z)
- Concept Drift Adaptation in Text Stream Mining Settings: A Comprehensive Review [49.3179290313959]
This study performed a systematic literature review regarding concept drift adaptation in text stream scenarios.
We selected 40 papers to unravel aspects such as text drift categories, types of text drift detection, model update mechanism, the addressed stream mining tasks, types of text representations, and text representation update mechanism.
arXiv Detail & Related papers (2023-12-05T17:15:16Z)
- A comprehensive analysis of concept drift locality in data streams [3.5897534810405403]
Concept drift must be detected for effective model adaptation to evolving data properties.
We present a novel categorization of concept drift based on its locality and scale.
We conduct a comparative assessment of 9 state-of-the-art drift detectors across diverse difficulties.
arXiv Detail & Related papers (2023-11-10T20:57:43Z)
- Adaptive Cross Batch Normalization for Metric Learning [75.91093210956116]
Metric learning is a fundamental problem in computer vision.
We show that it is equally important to ensure that the accumulated embeddings are up to date.
In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration.
arXiv Detail & Related papers (2023-03-30T03:22:52Z)
- Are Concept Drift Detectors Reliable Alarming Systems? -- A Comparative Study [6.7961908135481615]
Concept drift impacts the performance of machine learning models.
In this study, we assess the reliability of concept drift detectors to identify drift in time.
Our findings aim to help practitioners understand which drift detector should be employed in different situations.
arXiv Detail & Related papers (2022-11-23T16:31:15Z)
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
- Concept drift detection and adaptation for federated and continual learning [55.41644538483948]
Smart devices can collect vast amounts of data from their environment.
This data is suitable for training machine learning models, which can significantly improve their behavior.
In this work, we present a new method, called Concept-Drift-Aware Federated Averaging.
arXiv Detail & Related papers (2021-05-27T17:01:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.