Concept Drift Adaptation in Text Stream Mining Settings: A Comprehensive
Review
- URL: http://arxiv.org/abs/2312.02901v1
- Date: Tue, 5 Dec 2023 17:15:16 GMT
- Title: Concept Drift Adaptation in Text Stream Mining Settings: A Comprehensive
Review
- Authors: Cristiano Mesquita Garcia and Ramon Simoes Abilio and Alessandro
Lameiras Koerich and Alceu de Souza Britto Jr. and Jean Paul Barddal
- Abstract summary: This study performed a systematic literature review regarding concept drift adaptation in text stream scenarios.
We selected 40 papers to unravel aspects such as text drift categories, types of text drift detection, model update mechanism, the addressed stream mining tasks, types of text representations, and text representation update mechanism.
- Score: 49.3179290313959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the advent and increase in the popularity of the Internet, people have
been producing and disseminating textual data in several ways, such as reviews,
social media posts, and news articles. As a result, numerous researchers have
been working on discovering patterns in textual data, especially because social
media posts function as social sensors, indicating people's opinions,
interests, etc. However, most tasks regarding natural language processing are
addressed using traditional machine learning methods and static datasets. This
setting can lead to several problems, such as an outdated dataset that no
longer corresponds to reality and an outdated model whose performance degrades
over time. Concept drift, which corresponds to changes in data distribution and
patterns over time, further aggravates these issues. In a text stream scenario,
adaptation is even more challenging because of characteristics such as high
speed and sequential data arrival. In addition, models for this setting must
respect these constraints while learning from the stream, storing texts only
for a limited time and consuming little memory.
In this study, we performed a systematic literature review regarding concept
drift adaptation in text stream scenarios. Considering well-defined criteria,
we selected 40 papers to unravel aspects such as text drift categories, types
of text drift detection, model update mechanism, the addressed stream mining
tasks, types of text representations, and text representation update mechanism.
In addition, we discussed drift visualization and simulation and listed
real-world datasets used in the selected papers. Therefore, this paper
comprehensively reviews concept drift adaptation in text stream mining
scenarios.
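To make the constraints described above concrete, here is a minimal,
illustrative sketch in Python of prequential learning on a text stream: texts
arrive sequentially, only a bounded buffer of recent texts is stored, the model
is updated incrementally, and a naive error-rate heuristic stands in for a
proper drift detector. This is not a method proposed or surveyed in the paper;
the vectorizer, window sizes, and threshold are assumptions chosen purely for
illustration.

# Minimal sketch of prequential text-stream learning with a naive drift check.
# All design choices (hashing features, window sizes, the 0.4 threshold) are
# illustrative assumptions, not recommendations from the survey.
from collections import deque
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**18)  # stateless: never needs refitting
model = SGDClassifier()                           # supports incremental partial_fit
classes = [0, 1]                                  # assumed binary labels

recent_errors = deque(maxlen=200)                 # short window of 0/1 prediction errors
buffer = deque(maxlen=500)                        # bounded storage of recent raw texts

def retrain_from_buffer():
    # Rebuild the model from the bounded buffer only, then reset the error window.
    global model
    model = SGDClassifier()
    texts, labels = zip(*buffer)
    model.partial_fit(vectorizer.transform(texts), list(labels), classes=classes)
    recent_errors.clear()

def process(text, label):
    # Prequential step: predict first, then learn from the labeled text.
    x = vectorizer.transform([text])
    if hasattr(model, "coef_"):                   # the model has already seen data
        recent_errors.append(int(model.predict(x)[0] != label))
    model.partial_fit(x, [label], classes=classes)
    buffer.append((text, label))
    # Naive drift signal: the recent error rate exceeds an assumed threshold.
    if len(recent_errors) == recent_errors.maxlen:
        if sum(recent_errors) / len(recent_errors) > 0.4:
            retrain_from_buffer()

Replacing the error-rate heuristic with a dedicated detector and the hashing
features with an updatable text representation correspond to the drift
detection and representation update mechanisms catalogued in the review.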
Related papers
- Evolving Text Data Stream Mining [2.28438857884398]
A massive amount of text data is generated by online social platforms every day.
Learning useful information from such streaming data under the constraint of limited time and memory has gained increasing attention.
New learning models are proposed for clustering and multi-label learning on text streams.
arXiv Detail & Related papers (2024-08-15T15:38:52Z)
- Methods for Generating Drift in Text Streams [49.3179290313959]
Concept drift is a frequent phenomenon in real-world datasets and corresponds to changes in data distribution over time.
This paper provides four textual drift generation methods to ease the production of datasets with labeled drifts.
Results show that the performance of all evaluated methods degrades right after
the drifts and that the incremental SVM is the fastest to run and to recover
its previous performance levels; a toy sketch of injecting a labeled drift into
a text stream appears after this list.
arXiv Detail & Related papers (2024-03-18T23:48:33Z)
- Contextualized Diffusion Models for Text-Guided Image and Video Generation
[67.69171154637172]
Conditional diffusion models have exhibited superior performance in high-fidelity text-guided visual generation and editing.
We propose a novel and general contextualized diffusion model (ContextDiff) by incorporating the cross-modal context encompassing interactions and alignments between text condition and visual sample.
We generalize our model to both DDPMs and DDIMs with theoretical derivations, and demonstrate the effectiveness of our model in evaluations with two challenging tasks: text-to-image generation, and text-to-video editing.
arXiv Detail & Related papers (2024-02-26T15:01:16Z)
- Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
- One or Two Things We know about Concept Drift -- A Survey on Monitoring
Evolving Environments [7.0072935721154614]
This paper provides a literature review focusing on concept drift in unsupervised data streams.
This setting is of particular relevance for monitoring and anomaly detection, which are directly applicable to many tasks and challenges in engineering.
There is a section on the emerging topic of explaining concept drift.
arXiv Detail & Related papers (2023-10-24T13:25:19Z)
- Temporal Perceiving Video-Language Pre-training [112.1790287726804]
This work introduces a novel text-video localization pretext task to enable fine-grained temporal and semantic alignment.
Specifically, text-video localization consists of moment retrieval, which predicts start and end boundaries in videos given the text description.
Our method connects the fine-grained frame representations with the word representations and implicitly distinguishes representations of different instances in the single modality.
arXiv Detail & Related papers (2023-01-18T12:15:47Z)
- An Overview on Controllable Text Generation via Variational
Auto-Encoders [15.97186478109836]
Recent advances in neural-based generative modeling have reignited the hopes of having computer systems capable of conversing with humans.
Latent variable models (LVM) such as variational auto-encoders (VAEs) are designed to characterize the distributional pattern of textual data.
This overview gives an introduction to existing generation schemes, problems associated with text variational auto-encoders, and a review of several applications of controllable generation.
arXiv Detail & Related papers (2022-11-15T07:36:11Z)
- Deep Learning for Text Style Transfer: A Survey [71.8870854396927]
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text.
We present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017.
We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data.
arXiv Detail & Related papers (2020-11-01T04:04:43Z)
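As referenced in the entry on drift generation above, the following toy sketch
shows one generic way to inject an abrupt, labeled drift into a synthetic text
stream by swapping the class-conditional vocabulary at a chosen position. It is
not one of the four methods proposed in that paper; the vocabularies and
parameters are invented for illustration only.

# Toy drift injection: an abrupt, labeled concept drift is created by changing
# the word pools that characterize each class at a known stream position.
# Vocabularies and parameters below are invented for illustration.
import random

random.seed(0)

vocab_before = {0: ["refund", "broken", "slow"], 1: ["great", "love", "fast"]}
vocab_after = {0: ["delayed", "noisy", "overpriced"], 1: ["reliable", "smooth", "crisp"]}

def make_stream(n_docs=1000, drift_at=500, words_per_doc=8):
    # Yield (text, label, is_drift_point) triples with a single abrupt drift.
    for i in range(n_docs):
        label = random.randint(0, 1)
        vocab = vocab_before if i < drift_at else vocab_after
        text = " ".join(random.choice(vocab[label]) for _ in range(words_per_doc))
        yield text, label, i == drift_at

stream = list(make_stream())
print(sum(flag for _, _, flag in stream), "labeled drift point, at index 500")

Because the drift position is known, streams generated this way can be used to
measure how quickly an adaptive classifier, such as the incremental SVM
mentioned above, recovers its pre-drift performance.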
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.