Changepoint Analysis of Topic Proportions in Temporal Text Data
- URL: http://arxiv.org/abs/2112.00827v1
- Date: Mon, 29 Nov 2021 17:20:51 GMT
- Title: Changepoint Analysis of Topic Proportions in Temporal Text Data
- Authors: Avinandan Bose, Soumendu Sundar Mukherjee
- Abstract summary: We build a specialised temporal topic model with provisions for changepoints in the distribution of topic proportions.
We use sample splitting to estimate topic polytopes first and then apply a likelihood ratio statistic.
We obtain some historically well-known changepoints and discover some new ones.
- Score: 1.8262547855491456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Changepoint analysis deals with unsupervised detection and/or estimation of
time-points in time-series data, when the distribution generating the data
changes. In this article, we consider \emph{offline} changepoint detection in
the context of large scale textual data. We build a specialised temporal topic
model with provisions for changepoints in the distribution of topic
proportions. As full likelihood based inference in this model is
computationally intractable, we develop a computationally tractable approximate
inference procedure. More specifically, we use sample splitting to estimate
topic polytopes first and then apply a likelihood ratio statistic together with
a modified version of the wild binary segmentation algorithm of Fryzlewicz et
al. (2014). Our methodology facilitates automated detection of structural
changes in large corpora without the need of manual processing by domain
experts. As changepoints under our model correspond to changes in topic
structure, the estimated changepoints are often highly interpretable as marking
the surge or decline in popularity of a fashionable topic. We apply our
procedure on two large datasets: (i) a corpus of English literature from the
period 1800-1922 (Underwood et al., 2015); (ii) abstracts from the High Energy
Physics arXiv repository (Clement et al., 2019). We obtain some historically
well-known changepoints and discover some new ones.
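
The segmentation step mentioned in the abstract can be illustrated with a minimal sketch of plain wild binary segmentation. This is not the authors' modified algorithm: it assumes a univariate series and swaps in a simple CUSUM mean-shift statistic in place of the paper's likelihood ratio statistic on topic proportions. The function names (`cusum_stat`, `wild_binary_segmentation`) and the parameters `threshold`, `n_intervals`, and `seed` are hypothetical choices made for this illustration.

```python
import numpy as np


def cusum_stat(x, s, e):
    """Best split and max |CUSUM| on x[s:e] (e exclusive).

    The returned split is the index of the first point of the right part.
    """
    seg = x[s:e]
    n = len(seg)
    if n < 2:
        return None, 0.0
    csum = np.cumsum(seg)
    k = np.arange(1, n)                      # sizes of the left part
    left_mean = csum[:-1] / k
    right_mean = (csum[-1] - csum[:-1]) / (n - k)
    stat = np.sqrt(k * (n - k) / n) * np.abs(left_mean - right_mean)
    j = int(np.argmax(stat))
    return s + j + 1, float(stat[j])


def wild_binary_segmentation(x, threshold, n_intervals=200, seed=0):
    """Generic WBS sketch: assumed CUSUM statistic, fixed threshold."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 2:
        return []
    rng = np.random.default_rng(seed)
    # Draw random sub-intervals [a, b) with at least two points each.
    starts = rng.integers(0, n - 1, size=n_intervals)
    intervals = [(int(a), int(rng.integers(a + 2, n + 1))) for a in starts]
    found = []

    def recurse(s, e):
        if e - s < 2:
            return
        # Consider the whole segment plus every random interval inside it.
        candidates = [(s, e)] + [(a, b) for a, b in intervals if a >= s and b <= e]
        best_cp, best_val = None, 0.0
        for a, b in candidates:
            cp, val = cusum_stat(x, a, b)
            if cp is not None and val > best_val:
                best_cp, best_val = cp, val
        # Accept the strongest candidate only if it clears the threshold.
        if best_cp is not None and best_val > threshold:
            found.append(best_cp)
            recurse(s, best_cp)
            recurse(best_cp, e)

    recurse(0, n)
    return sorted(found)
```

Calling `wild_binary_segmentation(series, threshold=...)` returns a sorted list of estimated changepoint indices. The threshold has to be calibrated to the noise level of the series; the fixed value used here is a simplification for illustration only.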
Related papers
- Evolving Voices Based on Temporal Poisson Factorisation [0.0]
We propose the temporal Poisson factorisation (TPF) model as an extension to the factorisation model to model sparse count data matrices.
We discuss in detail the results of the TPF model when analysing speeches from 18 sessions in the U.S. Senate (1981-2016).
arXiv Detail & Related papers (2024-10-24T07:21:33Z) - Causal Discovery-Driven Change Point Detection in Time Series [32.424281626708336]
Change point detection in time series seeks to identify times when the probability distribution of time series changes.
In practical applications, we may be interested only in certain components of the time series, exploring abrupt changes in their distributions.
arXiv Detail & Related papers (2024-07-10T00:54:42Z) - PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
With increasing privacy concerns, we propose a parameter-efficient federated anomaly detection framework named PeFAD.
We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z) - Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers [55.475142494272724]
Time series prediction is crucial for understanding and forecasting complex dynamics in various domains.
We introduce GridTST, a model that combines the benefits of two approaches using innovative multi-directional attentions.
The model consistently delivers state-of-the-art performance across various real-world datasets.
arXiv Detail & Related papers (2024-05-22T16:41:21Z) - Deep learning model solves change point detection for multiple change
types [69.77452691994712]
Change point detection aims to catch an abrupt disorder in the data distribution.
We propose an approach that works in the multiple-distributions scenario.
arXiv Detail & Related papers (2022-04-15T09:44:21Z) - TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
Estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z) - Time Series Analysis via Network Science: Concepts and Algorithms [62.997667081978825]
This review provides a comprehensive overview of existing mapping methods for transforming time series into networks.
We describe the main conceptual approaches, provide authoritative references and give insight into their advantages and limitations in a unified notation and language.
Although still very recent, this research area has much potential and with this survey we intend to pave the way for future research on the topic.
arXiv Detail & Related papers (2021-10-11T13:33:18Z) - Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z) - Topic Scaling: A Joint Document Scaling -- Topic Model Approach To Learn Time-Specific Topics [0.0]
This paper proposes a new methodology to study sequential corpora by implementing a two-stage algorithm that learns time-based topics with respect to a scale of document positions.
The first stage ranks documents using Wordfish to estimate document positions that serve as a dependent variable to learn relevant topics.
The second stage ranks the inferred topics on the document scale to match their occurrences within the corpus and track their evolution.
arXiv Detail & Related papers (2021-03-31T12:35:36Z) - Interpretable Feature Construction for Time Series Extrinsic Regression [0.028675177318965035]
In some application domains, the target variable is numerical, and the problem is known as time series extrinsic regression (TSER).
We suggest an extension of a Bayesian method for robust and interpretable feature construction and selection in the context of TSER.
Our approach tackles TSER in a relational way: (i) we build various simple representations of the time series, which are stored in a relational data schema; then (ii) a propositionalisation technique is applied to build interpretable features from secondary tables to "flatten" the data.
arXiv Detail & Related papers (2021-03-15T08:12:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.