ttta: Tools for Temporal Text Analysis
- URL: http://arxiv.org/abs/2503.02625v1
- Date: Tue, 04 Mar 2025 13:50:21 GMT
- Title: ttta: Tools for Temporal Text Analysis
- Authors: Kai-Robin Lange, Niklas Benner, Lars Grönberg, Aymane Hachcham, Imene Kolli, Jonas Rieger, Carsten Jentsch
- Abstract summary: Most NLP techniques consider the corpus at hand to be homogeneous with regard to time. This is a simplification that can lead to biased results, as the meaning of words and phrases can change over time. The ttta package is intended to serve as a collection of tools for analyzing text data over time.
- Score: 0.48163317476588563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text data is inherently temporal. The meaning of words and phrases changes over time, and the context in which they are used is constantly evolving. This is not just true for social media data, where the language used is rapidly influenced by current events, memes and trends, but also for journalistic, economic or political text data. Most NLP techniques, however, consider the corpus at hand to be homogeneous with regard to time. This is a simplification that can lead to biased results, as the meaning of words and phrases can change over time. For instance, running a classic Latent Dirichlet Allocation on a corpus that spans several years is not enough to capture changes in the topics over time; it only portrays an "average" topic distribution over the whole time span. Researchers have developed a number of tools for analyzing text data over time. However, these tools are often scattered across different packages and libraries, making it difficult for researchers to use them in a consistent and reproducible way. The ttta package is intended to serve as a collection of tools for analyzing text data over time.
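To illustrate the pitfall the abstract describes, here is a minimal Python sketch (using scikit-learn; this is not ttta's API, which the abstract does not show) that fits a separate LDA model per time slice instead of one model over the whole corpus. The corpus and the yearly slice granularity are hypothetical:

```python
# Minimal sketch: one LDA model per time slice instead of a single
# "average" model over the whole corpus. This is NOT ttta's API; the
# corpus and slice granularity below are hypothetical.
from collections import defaultdict

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

corpus = [  # hypothetical (year, document) pairs
    (2019, "the election campaign dominated the news"),
    (2019, "voters discussed the election on social media"),
    (2021, "the pandemic changed how campaigns reach voters"),
    (2021, "vaccines and the pandemic dominated political debate"),
]

docs_by_year = defaultdict(list)
for year, text in corpus:
    docs_by_year[year].append(text)

for year, docs in sorted(docs_by_year.items()):
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(counts)
    vocab = vectorizer.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = [vocab[i] for i in weights.argsort()[::-1][:3]]
        print(year, f"topic {k}:", ", ".join(top))
```

Fitting independent models per slice avoids the "average topic" problem, but it leaves the topics of different slices unaligned with each other; that kind of bookkeeping is precisely what a dedicated collection of temporal tools has to take care of.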
Related papers
- Exploring the Effectiveness and Interpretability of Texts in LLM-based Time Series Models [5.2980803808373516]
Large Language Models (LLMs) have been applied to time series forecasting tasks, leveraging pre-trained language models as the backbone.
This study seeks to investigate the actual efficacy and interpretability of such textual incorporations.
arXiv Detail & Related papers (2025-04-09T02:48:35Z)
- Statistical Analysis of Sentence Structures through ASCII, Lexical Alignment and PCA [0.0]
It proposes a novel statistical method that uses American Standard Code for Information Interchange (ASCII) codes to represent the text of 11 corpora.
It analyzes the results through histograms and normality tests such as the Shapiro-Wilk and Anderson-Darling tests.
arXiv Detail & Related papers (2025-03-13T15:42:44Z)
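A rough sketch of the pipeline summarized in the entry above, assuming the method maps characters to their ASCII codes and tests the resulting distribution for normality; the paper's exact representation (for instance, how codes are aggregated across the 11 corpora) may differ:

```python
# Rough sketch: represent text as ASCII codes, inspect the distribution
# with a histogram, and run the two normality tests named above. The
# paper's exact representation and aggregation may differ.
import numpy as np
from scipy import stats

text = "It was the best of times, it was the worst of times."
codes = np.array([ord(c) for c in text if ord(c) < 128], dtype=float)

# Histogram of the code distribution (10 equal-width bins).
hist, _ = np.histogram(codes, bins=10)
print("histogram counts:", hist)

# Shapiro-Wilk: the null hypothesis is that the sample is normal.
w_stat, p_value = stats.shapiro(codes)
print(f"Shapiro-Wilk: W={w_stat:.3f}, p={p_value:.3g}")

# Anderson-Darling against the normal distribution.
result = stats.anderson(codes, dist="norm")
print("Anderson-Darling statistic:", round(result.statistic, 3))
print("critical values:", result.critical_values)
```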
- Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval [56.05621657583251]
Cross-modal (e.g. image-text, video-text) retrieval is an important task in the fields of information retrieval and multimodal vision-language understanding.
We introduce RTime, a novel temporal-emphasized video-text retrieval dataset.
Our RTime dataset currently consists of 21k videos with 10 captions per video, totalling about 122 hours.
arXiv Detail & Related papers (2024-12-26T11:32:00Z)
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
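The paper above estimates redundancy with large language models; as a much simpler stand-in, one can measure how well a prosodic feature is predicted from a bag-of-words representation and read the cross-validated R^2 as a redundancy proxy. Everything in this sketch (the utterances, the mean-F0 values, the model choice) is hypothetical:

```python
# Stand-in sketch: redundancy proxy as the share of variance in a
# prosodic feature (here, hypothetical mean F0 values) explained by a
# bag-of-words text representation. The paper instead uses LLM-based
# information estimates; all data below is made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

utterances = [
    "are you serious", "i am not sure", "that is wonderful news",
    "please close the door", "what time is it", "we should leave now",
    "that is wonderful", "close the door please",
]
mean_f0 = [210.0, 165.0, 190.0, 150.0, 205.0, 160.0, 195.0, 152.0]

X = CountVectorizer().fit_transform(utterances)
# Cross-validated R^2; on toy data this is noisy and may be negative.
r2 = cross_val_score(Ridge(alpha=1.0), X, mean_f0, cv=4, scoring="r2").mean()
print(f"redundancy proxy (R^2): {r2:.2f}")  # < 1.0: prosody adds information
```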
- Tweet Insights: A Visualization Platform to Extract Temporal Insights from Twitter [19.591692602304494]
This paper introduces a large collection of time series data derived from Twitter.
This data covers the past five years and captures changes in n-gram frequency, similarity, sentiment and topic distribution.
The interface built on top of this data enables temporal analysis for detecting and characterizing shifts in meaning.
arXiv Detail & Related papers (2023-08-04T05:39:26Z)
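A minimal sketch of the simplest of the series described above, n-gram frequency over time, assuming timestamped posts bucketed by month; the platform additionally tracks similarity, sentiment and topic distributions:

```python
# Minimal sketch: a monthly unigram-frequency time series, the simplest
# of the signals listed above. The timestamped posts are hypothetical.
from collections import Counter, defaultdict

posts = [
    ("2023-01", "new variant spreads fast"),
    ("2023-01", "variant cases rising again"),
    ("2023-02", "travel reopens as cases drop"),
    ("2023-02", "summer travel plans are back"),
]

totals = Counter()
term_counts = defaultdict(Counter)
for month, text in posts:
    tokens = text.lower().split()
    totals[month] += len(tokens)
    term_counts[month].update(tokens)

term = "variant"  # track one term's relative frequency per month
for month in sorted(totals):
    print(month, f"{term}: {term_counts[month][term] / totals[month]:.3f}")
```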
- Text2Time: Transformer-based Article Time Period Prediction [0.11470070927586018]
This work investigates the problem of predicting the publication period of a text document, specifically a news article, based on its textual content.
We create our own extensive labeled dataset of over 350,000 news articles published by The New York Times over six decades.
In our approach, we use a pretrained BERT model fine-tuned for the task of text classification, specifically for time period prediction.
arXiv Detail & Related papers (2023-04-21T10:05:03Z)
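A sketch of the general recipe described above, fine-tuning a pretrained BERT for time-period classification with the Hugging Face transformers library. The decade labels, the toy example and the hyperparameters are placeholders, not the authors' setup:

```python
# Sketch of the recipe described above: fine-tune a pretrained BERT as a
# classifier over time periods. Labels, data and hyperparameters are
# placeholders, not the authors' setup.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

decades = ["1960s", "1970s", "1980s", "1990s", "2000s", "2010s"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(decades)
)

# One toy gradient step; real fine-tuning iterates over ~350k articles.
texts = ["the president addressed the nation on television tonight"]
labels = torch.tensor([3])  # hypothetical label: 1990s
batch = tokenizer(texts, return_tensors="pt", truncation=True, padding=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
print("loss:", outputs.loss.item())
```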
- CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- Time Masking for Temporal Language Models [23.08079115356717]
We propose a temporal contextual language model called TempoBERT, which uses time as an additional context of texts.
Our technique is based on modifying texts with temporal information and performing time masking, that is, masking specifically the supplementary time information.
arXiv Detail & Related papers (2021-10-12T21:15:23Z)
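A sketch of the idea as summarized above: the text is modified with temporal information (here, a hypothetical year token prepended to the input) and that token is masked so the model learns to recover it. TempoBERT's actual vocabulary and masking schedule may differ:

```python
# Sketch of time masking as summarized above: prepend a time token to
# the text and mask specifically that token. The year tokens here are
# hypothetical; TempoBERT's vocabulary and masking schedule may differ.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokenizer.add_tokens(["<2015>", "<2020>"])  # one token per time period

text, year = "the virus spread across the country", "<2020>"
ids = tokenizer(f"{year} {text}", return_tensors="pt")["input_ids"]

# Mask the time token (position 1, right after [CLS]); during masked-LM
# training the model learns to recover it from the words alone.
labels = ids.clone()
ids[0, 1] = tokenizer.mask_token_id
labels[ids != tokenizer.mask_token_id] = -100  # loss only on the mask
print(tokenizer.decode(ids[0]))
# (ids, labels) would then feed a masked-LM head, e.g.
# AutoModelForMaskedLM, after resizing its embeddings for the new tokens.
```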
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models for identifying the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Generalized Word Shift Graphs: A Method for Visualizing and Explaining Pairwise Comparisons Between Texts [0.15833270109954134]
A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content.
We introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts.
We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback-Leibler and Jensen-Shannon divergences.
arXiv Detail & Related papers (2020-08-05T17:27:11Z)
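A sketch of the word-shift idea in its simplest instantiation, per-word contributions to the relative-frequency difference between two texts; the generalized framework in the paper also covers dictionary scores and entropy-based divergences such as the Kullback-Leibler and Jensen-Shannon divergences:

```python
# Sketch of the simplest word shift: each word's contribution to the
# relative-frequency difference between two texts. The generalized
# framework also covers dictionary scores and KL/JS divergences.
from collections import Counter

def rel_freqs(text):
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

text_a = "the economy grew while markets rallied and jobs returned"
text_b = "the economy shrank while markets fell and jobs vanished"

p, q = rel_freqs(text_a), rel_freqs(text_b)
shifts = {w: q.get(w, 0.0) - p.get(w, 0.0) for w in set(p) | set(q)}

# Words ranked by contribution magnitude, as a shift graph would plot.
for word, delta in sorted(shifts.items(), key=lambda x: -abs(x[1]))[:5]:
    print(f"{word:10s} {delta:+.3f}")
```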
- Local-Global Video-Text Interactions for Temporal Grounding [77.5114709695216]
This paper addresses the problem of text-to-video temporal grounding, which aims to identify the time interval in a video semantically relevant to a text query.
We tackle this problem using a novel regression-based model that learns to extract a collection of mid-level features for semantic phrases in a text query.
The proposed method effectively predicts the target time interval by exploiting contextual information from local to global.
arXiv Detail & Related papers (2020-04-16T08:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.