News Signals: An NLP Library for Text and Time Series
- URL: http://arxiv.org/abs/2312.11399v1
- Date: Mon, 18 Dec 2023 18:02:41 GMT
- Title: News Signals: An NLP Library for Text and Time Series
- Authors: Chris Hokamp and Demian Gholipour Ghalandari and Parsa Ghaffari
- Abstract summary: News Signals is an open-source library for building and using datasets where inputs are clusters of textual data.
It supports diverse data science and NLP problem settings related to the prediction of time series behaviour.
- Score: 3.850666668546735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an open-source Python library for building and using datasets
where inputs are clusters of textual data, and outputs are sequences of real
values representing one or more time series signals. The news-signals library
supports diverse data science and NLP problem settings related to the
prediction of time series behaviour using textual data feeds. For example, in
the news domain, inputs are document clusters corresponding to daily news
articles about a particular entity, and targets are explicitly associated
real-valued time series: the volume of news about a particular person or
company, or the number of pageviews of specific Wikimedia pages. Despite many
industry and research use cases for this class of problem settings, to the best
of our knowledge, News Signals is the only open-source library designed
specifically to facilitate data science and research settings with natural
language inputs and time series targets. In addition to the core codebase for
building and interacting with datasets, we also conduct a suite of experiments
using several popular Machine Learning libraries, which are used to establish
baselines for time series anomaly prediction using textual inputs.
Related papers
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
- A Comprehensive Python Library for Deep Learning-Based Event Detection in Multivariate Time Series Data and Information Retrieval in NLP [0.0]
We present a new deep learning supervised method for detecting events in time series data.
It is based on regression instead of binary classification.
It does not require labeled datasets where each point is labeled.
It only requires reference events defined as time points or intervals of time.
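One way to realize this regression formulation, sketched here as an assumption rather than the paper's exact method, is to spread each reference event time into a smooth bump so that every time step gets a continuous target value without per-point labels:

```python
import math


def event_regression_target(n_steps: int, event_times: list[int],
                            width: float = 2.0) -> list[float]:
    """Continuous target in [0, 1]: a Gaussian bump centred on each
    reference event, replacing per-point binary labels."""
    target = []
    for t in range(n_steps):
        # keep the strongest contribution from any nearby reference event
        peak = max((math.exp(-((t - e) ** 2) / (2 * width ** 2))
                    for e in event_times), default=0.0)
        target.append(peak)
    return target


y = event_regression_target(n_steps=10, event_times=[3, 7], width=1.0)
# y peaks at 1.0 exactly at the reference events (t = 3 and t = 7)
```

A regressor trained against such a target can then recover event locations as peaks in its predictions.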
arXiv Detail & Related papers (2023-10-25T09:13:19Z)
- TemporAI: Facilitating Machine Learning Innovation in Time Domain Tasks for Medicine [91.3755431537592]

TemporAI is an open source Python software library for machine learning (ML) tasks involving data with a time component.
It supports data in time series, static, and event modalities, and provides an interface for prediction, causal inference, and time-to-event analysis.
arXiv Detail & Related papers (2023-01-28T17:57:53Z)
- PyRelationAL: A Library for Active Learning Research and Development [0.11545092788508224]
PyRelationAL is an open source library for active learning (AL) research.
It provides access to benchmark datasets and AL task configurations based on existing literature.
We perform experiments on the PyRelationAL collection of benchmark datasets and showcase the considerable economies that AL can provide.
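As background on the kind of AL loop such benchmarks exercise, here is a minimal uncertainty-sampling sketch in plain Python; it is generic and does not use the PyRelationAL API:

```python
def uncertainty(p: float) -> float:
    """Binary-classification uncertainty: highest when p is near 0.5."""
    return 1.0 - 2.0 * abs(p - 0.5)


def predict_proba(x: float, labeled: list[tuple[float, int]]) -> float:
    """Toy distance-weighted vote: probability that x belongs to class 1."""
    weights = [(1.0 / (abs(x - xi) + 1e-6), yi) for xi, yi in labeled]
    total = sum(w for w, _ in weights)
    return sum(w * y for w, y in weights) / total


def active_learning_loop(pool, oracle, seed_labeled, budget):
    """Repeatedly query the oracle for the most uncertain unlabeled point."""
    labeled, pool = list(seed_labeled), list(pool)
    for _ in range(budget):
        x = max(pool, key=lambda p: uncertainty(predict_proba(p, labeled)))
        labeled.append((x, oracle(x)))  # spend one unit of labeling budget
        pool.remove(x)
    return labeled


labeled = active_learning_loop(
    pool=[i / 10 for i in range(1, 10)],  # unlabeled points in (0, 1)
    oracle=lambda x: int(x >= 0.5),       # ground-truth labeler
    seed_labeled=[(0.0, 0), (1.0, 1)],
    budget=2,
)
# the first query lands on the most ambiguous point, x = 0.5
```

The "economies" of AL come from spending the labeling budget only on such maximally informative points rather than labeling the whole pool.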
arXiv Detail & Related papers (2022-05-23T08:21:21Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- Benchmarking Multimodal AutoML for Tabular Data with Text Fields [83.43249184357053]
We assemble 18 multimodal data tables that each contain some text fields.
Our benchmark enables researchers to evaluate their own methods for supervised learning with numeric, categorical, and text features.
arXiv Detail & Related papers (2021-11-04T09:29:16Z)
- Assessing the quality of sources in Wikidata across languages: a hybrid approach [64.05097584373979]
We run a series of microtasks experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z)
- Datasets: A Community Library for Natural Language Processing [55.48866401721244]
datasets is a community library for contemporary NLP.
The library includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects.
arXiv Detail & Related papers (2021-09-07T03:59:22Z)
- A Framework for Neural Topic Modeling of Text Corpora [6.340447411058068]
We introduce FAME, an open-source framework enabling an efficient mechanism of extracting and incorporating textual features.
To demonstrate the effectiveness of this library, we conducted experiments on the well-known News-Group dataset.
arXiv Detail & Related papers (2021-08-19T23:32:38Z)
- Documenting the English Colossal Clean Crawled Corpus [28.008953329187648]
This work provides the first documentation for the Colossal Clean Crawled Corpus (C4; Raffel et al., 2020), a dataset created by applying a set of filters to a single snapshot of Common Crawl.
We begin with a high-level summary of the data, including distributions of where the text came from and when it was written.
We then give more detailed analysis on salient parts of this data, including the most frequent sources of text.
arXiv Detail & Related papers (2021-04-18T07:42:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.