News Signals: An NLP Library for Text and Time Series
- URL: http://arxiv.org/abs/2312.11399v1
- Date: Mon, 18 Dec 2023 18:02:41 GMT
- Title: News Signals: An NLP Library for Text and Time Series
- Authors: Chris Hokamp and Demian Gholipour Ghalandari and Parsa Ghaffari
- Abstract summary: News Signals is an open-source library for building and using datasets where inputs are clusters of textual data.
It supports diverse data science and NLP problem settings related to the prediction of time series behaviour.
- Score: 3.850666668546735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an open-source Python library for building and using datasets
where inputs are clusters of textual data, and outputs are sequences of real
values representing one or more time series signals. The news-signals library
supports diverse data science and NLP problem settings related to the
prediction of time series behaviour using textual data feeds. For example, in
the news domain, inputs are document clusters corresponding to daily news
articles about a particular entity, and targets are explicitly associated
real-valued time series: the volume of news about a particular person or
company, or the number of pageviews of specific Wikimedia pages. Despite many
industry and research use cases for this class of problem settings, to the best
of our knowledge, News Signals is the only open-source library designed
specifically to facilitate data science and research settings with natural
language inputs and time series targets. In addition to the core codebase for
building and interacting with datasets, we also conduct a suite of experiments
using several popular Machine Learning libraries, which are used to establish
baselines for time series anomaly prediction using textual inputs.
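The problem setting described in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the news-signals API: the `DailyExample` class, the `label_anomalies` and `make_next_day_pairs` helpers, the toy data, and the trailing-window z-score rule for defining an anomaly are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean, pstdev


@dataclass
class DailyExample:
    """Hypothetical row: one day's document cluster plus that day's signal value."""
    day: date
    documents: list[str]  # text inputs, e.g. headlines about one entity
    value: float          # real-valued target, e.g. news volume or pageviews


def label_anomalies(examples, window=3, threshold=2.0):
    """Flag a day when its z-score against the previous `window` days exceeds `threshold`.

    Assumption: this z-score rule is one simple anomaly definition chosen for
    illustration; days without a full history window are never flagged.
    """
    labels = []
    for i, ex in enumerate(examples):
        history = [e.value for e in examples[max(0, i - window):i]]
        if len(history) < window:
            labels.append(False)
            continue
        mu, sigma = mean(history), pstdev(history)
        z = (ex.value - mu) / sigma if sigma > 0 else 0.0
        labels.append(z > threshold)
    return labels


def make_next_day_pairs(examples, labels):
    """Pair each day's documents with the anomaly label of the following day."""
    return [(examples[i].documents, labels[i + 1]) for i in range(len(examples) - 1)]


# Toy data: six days of (made-up) headlines and daily article counts.
start = date(2023, 1, 1)
values = [10, 12, 11, 10, 40, 12]
examples = [
    DailyExample(start + timedelta(days=i), [f"headline {i}a", f"headline {i}b"], float(v))
    for i, v in enumerate(values)
]

labels = label_anomalies(examples)
pairs = make_next_day_pairs(examples, labels)

for docs, is_anomaly in pairs:
    print(docs, "->", is_anomaly)
```

Each pair maps a cluster of documents to a binary target derived from the time series, which is the general shape of a text-to-anomaly prediction task; a real dataset would replace the toy values with an observed signal.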
Related papers
- ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data [26.300515935897415]
ChatTime is a unified framework for time series and text processing.
As an out-of-the-box multimodal time series foundation model, ChatTime provides zero-shot forecasting capability.
We design a series of experiments to verify the superior performance of ChatTime across multiple tasks and scenarios.
arXiv Detail & Related papers (2024-12-16T02:04:06Z)
- Timeseria: an object-oriented time series processing library [0.40964539027092917]
Timeseria is an object-oriented time series processing library implemented in Python.
It aims at making it easier to manipulate time series data and to build statistical and machine learning models on top of it.
arXiv Detail & Related papers (2024-10-12T15:29:18Z)
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
- A Comprehensive Python Library for Deep Learning-Based Event Detection in Multivariate Time Series Data and Information Retrieval in NLP [0.0]
We present a new deep learning supervised method for detecting events in time series data.
It is based on regression instead of binary classification.
It does not require labeled datasets where each point is labeled.
It only requires reference events defined as time points or intervals of time.
arXiv Detail & Related papers (2023-10-25T09:13:19Z)
- TemporAI: Facilitating Machine Learning Innovation in Time Domain Tasks for Medicine [91.3755431537592]
TemporAI is an open source Python software library for machine learning (ML) tasks involving data with a time component.
It supports data in time series, static, and event modalities and provides an interface for prediction, causal inference, and time-to-event analysis.
arXiv Detail & Related papers (2023-01-28T17:57:53Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- Benchmarking Multimodal AutoML for Tabular Data with Text Fields [83.43249184357053]
We assemble 18 multimodal data tables that each contain some text fields.
Our benchmark enables researchers to evaluate their own methods for supervised learning with numeric, categorical, and text features.
arXiv Detail & Related papers (2021-11-04T09:29:16Z)
- Assessing the quality of sources in Wikidata across languages: a hybrid approach [64.05097584373979]
We run a series of microtask experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z)
- Datasets: A Community Library for Natural Language Processing [55.48866401721244]
datasets is a community library for contemporary NLP.
The library includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects.
arXiv Detail & Related papers (2021-09-07T03:59:22Z)
- A Framework for Neural Topic Modeling of Text Corpora [6.340447411058068]
We introduce FAME, an open-source framework enabling an efficient mechanism of extracting and incorporating textual features.
To demonstrate the effectiveness of this library, we conducted experiments on the well-known News-Group dataset.
arXiv Detail & Related papers (2021-08-19T23:32:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.