Text2Time: Transformer-based Article Time Period Prediction
- URL: http://arxiv.org/abs/2304.10859v2
- Date: Mon, 24 Apr 2023 03:56:03 GMT
- Title: Text2Time: Transformer-based Article Time Period Prediction
- Authors: Karthick Prasad Gunasekaran, B Chase Babrich, Saurabh Shirodkar, Hee Hwang
- Abstract summary: This work investigates the problem of predicting the publication period of a text document, specifically a news article, based on its textual content.
We create our own extensive labeled dataset of over 350,000 news articles published by The New York Times over six decades.
In our approach, we use a pretrained BERT model fine-tuned for the task of text classification, specifically for time period prediction.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of predicting the publication period of text documents, such as news
articles, is an important but less studied problem in the field of natural
language processing. Predicting the year of a news article can be useful in
various contexts, such as historical research, sentiment analysis, and media
monitoring. In this work, we investigate the problem of predicting the
publication period of a text document, specifically a news article, based on
its textual content. In order to do so, we created our own extensive labeled
dataset of over 350,000 news articles published by The New York Times over six
decades. In our approach, we use a pretrained BERT model fine-tuned for the
task of text classification, specifically for time period prediction. The
fine-tuned model accurately classifies news articles into their respective
publication decades, beating the performance of the baseline model on this
relatively unexplored task of predicting time periods from text.
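A minimal sketch of the approach described above: fine-tune a pretrained BERT classifier to map article text to a publication decade. The checkpoint name, the six-decade label bins, and all hyperparameters here are illustrative assumptions, not the paper's reported configuration.

```python
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

DECADES = ["1960s", "1970s", "1980s", "1990s", "2000s", "2010s"]  # assumed bins

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(DECADES))
optimizer = AdamW(model.parameters(), lr=2e-5)

# Toy stand-ins for NYT articles; the real dataset has 350,000+ labeled examples.
texts = ["The president addressed the nation on television last night.",
         "Streaming services reshaped how audiences watch films."]
labels = torch.tensor([0, 5])  # indices into DECADES

# BERT truncates at 512 subword tokens, so very long articles lose some signal.
batch = tokenizer(texts, truncation=True, max_length=512,
                  padding=True, return_tensors="pt")

model.train()
out = model(**batch, labels=labels)  # cross-entropy over the decade labels
out.loss.backward()
optimizer.step()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print([DECADES[i] for i in preds])
```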
Related papers
- On the Role of Context in Reading Time Prediction [50.87306355705826]
We present a new perspective on how readers integrate context during real-time language comprehension.
Our proposals build on surprisal theory, which posits that the processing effort of a linguistic unit is an affine function of its in-context information content.
arXiv Detail & Related papers (2024-09-12T15:52:22Z)
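The affine relation that surprisal theory posits in the entry above can be written out explicitly (the notation is mine, not the paper's):

```latex
% Reading time for unit u_t as an affine function of its surprisal,
% i.e. its in-context information content (notation assumed).
\[
  \mathrm{RT}(u_t) = \alpha + \beta \, s(u_t), \qquad
  s(u_t) = -\log p\!\left(u_t \mid u_{<t}\right)
\]
```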
- AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval [9.357912396498142]
We introduce AutoCast++, a zero-shot ranking-based context retrieval system.
Our approach first re-ranks articles based on zero-shot question-passage relevance, homing in on semantically pertinent news.
We conduct both the relevance evaluation and article summarization without needing domain-specific training.
arXiv Detail & Related papers (2023-10-03T08:34:44Z)
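A generic zero-shot relevance re-ranking step in the spirit of the retrieval described above can be sketched with an off-the-shelf cross-encoder; the MS MARCO checkpoint is my stand-in, not necessarily the scorer AutoCast++ uses.

```python
from sentence_transformers import CrossEncoder

question = "Will the central bank raise interest rates this quarter?"
articles = [
    "The bank's governor hinted at tightening monetary policy.",
    "A local team won the regional football championship.",
]

# Score each (question, passage) pair with a pretrained cross-encoder;
# no task-specific training is required.
scorer = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = scorer.predict([(question, a) for a in articles])

# Keep the semantically pertinent news first.
for score, article in sorted(zip(scores, articles), reverse=True):
    print(f"{score:.2f}  {article}")
```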
- Prompt-and-Align: Prompt-Based Social Alignment for Few-Shot Fake News Detection [50.07850264495737]
"Prompt-and-Align" (P&A) is a novel prompt-based paradigm for few-shot fake news detection.
We show that P&A sets a new state-of-the-art for few-shot fake news detection, outperforming prior methods by significant margins.
arXiv Detail & Related papers (2023-09-28T13:19:43Z)
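The prompt-based half of the idea can be illustrated with a generic masked-LM verbalizer sketch; this is not P&A's actual pipeline, which additionally performs social alignment over a graph of related news.

```python
from transformers import pipeline

# Score verbalizer words at the [MASK] position with a pretrained masked LM.
fill = pipeline("fill-mask", model="bert-base-uncased")
article = "Scientists confirm the moon is made entirely of cheese."
prompt = f"{article} This claim is [MASK]."

# Compare the model's scores for the two verbalizer tokens.
for c in fill(prompt, targets=["true", "false"]):
    print(c["token_str"], round(c["score"], 4))
```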
- Studying the impacts of pre-training using ChatGPT-generated text on downstream tasks [0.0]
Our research aims to investigate the influence of artificial text in the pre-training phase of language models.
We conducted a comparative analysis between a language model, RoBERTa, pre-trained using CNN/DailyMail news articles, and ChatGPT, which employed the same articles for its training.
We demonstrate that the utilization of artificial text during pre-training does not have a significant impact on either the performance of the models in downstream tasks or their gender bias.
arXiv Detail & Related papers (2023-09-02T12:56:15Z)
- NewsEdits: A News Article Revision Dataset and a Document-Level Reasoning Challenge [122.37011526554403]
NewsEdits is the first publicly available dataset of news revision histories.
It contains 1.2 million articles with 4.6 million versions from over 22 English- and French-language newspaper sources.
arXiv Detail & Related papers (2022-06-14T18:47:13Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) in average performance by a large margin in both few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
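A hedged sketch of the reformulation idea above: treat aspect extraction and polarity prediction as text generation. For brevity this uses a seq2seq T5 checkpoint rather than the unidirectional-attention generative LM the paper describes, and the target format is an assumption.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

review = "The battery life is great but the screen scratches easily."
inputs = tokenizer("extract aspects and sentiment: " + review,
                   return_tensors="pt")

# An untuned t5-small will not emit the target format; after fine-tuning on
# pairs like (review, "battery life | positive; screen | negative") it would.
ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```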
- How News Evolves? Modeling News Text and Coverage using Graphs and Hawkes Process [3.655021726150368]
We present a method of converting news text collected over time to a sequence of directed multi-graphs, which represent semantic triples.
We model the dynamics of specific topological changes from these graphs using discrete-time Hawkes processes.
With our real-world data, we show that analyzing the structures of the graphs together with the discrete-time Hawkes process model can yield insights into how news events were covered and help predict how they may be covered in the future.
arXiv Detail & Related papers (2021-11-18T10:36:40Z)
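A standard discrete-time Hawkes intensity with a geometric kernel, of the kind the modeling above relies on (my parameterization; the paper's may differ):

```latex
% Intensity of a new topological change at step t, excited by past
% change events at steps t_j < t (parameterization assumed).
\[
  \lambda_t = \mu + \sum_{t_j < t} \alpha\, \beta^{\,t - t_j},
  \qquad \mu > 0,\ \alpha \ge 0,\ 0 < \beta < 1
\]
```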
- No News is Good News: A Critique of the One Billion Word Benchmark [4.396860522241306]
The One Billion Word Benchmark is a dataset derived from the WMT 2011 News Crawl.
We train models solely on Common Crawl web scrapes partitioned by year, and demonstrate that they perform worse on this task over time due to distributional shift.
arXiv Detail & Related papers (2021-10-25T02:41:27Z)
- Subsentence Extraction from Text Using Coverage-Based Deep Learning Language Models [3.3461339691835277]
We propose a coverage-based sentiment and subsentence extraction system.
The predicted subsentence consists of auxiliary information expressing a sentiment.
Our approach outperforms the state-of-the-art approaches by a large margin in subsentence prediction.
arXiv Detail & Related papers (2021-04-20T06:24:49Z)
- Deep Learning for Text Style Transfer: A Survey [71.8870854396927]
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text.
We present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017.
We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data.
arXiv Detail & Related papers (2020-11-01T04:04:43Z)
- Context-Based Quotation Recommendation [60.93257124507105]
We propose a novel context-aware quote recommendation system.
It generates a ranked list of quotable paragraphs and spans of tokens from a given source document.
We conduct experiments on a collection of speech transcripts and associated news articles.
arXiv Detail & Related papers (2020-05-17T17:49:53Z)