An NLP approach to quantify dynamic salience of predefined topics in a
text corpus
- URL: http://arxiv.org/abs/2108.07345v1
- Date: Mon, 16 Aug 2021 21:00:06 GMT
- Title: An NLP approach to quantify dynamic salience of predefined topics in a
text corpus
- Authors: A. Bock, A. Palladino, S. Smith-Heisters, I. Boardman, E. Pellegrini,
E.J. Bienenstock, A. Valenti
- Abstract summary: We use natural language processing techniques to quantify how a set of pre-defined topics of interest changes over time across a large corpus of text.
We find that, given a predefined topic, we can identify and rank sets of terms, or n-grams, that map to those topics and have usage patterns that deviate from a normal baseline.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of news media available online simultaneously presents a
valuable resource and significant challenge to analysts aiming to profile and
understand social and cultural trends in a geographic location of interest.
While an abundance of news reports documenting significant events, trends, and
responses provides a more democratized picture of the social characteristics of
a location, making sense of an entire corpus to extract significant trends is a
steep challenge for any one analyst or team. Here, we present an approach using
natural language processing techniques that seeks to quantify how a set of
pre-defined topics of interest changes over time across a large corpus of text.
We found that, given a predefined topic, we can identify and rank sets of
terms, or n-grams, that map to those topics and have usage patterns that
deviate from a normal baseline. Emergence, disappearance, or significant
variations in n-gram usage present a ground-up picture of a topic's dynamic
salience within a corpus of interest.
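
To make the baseline-deviation idea concrete, the following is a minimal Python sketch of the kind of scoring the abstract describes: topic n-grams are counted per time window, and windows whose usage rate departs from the n-gram's corpus-wide mean by more than a z-score threshold are flagged. The monthly bucketing, the per-1,000-token rate, and the threshold are illustrative assumptions, not the authors' actual pipeline.

```python
from statistics import mean, stdev

def ngram_rate(docs, ngram):
    """Occurrences of one n-gram per 1,000 tokens across a list of tokenised documents."""
    hits = sum(" ".join(tokens).count(ngram) for tokens in docs)
    total_tokens = sum(len(tokens) for tokens in docs) or 1
    return 1000.0 * hits / total_tokens

def flag_salience_shifts(docs_by_window, topic_ngrams, z_threshold=2.0):
    """Flag time windows where a topic n-gram's usage deviates from its baseline.

    docs_by_window: dict mapping a window label (e.g. '2021-03') to the list of
    tokenised documents published in that window (illustrative format).
    Returns {ngram: [(window, z_score), ...]} sorted by absolute deviation.
    """
    windows = sorted(docs_by_window)
    shifts = {}
    for ngram in topic_ngrams:
        series = [ngram_rate(docs_by_window[w], ngram) for w in windows]
        baseline = mean(series)
        spread = stdev(series) if len(series) > 1 else 0.0
        if spread == 0.0:
            continue  # flat usage: no emergence or disappearance to report
        outliers = [
            (w, (rate - baseline) / spread)
            for w, rate in zip(windows, series)
            if abs(rate - baseline) / spread >= z_threshold
        ]
        if outliers:
            shifts[ngram] = sorted(outliers, key=lambda wz: -abs(wz[1]))
    return shifts

# Example with toy monthly buckets of tokenised articles and two assumed topic n-grams.
corpus = {
    "2021-01": [["water", "shortage", "reported"], ["new", "dam", "project"]],
    "2021-02": [["water", "shortage", "worsens", "water", "rationing"]],
    "2021-03": [["election", "results", "announced"]],
}
print(flag_salience_shifts(corpus, ["water shortage", "election"], z_threshold=1.0))
```

In this sketch, emerging terms surface as large positive z-scores in later windows and disappearing terms as large negative ones, which is the ground-up picture of dynamic salience the abstract refers to.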
Related papers
- Combining Objective and Subjective Perspectives for Political News Understanding [5.741243797283764]
We introduce a text analysis framework which integrates both perspectives and provides a fine-grained processing of subjective aspects.
We illustrate its functioning with insights on news outlets, political orientations, topics, individual entities, and demographic segments.
arXiv Detail & Related papers (2024-08-20T20:13:19Z)
- Decoding Multilingual Topic Dynamics and Trend Identification through ARIMA Time Series Analysis on Social Networks: A Novel Data Translation Framework Enhanced by LDA/HDP Models [0.08246494848934444]
We focus on dialogues within Tunisian social networks during the Coronavirus Pandemic and other notable themes like sports and politics.
We start by aggregating a varied multilingual corpus of comments relevant to these subjects.
We then introduce our No-English-to-English Machine Translation approach to handle linguistic differences.
arXiv Detail & Related papers (2024-03-18T00:01:10Z)
- Time Series Analysis of Key Societal Events as Reflected in Complex Social Media Data Streams [0.9790236766474201]
This study investigates narrative evolution on a niche social media platform, GAB, and an established messaging service, Telegram.
Our approach is a novel way to study multiple social media domains and distil key information that may otherwise be obscured.
The main findings are: (1) the timeline can be deconstructed to provide useful data features that allow for improved interpretation; (2) the methodology applied provides a basis for generalization.
arXiv Detail & Related papers (2024-03-11T18:33:56Z)
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that take a single image of an individual as input and ground the generation process on it, together with text describing the desired visual context.
We introduce a standardized dataset (Stellar) that pairs personalized prompts with images of individuals; it is an order of magnitude larger than existing relevant datasets and provides rich semantic ground-truth annotations.
We derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and sets a new SoTA both quantitatively and in human trials.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- Twitter Topic Classification [15.306383757213956]
We present a new task based on tweet topic classification and release two associated datasets.
Given a wide range of topics covering the most important discussion points in social media, we provide training and testing data.
We perform a quantitative evaluation and analysis of current general- and domain-specific language models on the task.
arXiv Detail & Related papers (2022-09-20T16:13:52Z)
- An Informational Space Based Semantic Analysis for Scientific Texts [62.997667081978825]
This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts.
The representation of science-specific meaning is standardised by replacing situation representations rather than psychological properties.
The research in this paper lays the groundwork for a geometric representation of the meaning of texts.
arXiv Detail & Related papers (2022-05-31T11:19:32Z)
- A Case Study and Qualitative Analysis of Simple Cross-Lingual Opinion Mining [0.0]
We propose a method for building a single topic model with sentiment analysis capable of covering multiple languages simultaneously.
We apply the model to newspaper articles and user comments of a specific domain, i.e., organic food products.
We obtain a high proportion of stable and domain-relevant topics, a meaningful relation between topics and their respective contents, and an interpretable representation for social media documents.
arXiv Detail & Related papers (2021-11-03T14:49:50Z)
- Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation [85.32991360774447]
Natural language generation (NLG) spans a broad range of tasks, each of which serves specific objectives.
We propose a unifying perspective based on the nature of information change in NLG tasks.
We develop a family of interpretable metrics that are suitable for evaluating key aspects of different NLG tasks.
arXiv Detail & Related papers (2021-09-14T01:00:42Z)
- Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches, relaying task impacts across various generation tasks such as storytelling, summarization, and translation.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, and decoding, together with the key challenges outstanding in the field for each of them.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)
- Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z)