Twitter Topic Classification
- URL: http://arxiv.org/abs/2209.09824v1
- Date: Tue, 20 Sep 2022 16:13:52 GMT
- Title: Twitter Topic Classification
- Authors: Dimosthenis Antypas, Asahi Ushio, Jose Camacho-Collados, Leonardo Neves, Vítor Silva, Francesco Barbieri
- Abstract summary: We present a new task based on tweet topic classification and release two associated datasets.
Given a wide range of topics covering the most important discussion points in social media, we provide training and testing data.
We perform a quantitative evaluation and analysis of current general- and domain-specific language models on the task.
- Score: 15.306383757213956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social media platforms host discussions about a wide variety of topics that
arise every day. Making sense of all the content and organising it into
categories is an arduous task. A common way to deal with this issue is relying
on topic modeling, but topics discovered using this technique are difficult to
interpret and can differ from corpus to corpus. In this paper, we present a new
task based on tweet topic classification and release two associated datasets.
Given a wide range of topics covering the most important discussion points in
social media, we provide training and testing data from recent time periods
that can be used to evaluate tweet classification models. Moreover, we perform
a quantitative evaluation and analysis of current general- and domain-specific
language models on the task, which provide more insights on the challenges and
nature of the task.
Related papers
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label-name-supervised topic modeling.
EdTM frames topic modeling as an assignment problem while leveraging LM/LLM-based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
- Prompting Large Language Models for Topic Modeling [10.31712610860913]
We propose PromptTopic, a novel topic modeling approach that harnesses the advanced language understanding of large language models (LLMs).
It involves extracting topics at the sentence level from individual documents, then aggregating and condensing these topics into a predefined quantity, ultimately providing coherent topics for texts of varying lengths.
We benchmark PromptTopic against the state-of-the-art baselines on three vastly diverse datasets, establishing its proficiency in discovering meaningful topics.
arXiv Detail & Related papers (2023-12-15T11:15:05Z)
- TopicAdapt- An Inter-Corpora Topics Adaptation Approach [27.450275637652418]
This paper proposes a neural topic model, TopicAdapt, that can adapt relevant topics from a related source corpus and also discover new topics in a target corpus that are absent in the source corpus.
Experiments over multiple datasets from diverse domains show the superiority of the proposed model against the state-of-the-art topic models.
arXiv Detail & Related papers (2023-10-08T02:56:44Z)
- Enhance Topics Analysis based on Keywords Properties [0.0]
We present a specificity score based on keyword properties that is able to select the most informative topics.
In the experiments, we show that we are able to compress the state-of-the-art topic modelling results of different factors with an information loss that is much lower than that of the solution based on the recent coherence score presented in the literature.
arXiv Detail & Related papers (2022-03-09T15:10:12Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses how well existing language models distinguish the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Topic Scaling: A Joint Document Scaling -- Topic Model Approach To Learn Time-Specific Topics [0.0]
This paper proposes a new methodology to study sequential corpora by implementing a two-stage algorithm that learns time-based topics with respect to a scale of document positions.
The first stage ranks documents using Wordfish to estimate document positions that serve as a dependent variable to learn relevant topics.
The second stage ranks the inferred topics on the document scale to match their occurrences within the corpus and track their evolution.
arXiv Detail & Related papers (2021-03-31T12:35:36Z)
- Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z)
- Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology [68.8836704199096]
Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents.
With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses.
Previous studies on the topic of recognizing and classifying inappropriate content are mostly focused on a certain category of malevolence.
arXiv Detail & Related papers (2020-08-21T22:43:27Z)
- Keyword Assisted Topic Models [0.0]
We show that providing a small number of keywords can substantially enhance the measurement performance of topic models.
KeyATM provides more interpretable results, has better document classification performance, and is less sensitive to the number of topics than the standard topic models.
arXiv Detail & Related papers (2020-04-13T14:35:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.