Stay on Topic, Please: Aligning User Comments to the Content of a News
  Article
- URL: http://arxiv.org/abs/2103.06130v1
- Date: Wed, 3 Mar 2021 18:29:00 GMT
- Title: Stay on Topic, Please: Aligning User Comments to the Content of a News
  Article
- Authors: Jumanah Alshehri, Marija Stanojevic, Eduard Dragut, Zoran Obradovic
- Abstract summary: We propose a classification algorithm to categorize user comments posted to a news article based on their alignment to its content.
The alignment seeks to match user comments to an article based on similarity of content, entities in discussion, and topic.
We conduct a user study to evaluate human labeling performance to understand the difficulty of the classification task.
- Score: 7.3203631241415055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Social scientists have shown that up to 50% of the content posted to a news
article has no relation to its journalistic content. In this study we propose
a classification algorithm to categorize user comments posted to a news article
based on their alignment to its content. The alignment seeks to match user
comments to an article based on similarity of content, entities in discussion,
and topic. We propose BERTAC, a BERT-based approach that jointly learns
article-comment embeddings and infers the relevance class of comments. We
introduce an ordinal classification loss that penalizes the difference between
the predicted and true label. We conduct a thorough study to show the influence
of the proposed loss on the learning process. The results on five representative
news outlets show that our approach can learn the comment class with up to 36%
average accuracy improvement compared to the baselines, and up to 25%
compared to the BA-BC model. BA-BC is our approach that consists of two models
that separately capture the formal language of news articles and the
informal language of comments. We also conduct a user study to evaluate human
labeling performance to understand the difficulty of the classification task.
The user agreement on comment-article alignment is "moderate" per
Krippendorff's alpha score, which suggests that the classification task is
difficult.
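
The abstract describes the ordinal classification loss only at a high level, so the following is a minimal sketch of one common way to penalize the distance between predicted and true labels, not the paper's actual formulation. It assumes that relevance classes are integers on an ordered scale, and it combines cross-entropy with the expected absolute distance between the predicted class distribution and the true label. The function names `softmax` and `ordinal_loss`, the `weight` parameter, and the four-class example are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ordinal_loss(logits, true_label, weight=1.0):
    """Cross-entropy plus a distance penalty over ordered classes.

    Assumption (not from the paper): classes are integers 0..K-1 on an
    ordered scale, and the penalty is the expected absolute distance
    between the predicted class and the true class.
    """
    probs = softmax(logits)
    # Standard cross-entropy term for the true class.
    cross_entropy = -math.log(probs[true_label])
    # Probability mass placed far from the true label costs more.
    expected_distance = sum(p * abs(k - true_label) for k, p in enumerate(probs))
    return cross_entropy + weight * expected_distance


# Four hypothetical relevance classes; the true label is class 0.
near_miss = ordinal_loss([0.0, 3.0, 0.0, 0.0], true_label=0)  # predicts class 1
far_miss = ordinal_loss([0.0, 0.0, 0.0, 3.0], true_label=0)   # predicts class 3
print(near_miss < far_miss)
```

With `weight=0.0` the two examples receive the same loss, because plain cross-entropy ignores how far the wrong prediction is from the true class; the distance term is what makes the far miss more expensive than the near miss.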
 
      
Related papers
- LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback [16.57980268646285]
 This paper studies how inappropriate language in arguments can be computationally mitigated.
We propose a reinforcement learning-based rewriting approach that balances content preservation and appropriateness.
We evaluate different weighting schemes for the reward function in both absolute and relative human assessment studies.
 arXiv  Detail & Related papers  (2024-06-05T15:18:08Z)
- Explore Spurious Correlations at the Concept Level in Language Models for Text Classification [28.832684088975622]
 Language models (LMs) have achieved notable success in numerous NLP tasks.
They face robustness challenges due to spurious correlations arising from imbalanced label distributions in training data or ICL exemplars.
This paper introduces two main contributions. First, we employ ChatGPT to assign concept labels to texts, assessing concept bias in models during fine-tuning or ICL on test data.
Second, we introduce a data rebalancing technique that incorporates ChatGPT-generated counterfactual data, thereby balancing label distribution and mitigating spurious correlations.
 arXiv  Detail & Related papers  (2023-11-15T01:58:54Z)
- JointMatch: A Unified Approach for Diverse and Collaborative
  Pseudo-Labeling to Semi-Supervised Text Classification [65.268245109828]
 Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data.
Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning.
 arXiv  Detail & Related papers  (2023-10-23T05:43:35Z)
- Subjective Crowd Disagreements for Subjective Data: Uncovering
  Meaningful CrowdOpinion with Population-level Learning [8.530934084017966]
 We introduce CrowdOpinion, an unsupervised learning approach that uses language features and label distributions to pool similar items into larger samples of label distributions.
We use five publicly available benchmark datasets (with varying levels of annotator disagreements) from social media.
We also experiment in the wild using a dataset from Facebook, where annotations come from the platform itself by users reacting to posts.
 arXiv  Detail & Related papers  (2023-07-07T22:09:46Z)
- Like a Good Nearest Neighbor: Practical Content Moderation and Text
  Classification [66.02091763340094]
 Like a Good Nearest Neighbor (LaGoNN) is a modification to SetFit that introduces no learnable parameters but alters input text with information from its nearest neighbor.
LaGoNN is effective at flagging undesirable content and text classification, and improves the performance of SetFit.
 arXiv  Detail & Related papers  (2023-02-17T15:43:29Z)
- Distant finetuning with discourse relations for stance classification [55.131676584455306]
 We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a 3-stage training framework where the noisy level in the data used for finetuning decreases over different stages.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
 arXiv  Detail & Related papers  (2022-04-27T04:24:35Z)
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review
  Rating Recommendation [81.55533657694016]
 We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
 Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: sentence encoder (level one), intra-review encoder (level two) and inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
 arXiv  Detail & Related papers  (2020-11-02T08:07:50Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint
  Aspect-Sentiment Topic Embedding [71.2260967797055]
 We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
 arXiv  Detail & Related papers  (2020-10-13T21:33:24Z)
- A Unified Dual-view Model for Review Summarization and Sentiment
  Classification with Inconsistency Loss [51.448615489097236]
 Acquiring accurate summarization and sentiment from user reviews is an essential component of modern e-commerce platforms.
We propose a novel dual-view model that jointly improves the performance of these two tasks.
Experiment results on four real-world datasets from different domains demonstrate the effectiveness of our model.
 arXiv  Detail & Related papers  (2020-06-02T13:34:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     