Characterizing and Classifying Developer Forum Posts with their Intentions
- URL: http://arxiv.org/abs/2312.14279v2
- Date: Wed, 10 Apr 2024 14:25:30 GMT
- Title: Characterizing and Classifying Developer Forum Posts with their Intentions
- Authors: Xingfang Wu, Eric Laufer, Heng Li, Foutse Khomh, Santhosh Srinivasan, Jayden Luo,
- Abstract summary: The number of posts on online technical forums has been growing rapidly.
Most tags focus only on the technical perspective.
Modeling the intentions of posts can add an extra dimension to the current tag taxonomy.
- Score: 10.452110215035072
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the rapid growth of the developer community, the number of posts on online technical forums has been growing rapidly, making it difficult for users to filter useful posts and find important information. Tags provide a concise feature dimension for users to locate posts of interest and for search engines to index the most relevant posts according to queries. However, most tags focus only on the technical perspective (e.g., programming language, platform, tool). In most cases, forum posts in online developer communities reveal the author's intention to solve a problem, ask for advice, share information, etc. Modeling the intentions of posts can add an extra dimension to the current tag taxonomy. By referencing previous studies and learning from industrial perspectives, we create a refined taxonomy for the intentions of technical forum posts. Through manual labeling and analysis of a sampled post dataset extracted from online forums, we characterize the relationship between the constitution of posts (code, error messages) and their intentions. Furthermore, inspired by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework achieves a Micro F1-score of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforming the state-of-the-art baseline approach. Our characterization and automated classification of forum posts regarding their intentions may help forum maintainers or third-party tool developers improve the organization and retrieval of posts on technical forums. We have released our annotated dataset and code in our supplementary material package.
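The abstract reports a Micro F1-score and Top 1-3 accuracy. As a rough illustration of how such metrics can be computed, here is a minimal sketch on toy data. It assumes a multi-label setting in which each post has a set of true intentions and the model emits one score per intention. The label names (`problem`, `advice`, `share`), the 0.5 decision threshold, and the rule that a top-k prediction counts as a hit when any true label is among the k highest-scored labels are all assumptions made for this example; the paper's exact definitions may differ.

```python
from typing import Dict, List, Set


def micro_f1(true_labels: List[Set[str]], pred_labels: List[Set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all posts and labels, then compute F1."""
    tp = sum(len(t & p) for t, p in zip(true_labels, pred_labels))
    fp = sum(len(p - t) for t, p in zip(true_labels, pred_labels))
    fn = sum(len(t - p) for t, p in zip(true_labels, pred_labels))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def top_k_accuracy(true_labels: List[Set[str]],
                   scores: List[Dict[str, float]], k: int) -> float:
    """Fraction of posts with at least one true label among the k highest-scored labels.

    This hit rule is an assumption for illustration, not the paper's definition.
    """
    hits = 0
    for truth, post_scores in zip(true_labels, scores):
        ranked = sorted(post_scores, key=post_scores.get, reverse=True)[:k]
        if truth & set(ranked):
            hits += 1
    return hits / len(true_labels) if true_labels else 0.0


# Toy data: hypothetical intention labels and model scores for three posts.
truth = [{"problem"}, {"advice"}, {"share", "advice"}]
scores = [
    {"problem": 0.7, "advice": 0.2, "share": 0.1},
    {"problem": 0.5, "advice": 0.4, "share": 0.1},
    {"problem": 0.1, "advice": 0.3, "share": 0.6},
]
# Turn scores into predicted label sets with an assumed threshold of 0.5.
preds = [{label for label, s in post.items() if s >= 0.5} for post in scores]

print("micro F1:", micro_f1(truth, preds))
for k in (1, 2, 3):
    print(f"top-{k} accuracy:", top_k_accuracy(truth, scores, k))
```

Micro-averaging pools the counts across all labels before computing F1, so frequent intentions weigh more heavily than rare ones; a macro average would instead weight every label equally.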
Related papers
- Prompt-based Personality Profiling: Reinforcement Learning for Relevance Filtering [8.20929362102942]
Author profiling is the task of inferring characteristics about individuals by analyzing content they share.
We propose a new method for author profiling which aims at distinguishing relevant from irrelevant content first, followed by the actual user profiling only with relevant data.
We evaluate our method for Big Five personality trait prediction on two Twitter corpora.
arXiv Detail & Related papers (2024-09-06T08:43:10Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- Towards Generalizable Detection of Urgency of Discussion Forum Posts [0.0]
Students who take an online course, such as a MOOC, use the course's discussion forum to ask questions or reach out to instructors when encountering an issue.
We build predictive models that automatically determine the urgency of each forum post, so that these posts can be brought to instructors' attention.
This paper goes beyond previous work by predicting not just a binary decision cut-off but a post's level of urgency on a 7-point scale.
arXiv Detail & Related papers (2023-07-14T20:21:50Z)
- Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations [63.19448893196642]
We present a novel framework that harnesses the rich contextual information and semantic representations provided by large language models to analyze behavior graphs.
By leveraging this capability, our framework enables personalized and accurate job recommendations for individual users.
arXiv Detail & Related papers (2023-07-10T11:29:41Z)
- Modeling Tag Prediction based on Question Tagging Behavior Analysis of CommunityQA Platform Users [10.816557776555078]
We develop a flexible neural tag prediction architecture, which predicts both popular tags and more granular tags for each question.
Our experiments and obtained performance show the effectiveness of our model.
arXiv Detail & Related papers (2023-07-04T01:24:26Z)
- Depression detection in social media posts using affective and social norm features [84.12658971655253]
We propose a deep architecture for depression detection from social media posts.
We incorporate profanity and morality features of posts and words in our architecture using a late fusion scheme.
The inclusion of the proposed features yields state-of-the-art results in both settings.
arXiv Detail & Related papers (2023-03-24T21:26:27Z)
- FETA: Towards Specializing Foundation Models for Expert Task Applications [49.57393504125937]
Foundation Models (FMs) have demonstrated unprecedented capabilities including zero-shot learning, high fidelity data synthesis, and out of domain generalization.
We show in this paper that FMs still have poor out-of-the-box performance on expert tasks.
We propose a first of its kind FETA benchmark built around the task of teaching FMs to understand technical documentation.
arXiv Detail & Related papers (2022-09-08T08:47:57Z)
- Urdu Speech and Text Based Sentiment Analyzer [1.4630964945453113]
This work presented a new multi-class Urdu dataset based on user evaluations.
Our proposed dataset includes 10,000 reviews that have been carefully classified into two categories by human experts: positive and negative.
Five different lexicon- and rule-based algorithms, including Naive Bayes, Stanza, TextBlob, VADER, and Flair, are employed, and the experimental results show that Flair, with an accuracy of 70%, outperforms the other tested algorithms.
arXiv Detail & Related papers (2022-07-19T10:11:22Z)
- Identifying Experts in Question & Answer Portals: A Case Study on Data Science Competencies in Reddit [0.0]
We inspect the feasibility of identifying data science experts in Reddit.
Our method is based on the manual coding results where two data science experts labelled not only expert and non-expert comments, but also out-of-scope comments.
We present a semi-supervised approach which combines 1,113 labelled comments with 100,226 unlabelled comments during training.
arXiv Detail & Related papers (2022-04-08T14:30:59Z)
- Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders [59.038157066874255]
We propose a novel framework called RankAE to perform chat summarization without employing manually labeled data.
RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously.
A denoising auto-encoder is designed to generate succinct but context-informative summaries based on the selected utterances.
arXiv Detail & Related papers (2020-12-14T07:31:17Z)
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: sentence encoder (level one), intra-review encoder (level two), and inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.