Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection
- URL: http://arxiv.org/abs/2507.02137v2
- Date: Wed, 09 Jul 2025 23:22:02 GMT
- Title: Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection
- Authors: Martin Obaidi, Marc Herrmann, Jil Klünder, Kurt Schneider
- Abstract summary: We analyze linguistic and statistical features of 10 developer communication datasets from five platforms. We propose a mapping approach and questionnaire that recommends suitable sentiment analysis tools for new datasets.
- Score: 2.756862194100542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software development relies heavily on text-based communication, making sentiment analysis a valuable tool for understanding team dynamics and supporting trustworthy AI-driven analytics in requirements engineering. However, existing sentiment analysis tools often perform inconsistently across datasets from different platforms, due to variations in communication style and content. In this study, we analyze linguistic and statistical features of 10 developer communication datasets from five platforms and evaluate the performance of 14 sentiment analysis tools. Based on these results, we propose a mapping approach and questionnaire that recommends suitable sentiment analysis tools for new datasets, using their characteristic features as input. Our results show that dataset characteristics can be leveraged to improve tool selection, as platforms differ substantially in both linguistic and statistical properties. While transformer-based models such as SetFit and RoBERTa consistently achieve strong results, tool effectiveness remains context-dependent. Our approach supports researchers and practitioners in selecting trustworthy tools for sentiment analysis in software engineering, while highlighting the need for ongoing evaluation as communication contexts evolve.
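The abstract describes the mapping approach only at a high level. As a rough illustration of the idea, the sketch below extracts simple linguistic and statistical features from a new dataset, matches them to the nearest known platform profile, and returns the tool that performed best there. All feature names, profile values, and tool assignments are hypothetical placeholders rather than figures from the paper, which drives its recommendation through a questionnaire over dataset characteristics.

```python
import statistics

# Hypothetical feature profiles per platform and the tool that performed
# best on each; the paper's actual feature set and results are richer.
PLATFORM_PROFILES = {
    "github":        {"avg_tokens": 18.0, "question_ratio": 0.05, "emoticon_ratio": 0.02},
    "jira":          {"avg_tokens": 25.0, "question_ratio": 0.03, "emoticon_ratio": 0.01},
    "stackoverflow": {"avg_tokens": 40.0, "question_ratio": 0.30, "emoticon_ratio": 0.01},
}
BEST_TOOL = {"github": "SetFit", "jira": "RoBERTa", "stackoverflow": "RoBERTa"}

def extract_features(texts):
    """Compute simple linguistic/statistical features of a dataset."""
    tokens_per_doc = [len(t.split()) for t in texts]
    return {
        "avg_tokens": statistics.mean(tokens_per_doc),
        "question_ratio": sum("?" in t for t in texts) / len(texts),
        "emoticon_ratio": sum(":)" in t or ":(" in t for t in texts) / len(texts),
    }

def recommend_tool(texts):
    """Match a new dataset to the nearest platform profile (squared distance)."""
    feats = extract_features(texts)
    def dist(profile):
        return sum((feats[k] - profile[k]) ** 2 for k in profile)
    nearest = min(PLATFORM_PROFILES, key=lambda p: dist(PLATFORM_PROFILES[p]))
    return nearest, BEST_TOOL[nearest]

print(recommend_tool(["Why does this build fail?", "Great fix, thanks! :)"]))
```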
Related papers
- Sentiment Analysis in Software Engineering: Evaluating Generative Pre-trained Transformers [0.0]
This study systematically evaluates the performance of bidirectional transformers, such as BERT, against generative pre-trained transformers, specifically GPT-4o-mini, in SE sentiment analysis. The results reveal that fine-tuned GPT-4o-mini performs comparably to BERT and other bidirectional models on structured and balanced datasets like GitHub and Jira. On linguistically complex datasets with imbalanced sentiment distributions, such as Stack Overflow, the default GPT-4o-mini model exhibits superior generalization, achieving an accuracy of 85.3% compared to the fine-tuned model's 13.1%.
arXiv Detail & Related papers (2025-04-22T14:19:25Z)
- Sentiment Analysis Tools in Software Engineering: A Systematic Mapping Study [43.44042227196935]
We aim to help developers or stakeholders in their choice of sentiment analysis tools for their specific purpose. Our results summarize insights from 106 papers with respect to (1) the application domain, (2) the purpose, (3) the data sets used, (4) the approaches for developing sentiment analysis tools, (5) the usage of already existing tools, and (6) the difficulties researchers face.
arXiv Detail & Related papers (2025-02-11T19:02:25Z)
- On the Limitations of Combining Sentiment Analysis Tools in a Cross-Platform Setting [2.3818760805173342]
We analyze a combination of three sentiment analysis tools in a voting classifier according to their reliability and performance. The results indicate that such a combination of tools is a good choice in the within-platform setting. However, a majority vote does not necessarily lead to better results when applied in cross-platform domains (a minimal majority-vote sketch follows below).
arXiv Detail & Related papers (2025-02-10T16:51:51Z)
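A minimal sketch of the voting idea, assuming three hypothetical tool outputs over the same documents: a label shared by at least two of the three tools wins, and ties fall back to neutral (the paper's actual tool set and tie-handling may differ).

```python
from collections import Counter

def majority_vote(labels):
    """Return the label a strict majority of tools agrees on, else 'neutral'."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else "neutral"

# Hypothetical per-document outputs of three sentiment analysis tools.
tool_a = ["positive", "negative", "neutral", "positive"]
tool_b = ["positive", "neutral",  "neutral", "negative"]
tool_c = ["negative", "negative", "neutral", "positive"]

combined = [majority_vote(votes) for votes in zip(tool_a, tool_b, tool_c)]
print(combined)  # ['positive', 'negative', 'neutral', 'positive']
```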
- ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use [51.43211624452462]
We present ToolHop, a dataset comprising 995 user queries and 3,912 associated tools. ToolHop ensures diverse queries, meaningful interdependencies, locally executable tools, detailed feedback, and verifiable answers. We evaluate 14 LLMs across five model families, uncovering significant challenges in handling multi-hop tool-use scenarios.
arXiv Detail & Related papers (2025-01-05T11:06:55Z)
- Were You Helpful -- Predicting Helpful Votes from Amazon Reviews [0.0]
This project investigates factors that influence the perceived helpfulness of Amazon product reviews through machine learning techniques. We identify key metadata characteristics that serve as strong predictors of review helpfulness. This insight suggests that contextual and user-behavioral factors may be more indicative of review helpfulness than the linguistic content itself.
arXiv Detail & Related papers (2024-12-03T22:38:58Z)
- You Shall Know a Tool by the Traces it Leaves: The Predictability of Sentiment Analysis Tools [74.98850427240464]
We show that sentiment analysis tools disagree on the same dataset, and that the tool used for sentiment annotation can even be predicted from its outcome (a minimal sketch of this idea follows below).
arXiv Detail & Related papers (2024-10-18T17:27:38Z)
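One way to read the "traces" finding, sketched under invented data: treat each tool's label pattern over a shared document sample as a fingerprint and attribute an unknown annotation to the closest one. The tool names below are real SE sentiment tools, but their label patterns are hypothetical, and the paper's method (e.g., a trained classifier over richer features) may differ.

```python
# Hypothetical label patterns of known tools over the same five documents.
REFERENCE_TRACES = {
    "SentiStrength": ["positive", "neutral",  "neutral",  "negative", "positive"],
    "Senti4SD":      ["negative", "negative", "positive", "negative", "neutral"],
    "SentiCR":       ["neutral",  "negative", "neutral",  "neutral",  "neutral"],
}

def hamming(a, b):
    """Count positions where two label sequences disagree."""
    return sum(x != y for x, y in zip(a, b))

def predict_tool(unknown_labels):
    """Attribute an annotation to the tool with the closest reference trace."""
    return min(REFERENCE_TRACES,
               key=lambda t: hamming(REFERENCE_TRACES[t], unknown_labels))

unknown = ["positive", "neutral", "neutral", "negative", "neutral"]
print(predict_tool(unknown))  # SentiStrength (one mismatch)
```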
- Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges.
We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow.
We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
- Efficacy of static analysis tools for software defect detection on open-source projects [0.0]
The study used popular analysis tools such as SonarQube, PMD, Checkstyle, and FindBugs to perform the comparison.
The study results show that SonarQube performs considerably better than all other tools in terms of defect detection.
arXiv Detail & Related papers (2024-05-20T19:05:32Z)
- Revisiting Sentiment Analysis for Software Engineering in the Era of Large Language Models [11.388023221294686]
This study investigates whether bigger large language models (bLLMs) can address the labeled data shortage that hampers fine-tuned smaller large language models (sLLMs) in software engineering tasks.
We conduct a comprehensive empirical study using five established datasets to assess three open-source bLLMs in zero-shot and few-shot scenarios.
Our experimental findings demonstrate that bLLMs exhibit state-of-the-art performance on datasets marked by limited training data and imbalanced distributions (a minimal zero-shot prompting sketch follows below).
arXiv Detail & Related papers (2023-10-17T09:53:03Z)
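A minimal zero-shot prompting sketch for SE sentiment, as referenced above. The prompt wording is an assumption, and call_llm is a placeholder for whatever client the chosen open-source bLLM exposes; the paper's actual prompts and models are not reproduced here.

```python
LABELS = ("positive", "negative", "neutral")

def build_prompt(text: str) -> str:
    # Zero-shot: the prompt contains instructions but no labeled examples.
    return (
        "Classify the sentiment of the following software engineering "
        f"comment as one of {', '.join(LABELS)}. Answer with a single word.\n\n"
        f"Comment: {text}\nSentiment:"
    )

def classify(text: str, call_llm) -> str:
    """Query the model and guard against malformed output."""
    answer = call_llm(build_prompt(text)).strip().lower()
    return answer if answer in LABELS else "neutral"

# Usage with a stub standing in for a real model client:
print(classify("This API design is a mess.", lambda prompt: "negative"))
```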
- Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review [62.997667081978825]
This review aims to provide a comprehensive overview of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today.
The main simulation, emulation, deployment systems, and testbeds for experimental research on the Edge-to-Cloud Continuum available today are also surveyed.
arXiv Detail & Related papers (2022-04-29T08:06:05Z)
- AI Explainability 360: Impact and Design [120.95633114160688]
In 2019, we created AI Explainability 360 (Arya et al. 2020), an open source software toolkit featuring ten diverse and state-of-the-art explainability methods.
This paper examines the impact of the toolkit with several case studies, statistics, and community feedback.
The paper also describes the flexible design of the toolkit, examples of its use, and the significant educational material and documentation available to its users.
arXiv Detail & Related papers (2021-09-24T19:17:09Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)