On the Limitations of Combining Sentiment Analysis Tools in a Cross-Platform Setting
- URL: http://arxiv.org/abs/2502.06665v1
- Date: Mon, 10 Feb 2025 16:51:51 GMT
- Title: On the Limitations of Combining Sentiment Analysis Tools in a Cross-Platform Setting
- Authors: Martin Obaidi, Henrik Holm, Kurt Schneider, Jil Klünder
- Abstract summary: We analyze a combination of three sentiment analysis tools in a voting classifier according to their reliability and performance. The results indicate that this kind of combination of tools is a good choice in the within-platform setting. However, a majority vote does not necessarily lead to better results when applied in cross-platform settings.
- Score: 2.3818760805173342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A positive working climate is essential in modern software development. It enhances productivity since a satisfied developer tends to deliver better results. Sentiment analysis tools are a means to analyze and classify textual communication between developers according to the polarity of the statements. Most of these tools deliver promising results when used with test data from the domain they are developed for (e.g., GitHub). But the tools' outcomes lack reliability when used in a different domain (e.g., Stack Overflow). One possible way to mitigate this problem is to combine different tools trained in different domains. In this paper, we analyze a combination of three sentiment analysis tools in a voting classifier according to their reliability and performance. The tools are trained and evaluated using five existing polarity data sets (e.g., from GitHub). The results indicate that this kind of tool combination is a good choice in the within-platform setting. However, a majority vote does not necessarily lead to better results when applied in cross-platform settings. In most cases, the best individual tool in the ensemble is preferable. This is mainly due to the often large difference in performance of the individual tools, even on the same data set. However, it may also be caused by differences between the annotated data sets.
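The voting classifier described in the abstract is, at its core, a majority vote over the per-statement labels produced by the individual tools. The Python sketch below illustrates that idea under stated assumptions: the tool functions, the label set, and the tie-breaking rule (falling back to one designated tool) are hypothetical placeholders for illustration, not the authors' implementation or any real tool's API.

```python
from collections import Counter
from typing import Callable, List

# Polarity labels assumed by this sketch (hypothetical; actual tools may differ).
LABELS = ("positive", "neutral", "negative")


def majority_vote(
    text: str,
    tools: List[Callable[[str], str]],
    fallback: Callable[[str], str],
) -> str:
    """Classify `text` with each tool and return the majority label.

    If no label wins an outright majority (e.g., three tools return three
    different labels), fall back to a single designated tool, mirroring the
    observation that the best individual tool is often preferable.
    """
    votes = Counter(tool(text) for tool in tools)
    label, count = votes.most_common(1)[0]
    if count > len(tools) // 2:
        return label
    return fallback(text)


# Hypothetical stand-ins for three sentiment analysis tools trained on
# different platforms (e.g., GitHub, Stack Overflow, Jira).
def tool_a(text: str) -> str:
    return "positive" if "thanks" in text.lower() else "neutral"


def tool_b(text: str) -> str:
    return "negative" if "broken" in text.lower() else "neutral"


def tool_c(text: str) -> str:
    return "neutral"


if __name__ == "__main__":
    comment = "Thanks, but the build is still broken."
    print(majority_vote(comment, [tool_a, tool_b, tool_c], fallback=tool_c))
```

In a within-platform setting, such a vote can smooth over individual tool errors; in the cross-platform setting studied in the paper, the abstract notes that the fallback to the best single tool often outperforms the vote itself.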
Related papers
- Use as Directed? A Comparison of Software Tools Intended to Check Rigor and Transparency of Published Work [28.252424517077557]
Lack of standardization and transparency in scientific reporting is a major problem. There are several automated tools that have been designed to check different rigor criteria. We have conducted a broad comparison of 11 automated tools across 9 different rigor criteria from the ScreenIT group.
arXiv Detail & Related papers (2025-07-23T23:49:28Z) - Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection [2.756862194100542]
We analyze linguistic and statistical features of 10 developer communication datasets from five platforms. We propose a mapping approach and questionnaire that recommends suitable sentiment analysis tools for new datasets.
arXiv Detail & Related papers (2025-07-02T20:50:25Z) - Sentiment Analysis Tools in Software Engineering: A Systematic Mapping Study [43.44042227196935]
We aim to help developers or stakeholders in their choice of sentiment analysis tools for their specific purpose.
Our results summarize insights from 106 papers with respect to (1) the application domain, (2) the purpose, (3) the used data sets, (4) the approaches for developing sentiment analysis tools, (5) the usage of already existing tools, and (6) the difficulties researchers face.
arXiv Detail & Related papers (2025-02-11T19:02:25Z) - Does the Tool Matter? Exploring Some Causes of Threats to Validity in Mining Software Repositories [9.539825294372786]
We use two tools to extract and analyse ten large software projects. Despite similar trends, even simple metrics such as the numbers of commits and developers may differ by up to 500%. We find that such substantial differences are often caused by minor technical details.
arXiv Detail & Related papers (2025-01-25T07:42:56Z) - ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use [51.43211624452462]
We present ToolHop, a dataset comprising 995 user queries and 3,912 associated tools. ToolHop ensures diverse queries, meaningful interdependencies, locally executable tools, detailed feedback, and verifiable answers. We evaluate 14 LLMs across five model families, uncovering significant challenges in handling multi-hop tool-use scenarios.
arXiv Detail & Related papers (2025-01-05T11:06:55Z) - Meta-Reasoning Improves Tool Use in Large Language Models [10.193264105560864]
We present Tool selECTion via meta-reasONing (TECTON), a two-phase system that first reasons over a task and outputs candidate tools. TECTON results in substantial gains, both in-distribution and out-of-distribution, on a range of math reasoning datasets.
arXiv Detail & Related papers (2024-11-07T08:48:33Z) - You Shall Know a Tool by the Traces it Leaves: The Predictability of Sentiment Analysis Tools [74.98850427240464]
We show that sentiment analysis tools disagree on the same dataset.
We show that the tool used for sentiment annotation can even be predicted from its outcome.
arXiv Detail & Related papers (2024-10-18T17:27:38Z) - Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models.
Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions.
We propose a novel model-agnostic COllaborative Learning-based Tool Retrieval approach, COLT, which captures not only the semantic similarities between user queries and tool descriptions but also takes into account the collaborative information of tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z) - Efficacy of static analysis tools for software defect detection on open-source projects [0.0]
The study used popular analysis tools such as SonarQube, PMD, Checkstyle, and FindBugs to perform the comparison.
The study results show that SonarQube performs considerably better than all the other tools in terms of defect detection.
arXiv Detail & Related papers (2024-05-20T19:05:32Z) - What Are Tools Anyway? A Survey from the Language Model Perspective [67.18843218893416]
Language models (LMs) are powerful, yet mostly used for text generation tasks.
We provide a unified definition of tools as external programs used by LMs.
We empirically study the efficiency of various tooling methods.
arXiv Detail & Related papers (2024-03-18T17:20:07Z) - TOOLVERIFIER: Generalization to New Tools via Self-Verification [69.85190990517184]
We introduce a self-verification method which distinguishes between close candidates by self-asking contrastive questions during tool selection.
Experiments on 4 tasks from the ToolBench benchmark, consisting of 17 unseen tools, demonstrate an average improvement of 22% over few-shot baselines.
arXiv Detail & Related papers (2024-02-21T22:41:38Z) - ControlLLM: Augment Language Models with Tools by Searching on Graphs [97.62758830255002]
We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving real-world tasks.
Our framework comprises three key components: (1) a task decomposer that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches for the optimal solution path on a pre-built tool graph; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the tools.
arXiv Detail & Related papers (2023-10-26T21:57:21Z) - MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use [79.87054552116443]
Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities.
We introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools.
We conduct experiments involving eight popular LLMs and find that the majority of them still struggle to effectively select tools.
arXiv Detail & Related papers (2023-10-04T19:39:26Z) - Open Tracing Tools: Overview and Critical Comparison [10.196089289625599]
This paper aims to provide an overview of popular Open tracing tools via comparison.
We first identified 30 tools in an objective, systematic, and reproducible manner.
We then characterized each tool looking at (1) the measured features, (2) popularity both in peer-reviewed literature and online media, and (3) benefits and issues.
arXiv Detail & Related papers (2022-07-14T12:52:32Z)