How Much Data Analytics is Enough? The ROI of Machine Learning
Classification and its Application to Requirements Dependency Classification
- URL: http://arxiv.org/abs/2109.14097v1
- Date: Tue, 28 Sep 2021 23:27:57 GMT
- Title: How Much Data Analytics is Enough? The ROI of Machine Learning
Classification and its Application to Requirements Dependency Classification
- Authors: Gouri Deshpande, Guenther Ruhe, Chad Saunders
- Abstract summary: Machine Learning can substantially improve the efficiency and effectiveness of organizations.
However, the selection and implementation of ML techniques rely almost exclusively on accuracy criteria.
We present findings for an approach that addresses this gap by enhancing the accuracy criterion with return on investment considerations.
- Score: 5.195942130196466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Learning (ML) can substantially improve the efficiency and
effectiveness of organizations and is widely used for different purposes within
Software Engineering. However, the selection and implementation of ML
techniques rely almost exclusively on accuracy criteria. Thus, for
organizations wishing to realize the benefits of ML investments, this narrow
approach ignores crucial considerations around the anticipated costs of the ML
activities across the ML lifecycle, while failing to account for the benefits
that are likely to accrue from the proposed activity. We present findings for
an approach that addresses this gap by enhancing the accuracy criterion with
return on investment (ROI) considerations. Specifically, we analyze the
performance of two state-of-the-art ML techniques, Random Forest and
Bidirectional Encoder Representations from Transformers (BERT), based on
accuracy and ROI for two publicly available data sets. In particular, we compare
decision-making on requirements dependency extraction (i) exclusively based on
accuracy and (ii) extended to include ROI analysis. As a result, we propose
recommendations for selecting ML classification techniques based on the amount
of training data used. Our findings indicate that considering ROI as an
additional criterion can drastically influence ML selection when compared to
decisions based on accuracy as the sole criterion.
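The abstract's core idea, selecting a classifier by ROI rather than accuracy alone, can be sketched as follows. This is a minimal illustration, not the paper's actual cost model: all benefit/cost figures are made up, and `select_technique` is a hypothetical helper, but the numbers show how a cheaper, slightly less accurate model can win on ROI.

```python
# Sketch of an ROI-extended selection criterion: ROI = (benefit - cost) / cost.
# All figures are illustrative, not taken from the paper.

def roi(benefit: float, cost: float) -> float:
    """Return on investment for an ML activity."""
    return (benefit - cost) / cost

def select_technique(candidates: dict) -> str:
    """Pick the candidate with the highest ROI (not the highest accuracy)."""
    return max(candidates, key=lambda name: roi(candidates[name]["benefit"],
                                                candidates[name]["cost"]))

# Hypothetical scenario: BERT is more accurate but far costlier to train/serve.
candidates = {
    "RandomForest": {"accuracy": 0.81, "benefit": 120_000, "cost": 20_000},
    "BERT":         {"accuracy": 0.88, "benefit": 135_000, "cost": 90_000},
}

best_by_accuracy = max(candidates, key=lambda n: candidates[n]["accuracy"])
best_by_roi = select_technique(candidates)
# Accuracy alone picks BERT; ROI (5.0 vs. 0.5 here) picks RandomForest.
```

Under these assumed numbers the two criteria disagree, which is exactly the kind of divergence the paper's findings describe.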
Related papers
- EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
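The "bandits" setting the EVOLvE summary refers to can be made concrete with a standard epsilon-greedy baseline (not the paper's method; reward probabilities and parameters below are made up for illustration):

```python
# Minimal epsilon-greedy multi-armed bandit: the state-less RL setting in
# which the agent balances exploring arms against exploiting the best one.
import random

def run_bandit(probs, steps=5000, eps=0.1, seed=0):
    """Play `steps` rounds over Bernoulli arms with success rates `probs`."""
    rng = random.Random(seed)
    counts = [0] * len(probs)    # pulls per arm
    values = [0.0] * len(probs)  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(len(probs))                       # explore
        else:
            arm = max(range(len(probs)), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]       # update mean
    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
# With enough steps, the highest-reward arm accumulates the most pulls.
```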
arXiv Detail & Related papers (2024-10-08T17:54:03Z) - AROhI: An Interactive Tool for Estimating ROI of Data Analytics [0.0]
It is crucial to consider Return On Investment when performing data analytics.
This work details a comprehensive tool that provides conventional and advanced ML approaches for demonstration.
arXiv Detail & Related papers (2024-07-18T18:19:17Z) - A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems [67.52782366565658]
State-of-the-art recommender systems (RSs) depend on categorical features, which are encoded as embedding vectors, resulting in excessively large embedding tables.
Despite the prosperity of lightweight embedding-based RSs, a wide diversity is seen in evaluation protocols.
This study investigates the performance, efficiency, and cross-task transferability of various lightweight embedding-based RSs (LERSs) via a thorough benchmarking process.
arXiv Detail & Related papers (2024-06-25T07:45:00Z) - The Economic Implications of Large Language Model Selection on Earnings and Return on Investment: A Decision Theoretic Model [0.0]
We use a decision-theoretic approach to compare the financial impact of different language models.
The study reveals how the superior accuracy of more expensive models can, under certain conditions, justify a greater investment.
This article provides a framework for companies looking to optimize their technology choices.
arXiv Detail & Related papers (2024-05-27T20:08:41Z) - Benchmarking Automated Machine Learning Methods for Price Forecasting
Applications [58.720142291102135]
We show the possibility of substituting manually created ML pipelines with automated machine learning (AutoML) solutions.
Based on the CRISP-DM process, we split the manual ML pipeline into a machine learning and non-machine learning part.
We show in a case study for the industrial use case of price forecasting, that domain knowledge combined with AutoML can weaken the dependence on ML experts.
arXiv Detail & Related papers (2023-04-28T10:27:38Z) - On Taking Advantage of Opportunistic Meta-knowledge to Reduce
Configuration Spaces for Automated Machine Learning [11.670797168818773]
The key research question is whether it is possible and practical to preemptively avoid costly evaluations of poorly performing ML pipelines.
Numerous experiments with the AutoWeka4MCPS package suggest that opportunistic/systematic meta-knowledge can improve ML outcomes.
We observe strong sensitivity to the 'challenge' of a dataset, i.e., whether specificity in choosing a predictor leads to significantly better performance.
arXiv Detail & Related papers (2022-08-08T19:22:24Z) - Filter Methods for Feature Selection in Supervised Machine Learning
Applications -- Review and Benchmark [0.0]
This review synthesizes the literature on feature selection benchmarking and evaluates the performance of 58 methods in the widely used R environment.
We consider four typical dataset scenarios that are challenging for ML models.
arXiv Detail & Related papers (2021-11-23T20:20:24Z) - Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z) - Robusta: Robust AutoML for Feature Selection via Reinforcement Learning [24.24652530951966]
We propose Robusta, the first robust AutoML framework, based on reinforcement learning (RL).
We show that the framework is able to improve the model robustness by up to 22% while maintaining competitive accuracy on benign samples.
arXiv Detail & Related papers (2021-01-15T03:12:29Z) - Optimization-driven Machine Learning for Intelligent Reflecting Surfaces
Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) has been employed to reshape wireless channels by controlling the phase shifts of individual scattering elements.
Due to the large size of scattering elements, the passive beamforming is typically challenged by the high computational complexity.
In this article, we focus on machine learning (ML) approaches for improving performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z) - A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.