On Using Information Retrieval to Recommend Machine Learning Good
Practices for Software Engineers
- URL: http://arxiv.org/abs/2308.12095v2
- Date: Fri, 25 Aug 2023 08:05:52 GMT
- Title: On Using Information Retrieval to Recommend Machine Learning Good
Practices for Software Engineers
- Authors: Laura Cabra-Acela and Anamaria Mojica-Hanke and Mario
Linares-Vásquez and Steffen Herbold
- Abstract summary: Not embracing good machine learning practices may hinder the performance of an ML system.
Many non-ML experts turn towards gray literature like blogs and Q&A systems when looking for help and guidance.
We propose a recommender system that recommends ML practices based on the user's context.
- Score: 6.7659763626415135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) is nowadays widely used for different purposes and in
several disciplines. From self-driving cars to automated medical diagnosis,
machine learning models extensively support users' daily activities, and
software engineering tasks are no exception. Not embracing good ML practices
may lead to pitfalls that hinder the performance of an ML system and
potentially lead to unexpected results. Despite the existence of documentation
and literature about ML best practices, many non-ML experts turn towards gray
literature like blogs and Q&A systems when looking for help and guidance on
implementing ML systems. To better aid users in distilling relevant knowledge
from such sources, we propose a recommender system that recommends ML practices
based on the user's context. As a first step in creating a recommender system
for machine learning practices, we implemented Idaka, a tool that provides two
different approaches for retrieving/generating ML best practices: i) an
information retrieval (IR) engine and ii) a large language model. The IR engine
uses BM25 as the algorithm for retrieving the practices, and the large language
model is, in our case, Alpaca. The platform has been designed to allow comparative
studies of best practices retrieval tools. Idaka is publicly available at
GitHub: https://bit.ly/idaka. Video: https://youtu.be/cEb-AhIPxnM.
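To illustrate the kind of ranking the IR engine relies on, here is a minimal, self-contained sketch of BM25 scoring in Python. It is not Idaka's implementation: the sample practices, the whitespace tokenizer, the class and function names, and the parameter values k1=1.5 and b=0.75 (common defaults) are all assumptions made for illustration, and the IDF uses the non-negative "+1" variant.

```python
import math
from collections import Counter

# Hypothetical sample practices; not taken from the Idaka corpus.
PRACTICES = [
    "split the data into training and test sets before any preprocessing",
    "normalize numeric features before training a neural network",
    "use cross validation to estimate model performance",
]


def tokenize(text):
    """Lowercase and split on whitespace (a real engine would do more)."""
    return text.lower().split()


class BM25:
    """Minimal BM25 ranker over a small in-memory list of documents."""

    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation (assumed default)
        self.b = b    # document-length normalization (assumed default)
        self.docs = [tokenize(d) for d in documents]
        self.doc_len = [len(d) for d in self.docs]
        self.avg_len = sum(self.doc_len) / len(self.docs)
        self.term_freqs = [Counter(d) for d in self.docs]
        # Number of documents in which each term occurs at least once.
        self.doc_freq = Counter()
        for d in self.docs:
            self.doc_freq.update(set(d))

    def idf(self, term):
        """Inverse document frequency, '+1' variant (never negative)."""
        n = len(self.docs)
        df = self.doc_freq.get(term, 0)
        return math.log((n - df + 0.5) / (df + 0.5) + 1.0)

    def score(self, query, index):
        """BM25 score of the document at `index` for `query`."""
        tf = self.term_freqs[index]
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[index] / self.avg_len)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def rank(self, query):
        """Return (score, document index) pairs, best match first."""
        scores = [(self.score(query, i), i) for i in range(len(self.docs))]
        return sorted(scores, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    engine = BM25(PRACTICES)
    for score, index in engine.rank("cross validation performance"):
        print(f"{score:.3f}  {PRACTICES[index]}")
```

Documents that share no term with the query receive a score of zero, so a production system would typically add stemming, stop-word handling, or query expansion on top of this.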
Related papers
- A Large-Scale Study of Model Integration in ML-Enabled Software Systems [4.776073133338119]
Machine learning (ML) and its embedding in systems has drastically changed the engineering of software-intensive systems.
Traditionally, software engineering focuses on manually created artifacts such as source code and the process of creating them.
We present the first large-scale study of real ML-enabled software systems, covering over 2,928 open source systems on GitHub.
arXiv Detail & Related papers (2024-08-12T15:28:40Z)
- MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [73.86954509967416]
Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks.
This paper presents the first comprehensive MLLM Evaluation benchmark MME.
It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z)
- How Can Recommender Systems Benefit from Large Language Models: A Survey [82.06729592294322]
Large language models (LLM) have shown impressive general intelligence and human-like capabilities.
We conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.
arXiv Detail & Related papers (2023-06-09T11:31:50Z)
- What are the Machine Learning best practices reported by practitioners
on Stack Exchange? [4.882319198853359]
We present a study listing 127 Machine Learning best practices systematically mining 242 posts of 14 different Stack Exchange (STE) websites.
The list of practices is presented in a set of categories related to different stages of the implementation process of an ML-enabled system.
arXiv Detail & Related papers (2023-01-25T10:50:28Z)
- A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often 'remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z)
- OmniXAI: A Library for Explainable AI [98.07381528393245]
We introduce OmniXAI, an open-source Python library of eXplainable AI (XAI).
It offers omni-way explainable AI capabilities and various interpretable machine learning techniques.
For practitioners, the library provides an easy-to-use unified interface to generate the explanations for their applications.
arXiv Detail & Related papers (2022-06-01T11:35:37Z)
- Declarative Machine Learning Systems [7.5717114708721045]
Machine learning (ML) has moved from an academic endeavor to a pervasive technology adopted in almost every aspect of computing.
Recent successes in applying ML in natural sciences revealed that ML can be used to tackle some of the hardest real-world problems humanity faces today.
We believe the next wave of ML systems will allow a larger amount of people, potentially without coding skills, to perform the same tasks.
arXiv Detail & Related papers (2021-07-16T23:57:57Z)
- White Paper Machine Learning in Certified Systems [70.24215483154184]
The DEEL Project set up the ML Certification 3 Workgroup (WG), set up by the Institut de Recherche Technologique Saint Exupéry de Toulouse (IRT).
arXiv Detail & Related papers (2021-03-18T21:14:30Z)
- A Neophyte With AutoML: Evaluating the Promises of Automatic Machine
Learning Tools [1.713291434132985]
This paper discusses modern Auto Machine Learning (AutoML) tools from the perspective of a person with little prior experience in Machine Learning (ML).
There are many AutoML tools both ready-to-use and under development, which are created to simplify and democratize usage of ML technologies in everyday life.
arXiv Detail & Related papers (2021-01-14T19:28:57Z)
- Insights into Performance Fitness and Error Metrics for Machine Learning [1.827510863075184]
Machine learning (ML) is the field of training machines to achieve a high level of cognition and perform human-like analysis.
This paper examines a number of the most commonly-used performance fitness and error metrics for regression and classification algorithms.
arXiv Detail & Related papers (2020-05-17T22:59:04Z)
- An Information-Theoretic Approach to Personalized Explainable Machine
Learning [92.53970625312665]
We propose a simple probabilistic model for the predictions and user knowledge.
We quantify the effect of an explanation by the conditional mutual information between the explanation and prediction.
arXiv Detail & Related papers (2020-03-01T13:06:29Z)