Related papers: On Using Information Retrieval to Recommend Machine Learning Good Practices for Software Engineers

URL: http://arxiv.org/abs/2308.12095v2
Date: Fri, 25 Aug 2023 08:05:52 GMT
Title: On Using Information Retrieval to Recommend Machine Learning Good Practices for Software Engineers
Authors: Laura Cabra-Acela and Anamaria Mojica-Hanke and Mario Linares-Vásquez and Steffen Herbold
Abstract summary: Not embracing good machine learning practices may hinder the performance of an ML system. Many non-ML experts turn towards gray literature like blogs and Q&A systems when looking for help and guidance. We propose a recommender system that recommends ML practices based on the user's context.
Score: 6.7659763626415135
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine learning (ML) is now widely used for different purposes and in several disciplines. From self-driving cars to automated medical diagnosis, machine learning models extensively support users' daily activities, and software engineering tasks are no exception. Not embracing good ML practices may lead to pitfalls that hinder the performance of an ML system and potentially lead to unexpected results. Despite the existence of documentation and literature about ML best practices, many non-ML experts turn towards gray literature like blogs and Q&A systems when looking for help and guidance on implementing ML systems. To better aid users in distilling relevant knowledge from such sources, we propose a recommender system that recommends ML practices based on the user's context. As a first step in creating a recommender system for machine learning practices, we implemented Idaka, a tool that provides two different approaches for retrieving/generating ML best practices: i) an information retrieval (IR) engine and ii) a large language model. The IR engine uses BM25 as the algorithm for retrieving the practices, and the large language model is, in our case, Alpaca. The platform has been designed to allow comparative studies of best practices retrieval tools. Idaka is publicly available at GitHub: https://bit.ly/idaka. Video: https://youtu.be/cEb-AhIPxnM.
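
The abstract names BM25 as the ranking algorithm behind the IR engine. The following is a minimal sketch of standard Okapi BM25 scoring, not Idaka's actual implementation. The whitespace tokenizer, the parameter values k1=1.5 and b=0.75, the function names, and the three sample practices are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up sample practices, used only to exercise the scoring function.
practices = [
    "split data into training and test sets before preprocessing",
    "normalize features before training a neural network",
    "use cross validation to estimate model performance",
]
scores = bm25_scores("how to split training and test data", practices)
best = max(range(len(practices)), key=scores.__getitem__)
print(practices[best])
```

In this sketch the first practice ranks highest because it shares the most query terms; a real engine would typically add stemming, stop-word removal, and an inverted index.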

Related papers

A Large-Scale Study of Model Integration in ML-Enabled Software Systems [4.776073133338119]
Machine learning (ML) and its embedding in systems have drastically changed the engineering of software-intensive systems. Traditionally, software engineering focuses on manually created artifacts such as source code and the process of creating them. We present the first large-scale study of real ML-enabled software systems, covering over 2,928 open source systems on GitHub.
arXiv Detail & Related papers (2024-08-12T15:28:40Z)
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [73.86954509967416]
Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks. This paper presents the first comprehensive MLLM Evaluation benchmark MME. It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z)
How Can Recommender Systems Benefit from Large Language Models: A Survey [82.06729592294322]
Large language models (LLM) have shown impressive general intelligence and human-like capabilities. We conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.
arXiv Detail & Related papers (2023-06-09T11:31:50Z)
What are the Machine Learning best practices reported by practitioners on Stack Exchange? [4.882319198853359]
We present a study listing 127 Machine Learning best practices, obtained by systematically mining 242 posts from 14 different Stack Exchange (STE) websites. The list of practices is presented in a set of categories related to different stages of the implementation process of an ML-enabled system.
arXiv Detail & Related papers (2023-01-25T10:50:28Z)
A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems. ML models often 'remember' the old data. Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z)
OmniXAI: A Library for Explainable AI [98.07381528393245]
We introduce OmniXAI, an open-source Python library for eXplainable AI (XAI). It offers omni-way explainable AI capabilities and various interpretable machine learning techniques. For practitioners, the library provides an easy-to-use unified interface to generate explanations for their applications.
arXiv Detail & Related papers (2022-06-01T11:35:37Z)
Declarative Machine Learning Systems [7.5717114708721045]
Machine learning (ML) has moved from an academic endeavor to a pervasive technology adopted in almost every aspect of computing. Recent successes in applying ML in the natural sciences revealed that ML can be used to tackle some of the hardest real-world problems humanity faces today. We believe the next wave of ML systems will allow a larger number of people, potentially without coding skills, to perform the same tasks.
arXiv Detail & Related papers (2021-07-16T23:57:57Z)
Enabling Un-/Semi-Supervised Machine Learning for MDSE of the Real-World CPS/IoT Applications [0.5156484100374059]
We propose a novel approach to support domain-specific Model-Driven Software Engineering (MDSE) for real-world use-case scenarios of smart Cyber-Physical Systems (CPS) and the Internet of Things (IoT). We argue that the majority of the data available in nature for Artificial Intelligence (AI) is unlabeled. Hence, unsupervised and/or semi-supervised ML approaches are the practical choices. Our proposed approach is fully implemented and integrated with an existing state-of-the-art MDSE tool to serve the CPS/IoT domain.
arXiv Detail & Related papers (2021-07-06T15:51:39Z)
White Paper Machine Learning in Certified Systems [70.24215483154184]
DEEL Project set-up the ML Certification 3 Workgroup (WG) set-up by the Institut de Recherche Technologique Saint Exupéry de Toulouse (IRT)
arXiv Detail & Related papers (2021-03-18T21:14:30Z)
A Neophyte With AutoML: Evaluating the Promises of Automatic Machine Learning Tools [1.713291434132985]
This paper discusses modern Auto Machine Learning (AutoML) tools from the perspective of a person with little prior experience in Machine Learning (ML). There are many AutoML tools, both ready-to-use and under development, which are created to simplify and democratize the usage of ML technologies in everyday life.
arXiv Detail & Related papers (2021-01-14T19:28:57Z)
Insights into Performance Fitness and Error Metrics for Machine Learning [1.827510863075184]
Machine learning (ML) is the field of training machines to achieve a high level of cognition and perform human-like analysis. This paper examines a number of the most commonly used performance fitness and error metrics for regression and classification algorithms.
arXiv Detail & Related papers (2020-05-17T22:59:04Z)
An Information-Theoretic Approach to Personalized Explainable Machine Learning [92.53970625312665]
We propose a simple probabilistic model for the predictions and user knowledge. We quantify the effect of an explanation by the conditional mutual information between the explanation and prediction.
arXiv Detail & Related papers (2020-03-01T13:06:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.