What are the Machine Learning best practices reported by practitioners on Stack Exchange?
- URL: http://arxiv.org/abs/2301.10516v1
- Date: Wed, 25 Jan 2023 10:50:28 GMT
- Title: What are the Machine Learning best practices reported by practitioners on Stack Exchange?
- Authors: Anamaria Mojica-Hanke and Andrea Bayona and Mario Linares-Vásquez and Steffen Herbold and Fabio A. González
- Abstract summary: We present a study listing 127 Machine Learning best practices, systematically mined from 242 posts on 14 different Stack Exchange (STE) websites.
The list of practices is presented in a set of categories related to different stages of the implementation process of an ML-enabled system.
- Score: 4.882319198853359
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning (ML) is being used in multiple disciplines due to its powerful capability to infer relationships within data. In particular, Software Engineering (SE) is one of the disciplines in which ML has been used for multiple tasks, such as software categorization, bug prediction, and testing. In addition to these applications, some studies have been conducted to detect and understand possible pitfalls and issues when using ML. However, to the best of our knowledge, only a few studies have focused on presenting ML best practices or guidelines for applying ML in different domains. Moreover, the practices presented in previous literature (i) are domain-specific (e.g., concrete practices in biomechanics), (ii) describe few practices, or (iii) lack rigorous validation and are presented in gray literature. In this paper, we present a study listing 127 ML best practices, systematically mined from 242 posts on 14 different Stack Exchange (STE) websites and validated by four independent ML experts. The practices are grouped into categories related to the different stages of implementing an ML-enabled system; for each practice, we include explanations and examples, all focused on SE tasks. We expect this list to help practitioners, in particular newcomers to the area at the intersection of software engineering and machine learning, to better understand the practices and to use ML in a more informed way.
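As a rough, hedged illustration of the kind of mining the abstract describes, the sketch below queries the public Stack Exchange API (api.stackexchange.com) for highly voted, ML-tagged questions on a handful of STE sites. The site list, tag, sort order, and page size are illustrative assumptions, not the selection protocol actually used in the study.

```python
# Hypothetical sketch: collecting candidate ML-related posts from several
# Stack Exchange sites via the public API. Sites, tag, and sorting are
# assumptions for illustration, not the paper's actual mining criteria.
import requests

API = "https://api.stackexchange.com/2.3/search/advanced"
SITES = ["datascience", "stats", "ai", "softwareengineering"]  # assumed subset

def fetch_posts(site, tag="machine-learning", pagesize=50):
    """Return (title, link, score) tuples for highly voted questions tagged `tag`."""
    params = {
        "site": site,
        "tagged": tag,
        "order": "desc",
        "sort": "votes",      # most-voted questions first
        "pagesize": pagesize,
    }
    resp = requests.get(API, params=params, timeout=30)
    resp.raise_for_status()
    items = resp.json().get("items", [])
    return [(it["title"], it["link"], it["score"]) for it in items]

if __name__ == "__main__":
    for site in SITES:
        posts = fetch_posts(site)
        print(f"{site}: {len(posts)} candidate posts")
```

In practice, such candidate posts would still need manual filtering and expert validation before any practice could be extracted from them, as the study reports.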
Related papers
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific
Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z) - On Using Information Retrieval to Recommend Machine Learning Good
Practices for Software Engineers [6.7659763626415135]
Not embracing good machine learning practices may hinder the performance of an ML system.
Many non-ML experts turn towards gray literature like blogs and Q&A systems when looking for help and guidance.
We propose a recommender system that recommends ML practices based on the user's context.
arXiv Detail & Related papers (2023-08-23T12:28:18Z) - MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [73.86954509967416]
A Multimodal Large Language Model (MLLM) relies on a powerful LLM to perform multimodal tasks.
This paper presents the first comprehensive MLLM Evaluation benchmark MME.
It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z) - Towards machine learning guided by best practices [0.0]
Machine learning (ML) is being used in software systems with multiple application fields, from medicine to software engineering (SE).
This thesis aims to answer research questions that help to understand the practices used and discussed by practitioners and researchers in the SE community.
arXiv Detail & Related papers (2023-04-29T10:58:37Z) - ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for
Document Information Extraction [56.790794611002106]
Large language models (LLMs) have demonstrated remarkable results in various natural language processing (NLP) tasks with in-context learning.
We propose a simple but effective in-context learning framework called ICL-D3IE.
Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
arXiv Detail & Related papers (2023-03-09T06:24:50Z) - Democratizing Machine Learning for Interdisciplinary Scholars: Report on
Organizing the NLP+CSS Online Tutorial Series [0.9645196221785691]
Existing tutorials are often costly to participants, presume extensive programming knowledge, and are not tailored to specific application fields.
We organized a year-long, free, online tutorial series targeted at teaching advanced natural language processing (NLP) methods to computational social science (CSS) scholars.
Although live participation was more limited than expected, a comparison of pre- and post-tutorial surveys showed an increase in participants' perceived knowledge of almost one point on a 7-point Likert scale.
arXiv Detail & Related papers (2022-11-29T07:06:45Z) - Machine Learning for Software Engineering: A Tertiary Study [13.832268599253412]
Machine learning (ML) techniques increase the effectiveness of software engineering (SE) lifecycle activities.
We systematically collected, quality-assessed, summarized, and categorized 83 reviews in ML for SE published between 2009 and 2022, covering 6,117 primary studies.
The SE areas most tackled with ML are software quality and testing, while human-centered areas appear more challenging for ML.
arXiv Detail & Related papers (2022-11-17T09:19:53Z) - Panoramic Learning with A Standardized Machine Learning Formalism [116.34627789412102]
This paper presents a standardized equation of the learning objective, that offers a unifying understanding of diverse ML algorithms.
It also provides guidance for the mechanical design of new ML solutions, and serves as a promising vehicle towards panoramic learning with all experiences.
arXiv Detail & Related papers (2021-08-17T17:44:38Z) - "Garbage In, Garbage Out" Revisited: What Do Machine Learning
Application Papers Report About Human-Labeled Training Data? [0.0]
Supervised machine learning, in which models are automatically derived from labeled training data, is only as good as the quality of that data.
This study builds on prior work that investigated to what extent 'best practices' around labeling training data were followed in applied ML publications.
arXiv Detail & Related papers (2021-07-05T21:24:02Z) - White Paper Machine Learning in Certified Systems [70.24215483154184]
The DEEL Project set up the ML Certification 3 Workgroup (WG) at the Institut de Recherche Technologique Saint Exupéry de Toulouse (IRT).
arXiv Detail & Related papers (2021-03-18T21:14:30Z) - Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z)