What are the Machine Learning best practices reported by practitioners on Stack Exchange?
- URL: http://arxiv.org/abs/2301.10516v1
- Date: Wed, 25 Jan 2023 10:50:28 GMT
- Title: What are the Machine Learning best practices reported by practitioners on Stack Exchange?
- Authors: Anamaria Mojica-Hanke and Andrea Bayona and Mario Linares-Vásquez and Steffen Herbold and Fabio A. González
- Abstract summary: We present a study listing 127 Machine Learning best practices, systematically mined from 242 posts on 14 different Stack Exchange (STE) websites.
The list of practices is presented in a set of categories related to different stages of the implementation process of an ML-enabled system.
- Score: 4.882319198853359
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning (ML) is being used in multiple disciplines due to its powerful capability to infer relationships within data. In particular, Software Engineering (SE) is one of the disciplines in which ML has been used for multiple tasks, such as software categorization, bug prediction, and testing. In addition to these applications, some studies have been conducted to detect and understand possible pitfalls and issues when using ML. However, to the best of our knowledge, only a few studies have focused on presenting ML best practices or guidelines for applying ML in different domains. Moreover, the practices presented in previous literature (i) are domain-specific (e.g., concrete practices in biomechanics), (ii) describe few practices, or (iii) lack rigorous validation and are presented in gray literature. In this paper, we present a study listing 127 ML best practices, systematically mined from 242 posts on 14 different Stack Exchange (STE) websites and validated by four independent ML experts. The practices are grouped into categories related to the different stages of implementing an ML-enabled system; for each practice, we include explanations and examples, all focused on SE tasks. We expect this list to help practitioners, in particular newcomers to the area at the intersection of software engineering and machine learning, to better understand the practices and to use ML in a more informed way.
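As a rough, hedged illustration of the kind of mining the abstract describes, the sketch below queries the public Stack Exchange API (api.stackexchange.com) for highly voted, ML-tagged questions on a handful of STE sites. The site list, tag, sort order, and page size are illustrative assumptions, not the selection protocol actually used in the study.

```python
# Hypothetical sketch: collecting candidate ML-related posts from several
# Stack Exchange sites via the public API. Sites, tag, and sorting are
# assumptions for illustration, not the paper's actual mining criteria.
import requests

API = "https://api.stackexchange.com/2.3/search/advanced"
SITES = ["datascience", "stats", "ai", "softwareengineering"]  # assumed subset

def fetch_posts(site, tag="machine-learning", pagesize=50):
    """Return (title, link, score) tuples for highly voted questions tagged `tag`."""
    params = {
        "site": site,
        "tagged": tag,
        "order": "desc",
        "sort": "votes",      # most-voted questions first
        "pagesize": pagesize,
    }
    resp = requests.get(API, params=params, timeout=30)
    resp.raise_for_status()
    items = resp.json().get("items", [])
    return [(it["title"], it["link"], it["score"]) for it in items]

if __name__ == "__main__":
    for site in SITES:
        posts = fetch_posts(site)
        print(f"{site}: {len(posts)} candidate posts")
```

In practice, such candidate posts would still need manual filtering and expert validation before any practice could be extracted from them, as the study reports.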
Related papers
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific
Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z) - On Using Information Retrieval to Recommend Machine Learning Good
Practices for Software Engineers [6.7659763626415135]
Not embracing good machine learning practices may hinder the performance of an ML system.
Many non-ML experts turn towards gray literature like blogs and Q&A systems when looking for help and guidance.
We propose a recommender system that recommends ML practices based on the user's context.
arXiv Detail & Related papers (2023-08-23T12:28:18Z) - MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [73.86954509967416]
A Multimodal Large Language Model (MLLM) relies on a powerful LLM to perform multimodal tasks.
This paper presents the first comprehensive MLLM Evaluation benchmark MME.
It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z) - Towards machine learning guided by best practices [0.0]
Machine learning (ML) is being used in software systems with multiple application fields, from medicine to software engineering (SE).
This thesis aims to answer research questions that help to understand the practices used and discussed by practitioners and researchers in the SE community.
arXiv Detail & Related papers (2023-04-29T10:58:37Z) - ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for
Document Information Extraction [56.790794611002106]
Large language models (LLMs) have demonstrated remarkable results in various natural language processing (NLP) tasks with in-context learning.
We propose a simple but effective in-context learning framework called ICL-D3IE.
Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
arXiv Detail & Related papers (2023-03-09T06:24:50Z) - Democratizing Machine Learning for Interdisciplinary Scholars: Report on
Organizing the NLP+CSS Online Tutorial Series [0.9645196221785691]
Existing tutorials are often costly to participants, presume extensive programming knowledge, and are not tailored to specific application fields.
We organized a year-long, free, online tutorial series targeted at teaching advanced natural language processing (NLP) methods to computational social science (CSS) scholars.
Although live participation was more limited than expected, a comparison of pre- and post-tutorial surveys showed an increase in participants' perceived knowledge of almost one point on a 7-point Likert scale.
arXiv Detail & Related papers (2022-11-29T07:06:45Z) - Machine Learning for Software Engineering: A Tertiary Study [13.832268599253412]
Machine learning (ML) techniques increase the effectiveness of software engineering (SE) lifecycle activities.
We systematically collected, quality-assessed, summarized, and categorized 83 reviews in ML for SE published between 2009 and 2022, covering 6,117 primary studies.
The SE areas most tackled with ML are software quality and testing, while human-centered areas appear more challenging for ML.
arXiv Detail & Related papers (2022-11-17T09:19:53Z) - Panoramic Learning with A Standardized Machine Learning Formalism [116.34627789412102]
This paper presents a standardized equation of the learning objective, that offers a unifying understanding of diverse ML algorithms.
It also provides guidance for the mechanical design of new ML solutions, and serves as a promising vehicle towards panoramic learning with all experiences.
arXiv Detail & Related papers (2021-08-17T17:44:38Z) - "Garbage In, Garbage Out" Revisited: What Do Machine Learning
Application Papers Report About Human-Labeled Training Data? [0.0]
Supervised machine learning, in which models are automatically derived from labeled training data, is only as good as the quality of that data.
This study builds on prior work that investigated to what extent 'best practices' around labeling training data were followed in applied ML publications.
arXiv Detail & Related papers (2021-07-05T21:24:02Z) - White Paper Machine Learning in Certified Systems [70.24215483154184]
The DEEL Project set up the ML Certification 3 Workgroup (WG) at the Institut de Recherche Technologique Saint Exupéry de Toulouse (IRT).
arXiv Detail & Related papers (2021-03-18T21:14:30Z) - Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z)