Maturity Framework for Enhancing Machine Learning Quality
- URL: http://arxiv.org/abs/2502.15758v1
- Date: Wed, 12 Feb 2025 14:56:46 GMT
- Title: Maturity Framework for Enhancing Machine Learning Quality
- Authors: Angelantonio Castelli, Georgios Christos Chouliaras, Dmitri Goldenberg,
- Abstract summary: We suggest a methodical approach towards Machine Learning system quality assessment and introduce a structured Maturity framework for governance of ML. We emphasize the importance of quality in ML and the need for rigorous assessment, driven by issues in ML governance and gaps in existing frameworks. The study presents empirical findings, highlighting quality improvement trends and showcasing business outcomes.
- Score: 4.840252889901222
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid integration of Machine Learning (ML) in business applications and processes, it is crucial to ensure the quality, reliability and reproducibility of such systems. We suggest a methodical approach towards ML system quality assessment and introduce a structured Maturity framework for governance of ML. We emphasize the importance of quality in ML and the need for rigorous assessment, driven by issues in ML governance and gaps in existing frameworks. Our primary contribution is a comprehensive open-sourced quality assessment method, validated with empirical evidence, accompanied by a systematic maturity framework tailored to ML systems. Drawing from applied experience at Booking.com, we discuss challenges and lessons learned during large-scale adoption within organizations. The study presents empirical findings, highlighting quality improvement trends and showcasing business outcomes. The maturity framework for ML systems aims to become a valuable resource for reshaping industry standards and enabling a structured approach to improving ML maturity in any organization.
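To make the assessment concrete, below is a minimal sketch of how a maturity rubric of this kind can be scored. The dimension names, the 0-4 scale, and the weakest-link aggregation rule are illustrative assumptions, not the paper's published rubric; consult the authors' open-sourced method for the real criteria.

```python
from dataclasses import dataclass

# Hypothetical quality dimensions; the paper's open-sourced rubric may use
# different names, questions, and weights.
DIMENSIONS = ["reproducibility", "monitoring", "testing", "documentation", "data_quality"]

@dataclass
class Assessment:
    scores: dict  # dimension -> integer score on an assumed 0-4 scale

def maturity_level(assessment: Assessment) -> int:
    """Weakest-link aggregation: a system's maturity level is capped by its
    lowest-scoring dimension, so level N requires every dimension >= N."""
    return min(assessment.scores.get(d, 0) for d in DIMENSIONS)

# Example: strong elsewhere, but weak monitoring caps maturity at level 1.
a = Assessment(scores={"reproducibility": 3, "monitoring": 1, "testing": 3,
                       "documentation": 2, "data_quality": 4})
print(maturity_level(a))  # -> 1
```

The weakest-link rule is one plausible design choice: it makes the score actionable, since raising maturity always means fixing the single worst dimension first.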
Related papers
- Med-CoDE: Medical Critique based Disagreement Evaluation Framework [72.42301910238861]
The reliability and accuracy of large language models (LLMs) in medical contexts remain critical concerns.
Current evaluation methods often lack robustness and fail to provide a comprehensive assessment of LLM performance.
We propose Med-CoDE, a specifically designed evaluation framework for medical LLMs to address these challenges.
arXiv Detail & Related papers (2025-04-21T16:51:11Z)
- Enhancing Machine Learning Performance through Intelligent Data Quality Assessment: An Unsupervised Data-centric Framework [0.0]
Poor data quality limits the advantageous power of Machine Learning (ML).
We propose an intelligent data-centric evaluation framework that can identify high-quality data and improve the performance of an ML system.
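As a rough illustration of the data-centric idea (not the paper's actual framework), the sketch below uses an off-the-shelf unsupervised outlier detector to score rows and keep only those judged high quality before training; the choice of IsolationForest and the contamination rate are assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def quality_filter(X: np.ndarray, contamination: float = 0.05) -> np.ndarray:
    """Return a boolean mask of rows judged high quality.

    Stand-in for the paper's scorer: rows that look anomalous relative to
    the rest of the dataset are treated as low quality and dropped.
    """
    detector = IsolationForest(contamination=contamination, random_state=0)
    labels = detector.fit_predict(X)  # +1 = inlier, -1 = outlier
    return labels == 1

# Train only on rows that pass the quality gate.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 8))
X[:25] += 10.0  # inject corrupted rows
mask = quality_filter(X)
print(f"kept {mask.sum()} of {len(X)} rows")
```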
arXiv Detail & Related papers (2025-02-18T18:01:36Z)
- Addressing Quality Challenges in Deep Learning: The Role of MLOps and Domain Knowledge [5.190998244098203]
Deep learning (DL) systems present unique challenges in software engineering, especially concerning quality attributes like correctness and resource efficiency.
This experience paper explores the role of MLOps practices in creating transparent and reproducible experimentation environments.
We report on experiences addressing the quality challenges by embedding domain knowledge into the design of a DL model and its integration within a larger system.
arXiv Detail & Related papers (2025-01-14T19:37:08Z)
- Understanding the Role of LLMs in Multimodal Evaluation Benchmarks [77.59035801244278]
This paper investigates the role of the Large Language Model (LLM) backbone in Multimodal Large Language Models (MLLMs) evaluation.
Our study encompasses four diverse MLLM benchmarks and eight state-of-the-art MLLMs.
Key findings reveal that some benchmarks allow high performance even without visual inputs, and that up to 50% of error rates can be attributed to insufficient world knowledge in the LLM backbone.
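A simple way to reproduce the "no visual input" probe is a blind ablation: score the benchmark twice, once with images and once without, and compare. The sketch below assumes a hypothetical `model(question, image)` callable and an exact-match metric; the paper's evaluation protocol may differ.

```python
from typing import Callable, Sequence

def blind_ablation(model: Callable, benchmark: Sequence[dict]) -> dict:
    """Score a benchmark with and without visual input.

    `model(question, image)` is a hypothetical interface; passing
    image=None withholds the visual input. A small `visual_gain`
    suggests the benchmark is answerable from language priors alone.
    """
    n = len(benchmark)
    full = sum(model(ex["question"], ex["image"]) == ex["answer"] for ex in benchmark)
    blind = sum(model(ex["question"], None) == ex["answer"] for ex in benchmark)
    return {"accuracy_full": full / n, "accuracy_blind": blind / n,
            "visual_gain": (full - blind) / n}
```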
arXiv Detail & Related papers (2024-10-16T07:49:13Z)
- VERA: Validation and Evaluation of Retrieval-Augmented Systems [5.709401805125129]
VERA is a framework designed to enhance the transparency and reliability of outputs from large language models (LLMs).
We show how VERA can strengthen decision-making processes and trust in AI applications.
arXiv Detail & Related papers (2024-08-16T21:59:59Z)
- Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives [54.14429346914995]
Chain-of-Thought (CoT) has become a pivotal method for solving complex problems.
Large language models (LLMs) often struggle to accurately decompose domain-specific tasks.
This paper introduces the Re-TASK framework, a novel theoretical model that revisits LLM tasks from the perspectives of capability, skill, and knowledge.
arXiv Detail & Related papers (2024-08-13T13:58:23Z)
- How Reliable are LLMs as Knowledge Bases? Re-thinking Factuality and Consistency [60.25969380388974]
Large Language Models (LLMs) are increasingly explored as knowledge bases (KBs).
Current evaluation methods focus too narrowly on knowledge retention, overlooking other crucial criteria for reliable performance.
We propose new criteria and metrics to quantify factuality and consistency, leading to a final reliability score.
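The paper defines its own criteria and weighting; the sketch below only illustrates the general shape of such a reliability score, combining a hypothetical factuality check (answers matching gold labels) with a consistency check (agreement across paraphrases of the same question).

```python
from typing import Callable, Sequence

def reliability_score(ask: Callable[[str], str],
                      items: Sequence[dict],  # {"paraphrases": [...], "gold": str}
                      w_fact: float = 0.5) -> float:
    """Hypothetical reliability score mixing factuality and consistency.

    Factuality: fraction of items whose first answer matches the gold label.
    Consistency: fraction of paraphrase pairs that yield the same answer.
    The equal weighting here is an assumption, not the paper's formula.
    """
    fact_hits, cons_hits, cons_pairs = 0, 0, 0
    for item in items:
        answers = [ask(q) for q in item["paraphrases"]]
        fact_hits += answers[0] == item["gold"]
        for i in range(len(answers)):
            for j in range(i + 1, len(answers)):
                cons_pairs += 1
                cons_hits += answers[i] == answers[j]
    factuality = fact_hits / len(items)
    consistency = cons_hits / cons_pairs if cons_pairs else 1.0
    return w_fact * factuality + (1 - w_fact) * consistency
```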
arXiv Detail & Related papers (2024-07-18T15:20:18Z)
- LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM [83.98966702271576]
This study aims to investigate the feasibility of imparting Point Cloud Quality Assessment (PCQA) knowledge to large multi-modality models (LMMs).
We transform quality labels into textual descriptions during the fine-tuning phase, enabling LMMs to derive quality rating logits from 2D projections of point clouds.
Our experimental results affirm the effectiveness of our approach, showcasing a novel integration of LMMs into PCQA.
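One common way to read a scalar rating out of an LMM's per-level token logits is a softmax-weighted average over quality words; the level vocabulary and readout below are assumptions in that spirit, not the authors' exact fine-tuning setup.

```python
import numpy as np

# Hypothetical five-level quality vocabulary used to turn numeric labels into
# text during fine-tuning (e.g. "The quality of this point cloud is good.").
LEVELS = ["bad", "poor", "fair", "good", "excellent"]
LEVEL_VALUES = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def logits_to_score(level_logits: np.ndarray) -> float:
    """Collapse per-level logits into one scalar rating via a
    softmax-weighted average over the level values."""
    exp = np.exp(level_logits - level_logits.max())
    probs = exp / exp.sum()
    return float(probs @ LEVEL_VALUES)

print(logits_to_score(np.array([0.1, 0.3, 2.0, 3.5, 0.8])))  # ~3.75, near "good"
```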
arXiv Detail & Related papers (2024-04-28T14:47:09Z)
- Large Language Model-Based Interpretable Machine Learning Control in Building Energy Systems [3.0309252269809264]
This paper explores Interpretable Machine Learning (IML), a branch of Machine Learning (ML) that enhances the transparency and understanding of models and their inferences.
We develop an innovative framework that combines the principles of Shapley values and the in-context learning feature of Large Language Models (LLMs).
The paper presents a case study to demonstrate the feasibility of the developed IML framework for model predictive control-based precooling under demand response events in a virtual testbed.
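A minimal sketch of the combination the paper describes, assuming a generic SHAP explainer over a surrogate regressor and a hand-written prompt template: compute per-feature Shapley values for one prediction, then serialize them as in-context evidence for an LLM. The feature names and prompt wording are hypothetical.

```python
# pip install shap xgboost
import numpy as np
import shap
import xgboost

# Surrogate regressor on synthetic data standing in for a building-energy model.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=200)
model = xgboost.XGBRegressor(n_estimators=50).fit(X, y)

# Shapley values for a single prediction.
explainer = shap.Explainer(model)
sv = explainer(X[:1])

# Serialize the attributions as in-context evidence for an LLM prompt.
features = ["outdoor_temp", "occupancy", "setpoint"]  # hypothetical names
evidence = ", ".join(f"{f}: {v:+.2f}" for f, v in zip(features, sv.values[0]))
prompt = ("You are assisting a building-energy controller. The model's "
          f"prediction has feature attributions ({evidence}). Explain which "
          "factor drives the prediction and suggest a precooling action.")
print(prompt)  # in the full framework this would be sent to an LLM API
```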
arXiv Detail & Related papers (2024-02-14T21:19:33Z)
- A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP).
This survey presents a taxonomy that categorizes these models into two major paradigms: Discriminative LLMs for Recommendation (DLLM4Rec) and Generative LLMs for Recommendation (GLLM4Rec).
arXiv Detail & Related papers (2023-05-31T13:51:26Z)
- Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
- Towards Guidelines for Assessing Qualities of Machine Learning Systems [1.715032913622871]
This article presents the construction of a quality model for an ML system based on an industrial use case.
In the future, we want to learn how the notion of quality differs across different types of ML systems.
arXiv Detail & Related papers (2020-08-25T13:45:54Z)
- Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.