Making Machine Learning Datasets and Models FAIR for HPC: A Methodology
and Case Study
- URL: http://arxiv.org/abs/2211.02092v1
- Date: Thu, 3 Nov 2022 18:45:46 GMT
- Title: Making Machine Learning Datasets and Models FAIR for HPC: A Methodology
and Case Study
- Authors: Pei-Hung Lin, Chunhua Liao, Winson Chen, Tristan Vanderbruggen, Murali
Emani, Hailu Xu
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The FAIR Guiding Principles aim to improve the findability, accessibility,
interoperability, and reusability of digital content by making them both human
and machine actionable. However, these principles have not yet been broadly
adopted in the domain of machine learning-based program analyses and
optimizations for High-Performance Computing (HPC). In this paper, we design a
methodology to make HPC datasets and machine learning models FAIR after
investigating existing FAIRness assessment and improvement techniques. Our
methodology includes a comprehensive, quantitative assessment for selected data,
followed by concrete, actionable suggestions to improve FAIRness with respect
to common issues related to persistent identifiers, rich metadata descriptions,
license and provenance information. Moreover, we select a representative
training dataset to evaluate our methodology. The experiment shows that the
methodology can effectively improve the dataset and model's FAIRness from an
initial score of 19.1% to a final score of 83.0%.
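The kind of checklist-based FAIRness scoring the abstract describes can be sketched in a few lines. Note this is a minimal illustration: the metric names and weights below are hypothetical, not the paper's actual rubric, and the before/after item sets are invented for demonstration.

```python
# Hypothetical weighted FAIRness checklist; names and weights are illustrative.
CHECKLIST = {
    "persistent_identifier": 2.0,
    "rich_metadata": 2.0,
    "license_info": 1.0,
    "provenance_info": 1.0,
    "machine_readable_access": 1.0,
}

def fairness_score(satisfied):
    """Return the weighted FAIRness score as a percentage of the total."""
    total = sum(CHECKLIST.values())
    earned = sum(w for name, w in CHECKLIST.items() if name in satisfied)
    return 100.0 * earned / total

# Before improvement: only a license is present.
before = fairness_score({"license_info"})
# After improvement: identifier, metadata, license, and provenance added.
after = fairness_score({"persistent_identifier", "rich_metadata",
                        "license_info", "provenance_info"})
```

Under this scheme, each actionable suggestion (add a persistent identifier, enrich metadata, attach license and provenance) directly raises the percentage, which mirrors the 19.1% → 83.0% improvement reported in the abstract.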
Related papers
- User-centric evaluation of explainability of AI with and for humans: a comprehensive empirical study [5.775094401949666]
This study is situated in the field of Human-Centered Artificial Intelligence (HCAI).
It focuses on the results of a user-centered assessment of commonly used eXplainable Artificial Intelligence (XAI) algorithms.
arXiv Detail & Related papers (2024-10-21T12:32:39Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
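The two steps summarized above (lookahead values from tree search turned into step-level preference pairs, then a DPO update) can be sketched as follows. This is a simplified illustration under stated assumptions: the `step_candidates` structure and pairing rule are hypothetical, and only the standard DPO loss on log-probabilities is shown, not the paper's full training loop.

```python
import math

def step_preferences(step_candidates):
    """For each reasoning step, pair the highest- and lowest-valued
    candidate continuations as (chosen, rejected).

    step_candidates: list of dicts mapping candidate step text -> lookahead value.
    """
    pairs = []
    for candidates in step_candidates:
        best = max(candidates, key=candidates.get)
        worst = min(candidates, key=candidates.get)
        if best != worst:  # need at least two distinct candidates to form a pair
            pairs.append((best, worst))
    return pairs

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO objective on one preference pair of log-probabilities:
    -log sigmoid(beta * ((logp_w - ref_w) - (logp_l - ref_l)))."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With equal policy and reference log-probabilities the margin is zero and the loss reduces to log 2, the usual DPO starting point before the policy separates chosen from rejected steps.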
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- An explainable machine learning-based approach for analyzing customers' online data to identify the importance of product attributes [0.6437284704257459]
We propose a game theory machine learning (ML) method that extracts comprehensive design implications for product development.
We apply our method to a real-world dataset of laptops from Kaggle, and derive design implications based on the results.
arXiv Detail & Related papers (2024-02-03T20:50:48Z)
- GPT in Data Science: A Practical Exploration of Model Selection [0.7646713951724013]
This research is committed to advancing our comprehension of AI decision-making processes.
Our efforts are directed towards creating AI systems that are more transparent and comprehensible.
arXiv Detail & Related papers (2023-11-20T03:42:24Z)
- Latent Properties of Lifelong Learning Systems [59.50307752165016]
We introduce an algorithm-agnostic explainable surrogate-modeling approach to estimate latent properties of lifelong learning algorithms.
We validate the approach for estimating these properties via experiments on synthetic data.
arXiv Detail & Related papers (2022-07-28T20:58:13Z)
- Efficient Real-world Testing of Causal Decision Making via Bayesian Experimental Design for Contextual Optimisation [12.37745209793872]
We introduce a model-agnostic framework for gathering data to evaluate and improve contextual decision making.
Our method is used for the data-efficient evaluation of the regret of past treatment assignments.
arXiv Detail & Related papers (2022-07-12T01:20:11Z)
- An Extensible Benchmark Suite for Learning to Simulate Physical Systems [60.249111272844374]
We introduce a set of benchmark problems to take a step towards unified benchmarks and evaluation protocols.
We propose four representative physical systems, as well as a collection of both widely used classical time-based and representative data-driven methods.
arXiv Detail & Related papers (2021-08-09T17:39:09Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- A User's Guide to Calibrating Robotics Simulators [54.85241102329546]
This paper proposes a set of benchmarks and a framework for the study of various algorithms aimed to transfer models and policies learnt in simulation to the real world.
We conduct experiments on a wide range of well known simulated environments to characterize and offer insights into the performance of different algorithms.
Our analysis can be useful for practitioners working in this area and can help make informed choices about the behavior and main properties of sim-to-real algorithms.
arXiv Detail & Related papers (2020-11-17T22:24:26Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.