Improving Radiography Machine Learning Workflows via Metadata Management for Training Data Selection
- URL: http://arxiv.org/abs/2408.12655v1
- Date: Thu, 22 Aug 2024 18:01:21 GMT
- Title: Improving Radiography Machine Learning Workflows via Metadata Management for Training Data Selection
- Authors: Mirabel Reid, Christine Sweeney, Oleg Korobkin
- Abstract summary: In the physical sciences, there is an ever-increasing pool of metadata that is generated by the scientific research cycle.
Tracking this metadata can reduce redundant work, improve reproducibility, and aid in the feature and training dataset engineering process.
We present a tool for machine learning metadata management in dynamic radiography.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most machine learning models require many iterations of hyper-parameter tuning, feature engineering, and debugging to produce effective results. As machine learning models become more complicated, this pipeline becomes more difficult to manage effectively. In the physical sciences, there is an ever-increasing pool of metadata that is generated by the scientific research cycle. Tracking this metadata can reduce redundant work, improve reproducibility, and aid in the feature and training dataset engineering process. In this case study, we present a tool for machine learning metadata management in dynamic radiography. We evaluate the efficacy of this tool against the initial research workflow and discuss extensions to general machine learning pipelines in the physical sciences.
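The abstract describes metadata management only at a high level. The following is a minimal sketch of metadata-driven training data selection. It assumes an SQLite store and hypothetical fields (`source`, `noise_level`) that are not taken from the paper. Each selection's criteria are logged alongside its size, so a training set can be rebuilt later. This is one simple way to get the reproducibility benefit the abstract mentions.

```python
import sqlite3

# Hypothetical schema: these tables and fields are illustrative assumptions,
# not the schema of the tool described in the paper.
SCHEMA = """
CREATE TABLE radiographs (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,
    source TEXT NOT NULL,       -- e.g. 'simulated' or 'experimental'
    noise_level REAL NOT NULL   -- assumed scalar noise parameter
);
CREATE TABLE selections (
    id INTEGER PRIMARY KEY,
    query TEXT NOT NULL,        -- selection criteria, stored for reproducibility
    n_samples INTEGER NOT NULL
);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory metadata store (a file path would persist it)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def register(conn: sqlite3.Connection, path: str, source: str, noise_level: float) -> None:
    """Record one radiograph and its metadata."""
    conn.execute(
        "INSERT INTO radiographs (path, source, noise_level) VALUES (?, ?, ?)",
        (path, source, noise_level),
    )


def select_training_data(conn: sqlite3.Connection, source: str, max_noise: float) -> list[str]:
    """Select training files by metadata and log the criteria used."""
    where = "source = ? AND noise_level <= ?"  # constant clause; values are bound below
    rows = conn.execute(
        f"SELECT path FROM radiographs WHERE {where} ORDER BY id", (source, max_noise)
    ).fetchall()
    paths = [row[0] for row in rows]
    conn.execute(
        "INSERT INTO selections (query, n_samples) VALUES (?, ?)",
        (f"{where} with {source!r}, {max_noise!r}", len(paths)),
    )
    return paths


conn = open_store()
register(conn, "sim/0001.npy", "simulated", 0.01)
register(conn, "sim/0002.npy", "simulated", 0.10)
register(conn, "exp/0001.npy", "experimental", 0.05)
print(select_training_data(conn, "simulated", 0.05))  # ['sim/0001.npy']
```

In this sketch, a relational store keeps the selection criteria queryable. A real workflow would more likely persist to a file and record richer provenance, such as simulation parameters and model versions.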
Related papers
- Metadata practices for simulation workflows
We present general practices for acquiring and handling metadata that are agnostic to software and hardware.
These consist of two steps: 1) recording and storing raw metadata, and 2) selecting and structuring metadata.
As a proof of concept, we develop the Archivist, a Python tool to help with the second step, and use it to apply our practices to distinct high-performance use cases from neuroscience and hydrology.
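The two-step practice can be illustrated with a minimal plain-Python sketch. It does not use or imitate the Archivist's API. The function names (`record_raw`, `select_and_structure`) and the example fields (`solver.dt`, `host`) are hypothetical. Step 1 stores everything verbatim. Step 2 pulls a chosen subset of possibly nested fields into uniform rows.

```python
import json
from typing import Any


def record_raw(run_id: str, **raw: Any) -> str:
    """Step 1 (assumed form): keep every available field verbatim as a JSON record."""
    return json.dumps({"run_id": run_id, **raw}, sort_keys=True)


def get_path(record: dict, dotted: str) -> Any:
    """Follow a dotted key such as 'solver.dt' into nested dicts; None if absent."""
    node: Any = record
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def select_and_structure(raw_records: list[str], fields: list[str]) -> list[dict]:
    """Step 2 (assumed form): keep only the chosen fields, as uniform flat rows."""
    rows = []
    for text in raw_records:
        record = json.loads(text)
        rows.append({name: get_path(record, name) for name in ["run_id", *fields]})
    return rows


raw = [
    record_raw("r1", solver={"dt": 1e-3, "steps": 500}, host="node07"),
    record_raw("r2", solver={"dt": 5e-4}, host="node12"),
]
for row in select_and_structure(raw, ["solver.dt", "solver.steps"]):
    print(row)
```

Because step 1 discards nothing, step 2 can be rerun later with a different field list. A field that a run never recorded (here `solver.steps` for `r2`) shows up as `None` rather than breaking the table.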
Date: 2024-08-30T14:12:31Z
- Obtaining physical layer data of latest generation networks for investigating adversary attacks
Machine learning can be used to optimize the functions of latest-generation data networks such as 5G and 6G.
However, adversarial measures that manipulate the behaviour of intelligent machine learning models are becoming a major concern.
A simulation model is proposed that works in conjunction with machine learning applications.
Date: 2024-05-02T06:03:27Z
- Code Generation for Machine Learning using Model-Driven Engineering and SysML
This work aims to facilitate the implementation of data-driven engineering in practice by extending the previous work of formalizing machine learning tasks.
The presented method is evaluated for feasibility in a weather-forecasting case study.
Results demonstrate the flexibility and simplicity of the method, which reduces implementation effort.
Date: 2023-07-10T15:00:20Z
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
Date: 2023-06-06T02:24:41Z
- Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning
In robotics, one approach to generate training data builds on simulations based on dynamics models derived from first principles.
Here, we leverage the imbalance in complexity of the dynamics to learn more sample-efficiently.
We validate our method on several challenging simulated tasks and demonstrate that it improves learning both alone and when combined with an existing hindsight algorithm.
Date: 2023-03-03T21:55:04Z
- Automatic Data Augmentation via Invariance-Constrained Learning
Underlying data structures, such as symmetries or invariances to transformations, are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
However, its effectiveness depends on which transformations are applied and how often; this work tackles these issues by automatically adapting the data augmentation while solving the learning task.
Date: 2022-09-29T18:11:01Z
- Advancing Reacting Flow Simulations with Data-Driven Models
The key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
Date: 2022-09-05T16:48:34Z
- SOLIS -- The MLOps journey from data acquisition to actionable insights
A conventional data-science experimentation approach, however, does not supply the procedures and pipelines needed to deploy machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using a basic cross-platform tensor framework and script-language engines.
Date: 2021-12-22T14:45:37Z
- Automated Machine Learning Techniques for Data Streams
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results, but in the presence of concept drift, detection or adaptation techniques must be applied to maintain predictive accuracy over time.
Date: 2021-06-14T11:42:46Z
- Autoencoding Features for Aviation Machine Learning Problems
This research explored an unsupervised learning method, the autoencoder, to extract effective features for aviation machine learning problems.
The results show that the autoencoder can not only automatically extract effective features from flight track data but also efficiently deep-clean the data, thereby reducing the workload of data scientists.
The developed applications and techniques are shared with the whole aviation community to improve effectiveness of ongoing and future machine learning studies.
Date: 2020-11-03T04:09:34Z