Related papers: Exploring the Frontiers of kNN Noisy Feature Detection and Recovery for Self-Driving Labs

URL: http://arxiv.org/abs/2507.16833v1
Date: Tue, 15 Jul 2025 03:35:56 GMT
Title: Exploring the Frontiers of kNN Noisy Feature Detection and Recovery for Self-Driving Labs
Authors: Qiuyu Shi, Kangming Li, Yao Fehlis, Daniel Persaud, Robert Black, Jason Hattrick-Simpers
Abstract summary: This study develops an automated workflow to detect noisy features, determine sample-feature pairings that can be corrected, and finally recover the correct feature values. A systematic study is then performed to examine how dataset size, noise intensity, and feature value distribution affect both the detectability and recoverability of noisy features.
Score: 0.49478969093606673
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Self-driving laboratories (SDLs) have shown promise to accelerate materials discovery by integrating machine learning with automated experimental platforms. However, errors in the capture of input parameters may corrupt the features used to model system performance, compromising current and future campaigns. This study develops an automated workflow to systematically detect noisy features, determine sample-feature pairings that can be corrected, and finally recover the correct feature values. A systematic study is then performed to examine how dataset size, noise intensity, and feature value distribution affect both the detectability and recoverability of noisy features. In general, high-intensity noise and large training datasets are conducive to the detection and correction of noisy features. Low-intensity noise reduces detection and recovery but can be compensated for by larger clean training datasets. Detection and correction results vary between features, with continuous and dispersed feature distributions showing greater recoverability than discrete or narrow distributions. This systematic study not only demonstrates a model-agnostic framework for rational data recovery in the presence of noise, limited data, and differing feature distributions but also provides a tangible benchmark of kNN imputation in materials datasets. Ultimately, it aims to enhance data quality and experimental precision in automated materials discovery.
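
Illustrative sketch (not the authors' workflow): the abstract describes detecting a noisy feature value and recovering it with kNN imputation from clean training data. The snippet below shows one simple way such a step could look, assuming a single suspect feature per sample, Euclidean distance on the remaining features, and a fixed tolerance. The function names `knn_estimate` and `detect_and_recover`, the `tol` parameter, and the toy data are all hypothetical and do not come from the paper.

```python
import math


def knn_estimate(clean_rows, sample, feature_idx, k=3):
    """Estimate sample[feature_idx] as the mean of that feature over the
    k nearest clean rows, with distance measured on all other features."""
    others = [i for i in range(len(sample)) if i != feature_idx]

    def dist(row):
        return math.sqrt(sum((row[i] - sample[i]) ** 2 for i in others))

    neighbours = sorted(clean_rows, key=dist)[:k]
    return sum(row[feature_idx] for row in neighbours) / len(neighbours)


def detect_and_recover(clean_rows, sample, feature_idx, k=3, tol=1.0):
    """Flag the feature as noisy if it differs from the kNN estimate by more
    than tol (an assumed threshold), and return a corrected copy if so."""
    estimate = knn_estimate(clean_rows, sample, feature_idx, k)
    is_noisy = abs(sample[feature_idx] - estimate) > tol
    recovered = list(sample)
    if is_noisy:
        recovered[feature_idx] = estimate
    return is_noisy, recovered


# Toy clean data: the third feature equals the sum of the first two.
clean = [
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 3.0],
    [2.0, 2.0, 4.0],
    [2.0, 3.0, 5.0],
    [3.0, 3.0, 6.0],
]

# A sample whose third feature was recorded incorrectly.
flag, fixed = detect_and_recover(clean, [2.0, 2.0, 40.0], feature_idx=2)

# A sample whose third feature is close to what its neighbours suggest.
ok_flag, ok_row = detect_and_recover(clean, [1.0, 2.0, 3.1], feature_idx=2)
```

In this toy setup the corrupted value is replaced by the neighbour mean, while the mildly perturbed sample is left unchanged. This loosely mirrors the abstract's observation that high-intensity noise is easier to detect than low-intensity noise, though the tolerance here is an arbitrary choice.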

Related papers

A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy. We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods. By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
Explainable fault and severity classification for rolling element bearings using Kolmogorov-Arnold networks [4.46753539114796]
Bearing faults are a leading cause of machinery failures. This study utilizes Kolmogorov-Arnold Networks to address these challenges. It produces lightweight models that deliver explainable results.
arXiv Detail & Related papers (2024-12-02T09:40:03Z)
COMPILED: Deep Metric Learning for Defect Classification of Threaded Pipe Connections using Multichannel Partially Observed Functional Data [6.688305507010403]
We focus on defect classification where each sample is represented as partially observed multichannel functional data. The available samples for each defect type are limited and imbalanced. We propose an innovative classification approach named COMPILED, based on deep metric learning.
arXiv Detail & Related papers (2024-04-04T09:55:11Z)
Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
Active Foundational Models for Fault Diagnosis of Electrical Motors [0.5999777817331317]
Fault detection and diagnosis of electrical motors is of utmost importance in ensuring the safe and reliable operation of industrial systems. The existing data-driven deep learning approaches for machine fault diagnosis rely extensively on huge amounts of labeled samples. We propose a foundational model-based Active Learning framework that uses fewer labeled samples.
arXiv Detail & Related papers (2023-11-27T03:25:12Z)
An Improved Anomaly Detection Model for Automated Inspection of Power Line Insulators [0.0]
Inspection of insulators is important to ensure reliable operation of the power system. Deep learning is being increasingly exploited to automate the inspection process. This article proposes the use of anomaly detection along with object detection in a two-stage approach for incipient fault detection.
arXiv Detail & Related papers (2023-11-14T11:36:20Z)
Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data. We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
The role of noise in denoising models for anomaly detection in medical images [62.0532151156057]
Pathological brain lesions exhibit diverse appearance in brain images. Unsupervised anomaly detection approaches have been proposed using only normal data for training. We show that optimization of the spatial resolution and magnitude of the noise improves the performance of different model training regimes.
arXiv Detail & Related papers (2023-01-19T21:39:38Z)
Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes. We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
Learning Informative Health Indicators Through Unsupervised Contrastive Learning [5.193936395510582]
This study proposes a novel, versatile and unsupervised approach to learn health indicators. The approach is evaluated on two tasks and case studies with different characteristics. Our results show that the proposed methodology effectively learns a health indicator that follows the wear of milling machines.
arXiv Detail & Related papers (2022-08-28T21:04:42Z)
Architectural Optimization and Feature Learning for High-Dimensional Time Series Datasets [0.7388859384645262]
We study the problem of predicting the presence of transient noise artifacts in a gravitational wave detector. We introduce models that reduce the error rate by over 60% compared to the previous state of the art.
arXiv Detail & Related papers (2022-02-27T23:41:23Z)
Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean data training and real-world inference. We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space. Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, that is, it produces high-quality results in a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.