Sound Event Classification in an Industrial Environment: Pipe Leakage
Detection Use Case
- URL: http://arxiv.org/abs/2205.02706v1
- Date: Thu, 5 May 2022 15:26:22 GMT
- Title: Sound Event Classification in an Industrial Environment: Pipe Leakage
Detection Use Case
- Authors: Ibrahim Shaer and Abdallah Shami
- Abstract summary: A multi-stage Machine Learning pipeline is proposed for pipe leakage detection in an industrial environment.
The proposed pipeline applies multiple steps, each addressing the environment's challenges.
The results show that the model produces excellent results with 99% accuracy and an F1-score of 0.93 and 0.9 for the respective datasets.
- Score: 3.9414768019101682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, a multi-stage Machine Learning (ML) pipeline is proposed for
pipe leakage detection in an industrial environment. As opposed to other
industrial and urban environments, the environment under study includes many
interfering background noises, complicating the identification of leaks.
Furthermore, the harsh environmental conditions limit the amount of data
collected and impose the use of low-complexity algorithms. To address the
environment's constraints, the developed ML pipeline applies multiple steps,
each addressing the environment's challenges. The proposed ML pipeline first
reduces the data dimensionality by feature selection techniques and then
incorporates time correlations by extracting time-based features. The resultant
features are fed to a Support Vector Machine (SVM) of low-complexity that
generalizes well to a small amount of data. An extensive experimental procedure
was carried out on two datasets, one with background industrial noise and one
without, to evaluate the validity of the proposed pipeline. The SVM
hyper-parameters and parameters specific to the pipeline steps were tuned as
part of the experimental procedure. The best models obtained from the dataset
with industrial noise and leaks were applied to datasets without noise and with
and without leaks to test their generalizability. The results show that the
model produces excellent results with 99% accuracy and an F1-score of 0.93 and
0.9 for the respective datasets.
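The abstract's pipeline (feature selection for dimensionality reduction, time-based features to capture temporal correlations, then a low-complexity SVM) can be sketched as follows. This is an illustrative reconstruction using scikit-learn on synthetic data, not the authors' implementation: the feature-selection method, the rolling-mean time features, and all parameter values are assumptions.

```python
# Hedged sketch of a leak-detection pipeline of the kind described:
# (1) extract time-based features, (2) reduce dimensionality by feature
# selection, (3) classify with a low-complexity SVM. Data is synthetic.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))    # stand-in for per-frame acoustic features
y = rng.integers(0, 2, size=200)  # 1 = leak, 0 = no leak (synthetic labels)

def add_time_features(X, window=5):
    """Append a rolling mean over the previous `window` frames to each row,
    one plausible way to incorporate time correlations."""
    rolled = np.stack([
        X[max(0, i - window + 1): i + 1].mean(axis=0) for i in range(len(X))
    ])
    return np.hstack([X, rolled])

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),  # dimensionality reduction
    ("svm", SVC(kernel="rbf", C=1.0)),         # low-complexity classifier
])
Xt = add_time_features(X)
pipe.fit(Xt, y)
print(pipe.score(Xt, y))  # training-set accuracy on the synthetic data
```

A pipeline of this shape suits the paper's constraints: `SelectKBest` keeps only the most informative features so the SVM sees a small input, and an SVM generalizes reasonably well from the limited data the harsh environment allows.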
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Let's Roll: Synthetic Dataset Analysis for Pedestrian Detection Across Different Shutter Types [7.0441427250832644]
This paper studies the impact of different shutter mechanisms on machine learning (ML) object detection models on a synthetic dataset.
In particular, we train and evaluate mainstream detection models with our synthetically-generated paired GS and RS datasets.
arXiv Detail & Related papers (2023-09-15T04:07:42Z)
- Can We Transfer Noise Patterns? A Multi-environment Spectrum Analysis Model Using Generated Cases [10.876490928902838]
Spectral data-based testing devices suffer from complex noise patterns when deployed in non-laboratory environments.
We propose a noise patterns transferring model, which takes the spectrum of standard water samples in different environments as cases and learns the differences in their noise patterns.
We generate a sample-to-sample case-base to exclude the interference of sample-level noise on dataset-level noise learning.
arXiv Detail & Related papers (2023-08-02T13:29:31Z)
- Multisample Flow Matching: Straightening Flows with Minibatch Couplings [38.82598694134521]
Simulation-free methods for training continuous-time generative models construct probability paths that go between noise distributions and individual data samples.
We propose Multisample Flow Matching, a more general framework that uses non-trivial couplings between data and noise samples.
We show that our proposed methods improve sample consistency on downsampled ImageNet data sets, and lead to better low-cost sample generation.
arXiv Detail & Related papers (2023-04-28T11:33:08Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience.
We appeal to more elementary methods, such as the use of random bounds on a signal, but aim to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z)
- Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile [78.1212767880785]
The meta-learner is prone to overfitting since there are only a few available samples.
When handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise.
We present Eigen-Reptile (ER), which updates the meta-parameters with the main direction of historical task-specific parameters.
arXiv Detail & Related papers (2022-06-04T08:48:02Z)
- Noise-Aware Statistical Inference with Differentially Private Synthetic Data [0.0]
We show that simply analysing DP synthetic data as if it were real does not produce valid inferences of population-level quantities.
We tackle this problem by combining synthetic data analysis techniques from the field of multiple imputation, and synthetic data generation.
We develop a novel noise-aware synthetic data generation algorithm NAPSU-MQ using the principle of maximum entropy.
arXiv Detail & Related papers (2022-05-28T16:59:46Z)
- Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering [59.286567680389766]
Noisy labels are commonly found in real-world data, which cause performance degradation of deep neural networks.
We propose Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for DML.
PRISM calculates the probability of a label being clean, and filters out potentially noisy samples.
arXiv Detail & Related papers (2021-08-03T12:15:25Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, that is, it produces high-quality results in a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.