Related papers: Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark

URL: http://arxiv.org/abs/2103.10895v1
Date: Fri, 19 Mar 2021 16:32:37 GMT
Title: Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
Authors: Joakim Bruslund Haurum and Thomas B. Moeslund
Abstract summary: We present a novel and publicly available multi-label classification dataset for image-based sewer defect classification called Sewer-ML. The dataset consists of 1.3 million images annotated by professional sewer inspectors from three different utility companies over nine years. We also present a benchmark algorithm and a novel metric for assessing performance.
Score: 29.728476976320913
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Perhaps surprisingly, sewerage infrastructure is one of the most costly infrastructures in modern society. Sewer pipes are manually inspected to determine whether the pipes are defective. However, this process is limited by the number of qualified inspectors and the time it takes to inspect a pipe. Automatization of this process is therefore of high interest. So far, the success of computer vision approaches for sewer defect classification has been limited when compared to the success in other fields, mainly due to the lack of public datasets. To this end, in this work we present a large, novel, and publicly available multi-label classification dataset for image-based sewer defect classification called Sewer-ML. The Sewer-ML dataset consists of 1.3 million images annotated by professional sewer inspectors from three different utility companies across nine years. Together with the dataset, we also present a benchmark algorithm and a novel metric for assessing performance. The benchmark algorithm is a result of evaluating 12 state-of-the-art algorithms, six from the sewer defect classification domain and six from the multi-label classification domain, and combining the best performing algorithms. The novel metric is a class-importance weighted F2 score, $\text{F}2_{\text{CIW}}$, reflecting the economic impact of each class, used together with the normal pipe F1 score, $\text{F}1_{\text{Normal}}$. The benchmark algorithm achieves an $\text{F}2_{\text{CIW}}$ score of 55.11% and $\text{F}1_{\text{Normal}}$ score of 90.94%, leaving ample room for improvement on the Sewer-ML dataset. The code, models, and dataset are available at the project page https://vap.aau.dk/sewer-ml/
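
The abstract names two metrics without spelling them out. The sketch below shows one plausible reading: a per-class F2 score averaged with class-importance weights, plus an F1 score for the "normal" case, taken here to mean images with no defect label. The function names, the weighted-average form, and the example weights are assumptions made for illustration; they are not taken from the Sewer-ML code or its published weight table.

```python
from typing import Sequence, Tuple

Labels = Sequence[Sequence[int]]  # rows = images, columns = defect classes (0/1)


def fbeta(tp: int, fp: int, fn: int, beta: float) -> float:
    """F-beta from raw counts; returns 0.0 when the denominator is zero."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def class_counts(y_true: Labels, y_pred: Labels, c: int) -> Tuple[int, int, int]:
    """True positives, false positives, and false negatives for class c."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t[c] and p[c]:
            tp += 1
        elif p[c]:
            fp += 1
        elif t[c]:
            fn += 1
    return tp, fp, fn


def f2_ciw(y_true: Labels, y_pred: Labels, weights: Sequence[float]) -> float:
    """Weighted mean of per-class F2 scores (assumed form of the metric)."""
    scores = [
        fbeta(*class_counts(y_true, y_pred, c), beta=2.0)
        for c in range(len(weights))
    ]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def f1_normal(y_true: Labels, y_pred: Labels) -> float:
    """F1 for the implicit 'normal' class: an image with no defect label."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        true_normal, pred_normal = not any(t), not any(p)
        if true_normal and pred_normal:
            tp += 1
        elif pred_normal:
            fp += 1
        elif true_normal:
            fn += 1
    return fbeta(tp, fp, fn, beta=1.0)


# Toy data: four images, two defect classes, hypothetical importance weights.
y_true = [[1, 0], [0, 1], [0, 0], [1, 1]]
y_pred = [[1, 0], [0, 0], [0, 0], [1, 1]]
weights = [1.0, 0.5]

print(round(f2_ciw(y_true, y_pred, weights), 4))
print(round(f1_normal(y_true, y_pred), 4))
```

Using beta = 2 weights recall more heavily than precision, so a missed defect costs more than a false alarm. The class weights then let economically important defect classes dominate the average.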

Related papers

A Pipeline of Augmentation and Sequence Embedding for Classification of Imbalanced Network Traffic [0.0]
We propose a pipeline to balance the dataset and classify it using a robust and accurate embedding technique. We demonstrate that the proposed augmentation pipeline, combined with FS-Embedding, increases convergence speed and leads to a significant reduction in the number of model parameters.
arXiv Detail & Related papers (2025-02-26T07:55:24Z)
RecFlow: An Industrial Full Flow Recommendation Dataset [66.06445386541122]
Industrial recommendation systems rely on a multi-stage pipeline to balance effectiveness and efficiency when delivering items to users. We introduce RecFlow, an industrial full flow recommendation dataset designed to bridge the gap between offline RS benchmarks and the real online environment. Our dataset comprises 38M interactions from 42K users across nearly 9M items, with an additional 1.9B stage samples collected from 9.3M online requests over 37 days and spanning 6 stages.
arXiv Detail & Related papers (2024-10-28T09:36:03Z)
Multi-label Sewer Pipe Defect Recognition with Mask Attention Feature Enhancement and Label Correlation Learning [5.9184143707401775]
A multi-label pipe defect recognition method is proposed, based on mask attention-guided feature enhancement and label correlation learning. The proposed method achieves classification performance close to the current state of the art using just 1/16 of the Sewer-ML training dataset.
arXiv Detail & Related papers (2024-08-01T11:51:50Z)
Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic [99.3682210827572]
Vision-language models (VLMs) are trained for thousands of GPU hours on carefully curated web datasets. Data curation strategies are typically developed agnostic of the available compute for training. We introduce neural scaling laws that account for the non-homogeneous nature of web data.
arXiv Detail & Related papers (2024-04-10T17:27:54Z)
Automatic Defect Detection in Sewer Network Using Deep Learning Based Object Detector [0.0]
A dataset with 14.7 km of various sewer pipes was annotated. An object detector (EfficientDet-D0) was trained for automatic defect detection. It was able to detect 83% of defects in the test set; out of the missing 17%, only 0.77% are very severe defects.
arXiv Detail & Related papers (2024-04-09T11:13:36Z)
Extending One-Stage Detection with Open-World Proposals [8.492340530784697]
We show that fully convolutional one-stage detection network FCOS can increase OWP performance by as much as 6% in recall on novel classes. While two-stage methods worsen by 6% in recall on novel classes, we show that FCOS only drops 2% when jointly optimizing for OWP and classification.
arXiv Detail & Related papers (2022-01-07T02:29:09Z)
Multi-Task Classification of Sewer Pipe Defects and Properties using a Cross-Task Graph Neural Network Decoder [56.673599764041384]
We present a novel decoder-focused multi-task classification architecture, the Cross-Task Graph Neural Network (CT-GNN). CT-GNN refines the disjointed per-task predictions using cross-task information. We achieve state-of-the-art performance on all four classification tasks in the Sewer-ML dataset.
arXiv Detail & Related papers (2021-11-15T15:36:50Z)
Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering [59.286567680389766]
Noisy labels are commonly found in real-world data and degrade the performance of deep neural networks. We propose the Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for deep metric learning (DML). PRISM calculates the probability of a label being clean, and filters out potentially noisy samples.
arXiv Detail & Related papers (2021-08-03T12:15:25Z)
Fewer is More: A Deep Graph Metric Learning Perspective Using Fewer Proxies [65.92826041406802]
We propose a Proxy-based deep Graph Metric Learning approach from the perspective of graph classification. Multiple global proxies are leveraged to collectively approximate the original data points for each class. We design a novel reverse label propagation algorithm, by which the neighbor relationships are adjusted according to ground-truth labels.
arXiv Detail & Related papers (2020-10-26T14:52:42Z)
Online Metric Learning for Multi-Label Classification [22.484707213499714]
We propose a novel online metric learning paradigm for multi-label classification. We first propose a new metric for multi-label classification based on $k$-Nearest Neighbour ($k$NN).
arXiv Detail & Related papers (2020-06-12T11:33:04Z)
Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection [85.53263670166304]
One-stage detectors basically formulate object detection as dense classification and localization. A recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization. This paper delves into the representations of three fundamental elements: quality estimation, classification and localization.
arXiv Detail & Related papers (2020-06-08T07:24:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
