Comparing Shape-Constrained Regression Algorithms for Data Validation
- URL: http://arxiv.org/abs/2209.09602v1
- Date: Tue, 20 Sep 2022 10:31:20 GMT
- Title: Comparing Shape-Constrained Regression Algorithms for Data Validation
- Authors: Florian Bachinger, Gabriel Kronberger
- Abstract summary: Industrial and scientific applications handle large volumes of data that render manual validation by humans infeasible.
In this work, we compare different shape-constrained regression algorithms for the purpose of data validation based on their classification accuracy and runtime performance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Industrial and scientific applications handle large volumes of data that
render manual validation by humans infeasible. Therefore, we require automated
data validation approaches that are able to consider the prior knowledge of
domain experts to produce dependable, trustworthy assessments of data quality.
Prior knowledge is often available as rules that describe interactions of
inputs with regard to the target, e.g., the target must be monotonically
decreasing and convex over increasing input values. Domain experts are able to
validate multiple such interactions at a glance. However, existing rule-based
data validation approaches are unable to consider these constraints. In this
work, we compare different shape-constrained regression algorithms for the
purpose of data validation based on their classification accuracy and runtime
performance.
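To make the validation idea concrete, the following minimal sketch fits a batch twice, once freely and once under the example constraint from the abstract (monotonically decreasing and convex over increasing inputs), and flags the batch when enforcing the shape noticeably degrades the fit. This is an illustrative, assumption-laden sketch, not one of the algorithms benchmarked in the paper; the polynomial model, the check grid, and the `tol` decision rule are all hypothetical choices.

```python
# Minimal sketch: validate a data batch against the prior-knowledge rule
# "target is monotonically decreasing and convex in the input".
# Illustrative only; model class, grid, and threshold are assumptions.
import numpy as np
from scipy.optimize import minimize

def deriv_design(x, degree, order):
    """Design matrix of the order-th derivative of the monomial basis."""
    M = np.zeros((len(x), degree + 1))
    for j in range(order, degree + 1):
        coef = np.prod(np.arange(j - order + 1, j + 1))  # j*(j-1)*...
        M[:, j] = coef * x ** (j - order)
    return M

def fit_sse(x, y, degree=4, constrained=False):
    """Sum of squared errors of a polynomial fit; optionally enforce
    f' <= 0 (decreasing) and f'' >= 0 (convex) on a check grid."""
    X = deriv_design(x, degree, 0)
    c0 = np.linalg.lstsq(X, y, rcond=None)[0]       # unconstrained start
    if not constrained:
        return float(np.sum((X @ c0 - y) ** 2))
    g = np.linspace(x.min(), x.max(), 50)           # constraint check grid
    D1, D2 = deriv_design(g, degree, 1), deriv_design(g, degree, 2)
    res = minimize(lambda c: np.sum((X @ c - y) ** 2), c0, method="SLSQP",
                   constraints=[{"type": "ineq", "fun": lambda c: -(D1 @ c)},
                                {"type": "ineq", "fun": lambda c: D2 @ c}])
    return float(res.fun)

def batch_is_valid(x, y, tol=1.5):
    """Hypothetical rule: the batch conforms to the expected shape if the
    constrained fit is at most `tol` times worse than the free fit."""
    return fit_sse(x, y, constrained=True) <= tol * max(fit_sse(x, y), 1e-12)
```

For example, a batch whose target drifts upward over increasing inputs fails `batch_is_valid`, since no decreasing convex polynomial fits it well, while a clean decreasing, convex batch passes.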
Related papers
- Better Practices for Domain Adaptation [62.70267990659201]
Domain adaptation (DA) aims to provide frameworks for adapting models to deployment data without using labels.
An unclear validation protocol for DA has led to bad practices in the literature.
We show challenges across all three branches of domain adaptation methodology.
arXiv Detail & Related papers (2023-09-07T17:44:18Z)
- LAVA: Data Valuation without Pre-Specified Learning Algorithms [20.578106028270607]
We introduce a new framework that can value training data in a way that is oblivious to the downstream learning algorithm.
We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets.
We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions.
arXiv Detail & Related papers (2023-04-28T19:05:16Z)
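As a rough, assumption-heavy illustration of the class-wise distance above: the sketch below computes per-class one-dimensional Wasserstein distances for a single feature and averages them. LAVA's actual formulation is a label-aware optimal-transport distance over full feature vectors; the single-feature restriction and the plain averaging are simplifications for brevity.

```python
# Greatly simplified 1-D sketch of a class-wise Wasserstein proxy between
# a training set and a validation set; not LAVA's actual OT formulation.
import numpy as np
from scipy.stats import wasserstein_distance

def classwise_w1(x_tr, y_tr, x_va, y_va):
    """Average per-class 1-D Wasserstein distance for one feature."""
    classes = np.intersect1d(np.unique(y_tr), np.unique(y_va))
    return float(np.mean([wasserstein_distance(x_tr[y_tr == c],
                                               x_va[y_va == c])
                          for c in classes]))
```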
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- Certifying Data-Bias Robustness in Linear Regression [12.00314910031517]
We present a technique for certifying whether linear regression models are pointwise-robust to label bias in a training dataset.
We show how to solve this problem exactly for individual test points, and provide an approximate but more scalable method.
We also unearth gaps in bias-robustness, such as high levels of non-robustness for certain bias assumptions on some datasets.
arXiv Detail & Related papers (2022-06-07T20:47:07Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
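A minimal sketch of the ATC rule, assuming the model's per-example confidences and correctness indicators are already computed on labeled source data; the helper names are illustrative:

```python
# Sketch of Average Thresholded Confidence (ATC). Calibrate a threshold t
# on labeled source data so that the fraction of confidences above t
# matches source accuracy, then estimate target accuracy as the fraction
# of unlabeled target confidences above t.
import numpy as np

def atc_threshold(conf_src, correct_src):
    """Pick t such that mean(conf_src > t) ~= mean(correct_src)."""
    return np.quantile(conf_src, 1.0 - correct_src.mean())

def atc_predict_accuracy(conf_tgt, t):
    """Estimated target accuracy: fraction of confidences above t."""
    return float((conf_tgt > t).mean())
```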
- Managing dataset shift by adversarial validation for credit scoring [5.560471251954645]
The inconsistency between the distribution of training data and the data that actually needs to be predicted is likely to cause poor model performance.
We propose a method based on adversarial validation to alleviate the dataset shift problem in credit scoring scenarios.
arXiv Detail & Related papers (2021-12-19T07:07:15Z)
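The basic adversarial-validation check behind this idea can be sketched as follows: label training rows 0 and incoming rows 1, train a discriminator, and inspect its cross-validated AUC; values near 0.5 suggest little detectable shift. The classifier choice and the plain AUC readout here are illustrative assumptions, not the paper's full credit-scoring pipeline.

```python
# Sketch of adversarial validation: can a classifier tell the training
# data apart from the data to be scored? AUC ~ 0.5 means it cannot,
# i.e., no strong evidence of dataset shift.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def shift_auc(X_train, X_new):
    X = np.vstack([X_train, X_new])
    y = np.r_[np.zeros(len(X_train)), np.ones(len(X_new))]
    return cross_val_score(GradientBoostingClassifier(), X, y,
                           cv=5, scoring="roc_auc").mean()
```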
- Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC points and 12.5 average-precision points.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
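A simplified sketch of the data-refinement idea on a contaminated, unlabeled training set: repeatedly drop the most anomalous fraction and refit a one-class model. The paper's self-trained framework is more elaborate; `drop_frac`, `iters`, and the use of OneClassSVM are illustrative choices, not the authors' method.

```python
# Sketch of one-class anomaly detection with iterative data refinement:
# remove the likeliest anomalies from the training set, then refit.
import numpy as np
from sklearn.svm import OneClassSVM

def refine_and_fit(X, drop_frac=0.05, iters=3):
    for _ in range(iters):
        model = OneClassSVM(nu=0.1, gamma="scale").fit(X)
        scores = model.decision_function(X)   # lower = more anomalous
        X = X[scores >= np.quantile(scores, drop_frac)]
    return OneClassSVM(nu=0.1, gamma="scale").fit(X)
```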
- Weak Adaptation Learning -- Addressing Cross-domain Data Insufficiency with Weak Annotator [2.8672054847109134]
In some target problem domains, there are not many data samples available, which could hinder the learning process.
We propose a weak adaptation learning (WAL) approach that leverages unlabeled data from a similar source domain.
Our experiments demonstrate the effectiveness of our approach in learning an accurate classifier with limited labeled data in the target domain.
arXiv Detail & Related papers (2021-02-15T06:19:25Z)
- Open-Set Hypothesis Transfer with Semantic Consistency [99.83813484934177]
We introduce a method that focuses on the semantic consistency under transformation of target data.
Our model first discovers confident predictions and performs classification with pseudo-labels.
As a result, unlabeled data can be classified into discriminative classes that coincide with either source classes or unknown classes.
arXiv Detail & Related papers (2020-10-01T10:44:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.