Data integrity vs. inference accuracy in large AIS datasets
- URL: http://arxiv.org/abs/2501.03358v1
- Date: Mon, 06 Jan 2025 19:58:00 GMT
- Title: Data integrity vs. inference accuracy in large AIS datasets
- Authors: Adam Kiersztyn, Dariusz Czerwiński, Aneta Oniszczuk-Jastrzabek, Ernest Czermański, Agnieszka Rzepka
- Abstract summary: This paper analyzes the impact of data integrity in large AIS datasets on classification accuracy. It also presents error detection and correction methods and data verification techniques that can improve the reliability of AIS systems.
- Score: 1.019446914776079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic Ship Identification Systems (AIS) play a key role in monitoring maritime traffic, providing the data necessary for analysis and decision-making. The integrity of this data is fundamental to the correctness of inference and decision-making in the context of maritime safety, traffic management and environmental protection. This paper analyzes the impact of data integrity in large AIS datasets on classification accuracy. It also presents error detection and correction methods and data verification techniques that can improve the reliability of AIS systems. The results show that improving the integrity of AIS data significantly improves the quality of inference, which has a direct impact on operational efficiency and safety at sea.
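The abstract refers to error detection and data verification for AIS records without giving implementation details. The sketch below is a minimal illustration of the kind of plausibility checks such a pipeline might start with; it is not the authors' method, and the field names (mmsi, lat, lon, sog) and thresholds are assumptions chosen for the example.

```python
# Illustrative sketch only: simple plausibility checks for AIS position reports.
# Field names and thresholds are assumptions, not taken from the paper.

from dataclasses import dataclass
from datetime import datetime


@dataclass
class AisRecord:
    mmsi: int            # Maritime Mobile Service Identity (9-digit vessel identifier)
    lat: float           # latitude in degrees
    lon: float           # longitude in degrees
    sog: float           # speed over ground in knots
    timestamp: datetime  # time of the position report


def basic_integrity_errors(rec: AisRecord) -> list[str]:
    """Return a list of integrity violations detected in a single record."""
    errors = []
    if not (100_000_000 <= rec.mmsi <= 999_999_999):
        errors.append("MMSI is not a 9-digit identifier")
    if not (-90.0 <= rec.lat <= 90.0):
        errors.append("latitude out of range")
    if not (-180.0 <= rec.lon <= 180.0):
        errors.append("longitude out of range")
    if not (0.0 <= rec.sog <= 60.0):  # speeds above ~60 kn are implausible for most vessels
        errors.append("implausible speed over ground")
    return errors


# Example usage: a plausible record yields an empty error list.
record = AisRecord(mmsi=244660000, lat=54.37, lon=18.65, sog=12.3,
                   timestamp=datetime(2024, 7, 1, 12, 0))
print(basic_integrity_errors(record))  # []
```

In practice such record-level checks would only be a first filter; cross-record consistency (e.g. physically possible distances between consecutive positions) and correction steps would follow, as the paper's focus on error detection and correction suggests.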
Related papers
- Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG [51.120170062795566]
We propose Divide-Then-Align (DTA) to endow RAG systems with the ability to respond with "I don't know" when the query is out of the knowledge boundary. DTA balances accuracy with appropriate abstention, enhancing the reliability and trustworthiness of retrieval-augmented systems.
arXiv Detail & Related papers (2025-05-27T08:21:21Z) - Context-Aware Online Conformal Anomaly Detection with Prediction-Powered Data Acquisition [35.59201763567714]
We introduce context-aware prediction-powered conformal online anomaly detection (C-PP-COAD). Our framework strategically leverages synthetic calibration data to mitigate data scarcity, while adaptively integrating real data based on contextual cues. Experiments conducted on both synthetic and real-world datasets demonstrate that C-PP-COAD significantly reduces dependency on real calibration data without compromising the guaranteed false discovery rate (FDR).
arXiv Detail & Related papers (2025-05-03T10:58:05Z) - Noise-Adaptive Conformal Classification with Marginal Coverage [53.74125453366155]
We introduce an adaptive conformal inference method capable of efficiently handling deviations from exchangeability caused by random label noise.
We validate our method through extensive numerical experiments demonstrating its effectiveness on synthetic and real data sets.
arXiv Detail & Related papers (2025-01-29T23:55:23Z) - Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions [8.666879925570331]
Real-world offline datasets are often subject to data corruptions due to sensor failures or malicious attacks.
Existing methods struggle to learn robust agents under high uncertainty caused by corrupted data.
We propose a novel robust variational Bayesian inference method for offline RL (TRACER).
arXiv Detail & Related papers (2024-11-01T09:28:24Z) - Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
The development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information of the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z) - Reliability in Semantic Segmentation: Can We Use Synthetic Data? [69.28268603137546]
We show for the first time how synthetic data can be specifically generated to assess comprehensively the real-world reliability of semantic segmentation models.
This synthetic data is employed to evaluate the robustness of pretrained segmenters.
We demonstrate how our approach can be utilized to enhance the calibration and OOD detection capabilities of segmenters.
arXiv Detail & Related papers (2023-12-14T18:56:07Z) - ECS -- an Interactive Tool for Data Quality Assurance [63.379471124899915]
We present a novel approach for the assurance of data quality.
For this purpose, the mathematical basics are first discussed and the approach is presented using multiple examples.
This results in the detection of data points with potentially harmful properties for use in safety-critical systems.
arXiv Detail & Related papers (2023-07-10T06:49:18Z) - Towards High-Performance Exploratory Data Analysis (EDA) Via Stable Equilibrium Point [5.825190876052149]
We introduce a stable equilibrium point (SEP) - based framework for improving the efficiency and solution quality of EDA.
A unique property of the proposed method is that the SEPs directly encode the clustering properties of data sets.
arXiv Detail & Related papers (2023-06-07T13:31:57Z) - Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness [27.868217989276797]
We study common data modification strategies and evaluate their in-domain performance, out-of-domain (OOD) generalization, and adversarial robustness (AR).
Our findings suggest that more data (either via additional datasets or data augmentation) benefits both OOD accuracy and AR.
However, data filtering hurts OOD accuracy on other tasks such as question answering and image classification.
arXiv Detail & Related papers (2022-03-15T05:32:44Z) - Vessel and Port Efficiency Metrics through Validated AIS data [0.0]
We propose a machine learning-based data-driven methodology to detect and correct erroneous AIS data.
We also propose a metric that can be used by vessel operators and ports to express numerically their business and environmental efficiency.
arXiv Detail & Related papers (2021-04-30T19:51:51Z) - Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as a source domain and TED-LIUM 3 as well as SWITCHBOARD show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
arXiv Detail & Related papers (2020-11-26T18:51:26Z) - The Impact of Data on the Stability of Learning-Based Control - Extended Version [63.97366815968177]
We propose a Lyapunov-based measure for quantifying the impact of data on the certifiable control performance.
By modeling unknown system dynamics through Gaussian processes, we can determine the interrelation between model uncertainty and satisfaction of stability conditions.
arXiv Detail & Related papers (2020-11-20T19:10:01Z) - Provably Efficient Causal Reinforcement Learning with Confounded
Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z) - On the Role of Dataset Quality and Heterogeneity in Model Confidence [27.657631193015252]
Safety-critical applications require machine learning models that output accurate and calibrated probabilities.
Uncalibrated deep networks are known to make over-confident predictions.
We study the impact of dataset quality by examining the effects of dataset size and label noise on model confidence.
arXiv Detail & Related papers (2020-02-23T05:13:12Z)