Removing the need for ground truth UWB data collection: self-supervised ranging error correction using deep reinforcement learning
- URL: http://arxiv.org/abs/2403.19262v2
- Date: Tue, 01 Oct 2024 08:05:23 GMT
- Authors: Dieter Coppens, Ben Van Herbruggen, Adnan Shahid, Eli De Poorter
- Abstract summary: Multipath effects and non-line-of-sight conditions cause ranging errors between anchors and tags.
Existing approaches for mitigating these ranging errors rely on collecting large labeled datasets.
This paper proposes a novel self-supervised deep reinforcement learning approach that does not require labeled ground truth data.
- Score: 1.4061979259370274
- Abstract: Indoor positioning using UWB technology has gained interest due to its centimeter-level accuracy potential. However, multipath effects and non-line-of-sight conditions cause ranging errors between anchors and tags. Existing approaches for mitigating these ranging errors rely on collecting large labeled datasets, making them impractical for real-world deployments. This paper proposes a novel self-supervised deep reinforcement learning approach that does not require labeled ground truth data. A reinforcement learning agent uses the channel impulse response as a state and predicts corrections to minimize the error between corrected and estimated ranges. The agent learns, self-supervised, by iteratively improving corrections that are generated by combining the predictability of trajectories with filtering and smoothing. Experiments on real-world UWB measurements demonstrate performance comparable to state-of-the-art supervised methods while overcoming their data-dependency and generalizability limitations. This makes self-supervised deep reinforcement learning a promising solution for practical and scalable UWB ranging error correction.
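To make the correction loop above concrete, here is a minimal, hypothetical sketch of one self-supervised training step, recast as an iterative regression loop rather than a full RL agent. CorrectionNet, smooth_ranges, the 152-tap CIR window, and the moving-average smoother are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch only: recasts the paper's idea as a simple iterative
# regression step rather than a full RL agent. All names and sizes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrectionNet(nn.Module):
    """Maps a channel impulse response (CIR) window to a range correction."""
    def __init__(self, cir_len: int = 152):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cir_len, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),  # predicted correction in meters
        )

    def forward(self, cir: torch.Tensor) -> torch.Tensor:
        return self.net(cir).squeeze(-1)

def smooth_ranges(ranges: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Stand-in for the paper's filtering/smoothing step: a moving average
    that exploits the predictability of the tag's trajectory."""
    kernel = torch.ones(1, 1, k) / k
    padded = F.pad(ranges.view(1, 1, -1), (k // 2, k - 1 - k // 2), mode="replicate")
    return F.conv1d(padded, kernel).view(-1)

def train_step(model, optimizer, cirs, est_ranges):
    """cirs: (T, cir_len), one CIR per measurement; est_ranges: (T,) raw ranges."""
    corrected = est_ranges - model(cirs)         # apply predicted corrections
    targets = smooth_ranges(corrected.detach())  # self-supervised pseudo-labels
    loss = F.mse_loss(corrected, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The essential point is that the regression targets are not ground truth: they are the smoothed, trajectory-consistent version of the model's own corrected ranges, so successive iterations can keep refining the corrections without any labeled data.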
Related papers
- Training Language Models to Self-Correct via Reinforcement Learning [98.35197671595343]
Self-correction has been found to be largely ineffective in modern large language models (LLMs).
We develop a multi-turn online reinforcement learning approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data.
We find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% on MATH and 9.1% on HumanEval.
arXiv Detail & Related papers (2024-09-19T17:16:21Z)
- Towards Robust and Interpretable EMG-based Hand Gesture Recognition using Deep Metric Meta Learning [37.21211404608413]
We propose a shift to deep metric-based meta-learning in EMG pattern recognition to supervise the creation of meaningful and interpretable representations.
We derive a robust class proximity-based confidence estimator that leads to a better rejection of incorrect decisions.
arXiv Detail & Related papers (2024-04-17T23:37:50Z)
- Deep GEM-Based Network for Weakly Supervised UWB Ranging Error Mitigation [29.827191184889898]
We present a learning framework based on weak supervision for UWB ranging error mitigation.
Specifically, we propose a deep learning method based on the generalized expectation-maximization (GEM) algorithm for robust mitigation.
arXiv Detail & Related papers (2023-05-23T10:26:50Z)
- A Semi-Supervised Learning Approach for Ranging Error Mitigation Based on UWB Waveform [29.827191184889898]
We propose a semi-supervised learning method based on variational Bayes for UWB ranging error mitigation.
Our method can efficiently accumulate knowledge from both labeled and unlabeled data samples.
arXiv Detail & Related papers (2023-05-23T10:08:42Z)
- Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on a validation set of transitions is the main culprit that severely degrades the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks; a minimal sketch of the idea follows below.
arXiv Detail & Related papers (2023-04-20T17:11:05Z)
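The following is a minimal sketch of the validation-TD-error idea from the entry above, assuming a standard DQN-style setup; the function names and the greedy bootstrap target are illustrative, not the paper's exact procedure.

```python
# Hedged sketch: rank candidate agents by TD error on held-out transitions
# and keep the best one. A DQN-style greedy target is assumed for illustration.
import torch

@torch.no_grad()
def validation_td_error(q_net, target_net, batch, gamma: float = 0.99) -> float:
    """Mean squared TD error over held-out transitions (s, a, r, s', done)."""
    s, a, r, s_next, done = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a)
    q_next = target_net(s_next).max(dim=1).values      # max_a' Q_target(s', a')
    target = r + gamma * (1.0 - done) * q_next
    return torch.mean((q - target) ** 2).item()

def select_model(q_nets, target_nets, val_batch):
    """Online model selection: pick the candidate with the lowest validation TD error."""
    errors = [validation_td_error(q, t, val_batch) for q, t in zip(q_nets, target_nets)]
    return min(range(len(errors)), key=errors.__getitem__)
```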
- OPAL: Occlusion Pattern Aware Loss for Unsupervised Light Field Disparity Estimation [22.389903710616508]
Unsupervised methods can achieve accuracy comparable to supervised methods, with much higher generalization capacity and efficiency.
We present OPAL, which successfully extracts and encodes the general occlusion patterns inherent in the light field for loss calculation.
arXiv Detail & Related papers (2022-03-04T10:32:18Z)
- Automating Control of Overestimation Bias for Continuous Reinforcement Learning [65.63607016094305]
We present a data-driven approach for guiding bias correction.
We demonstrate its effectiveness on Truncated Quantile Critics, a state-of-the-art continuous control algorithm.
arXiv Detail & Related papers (2021-10-26T09:27:12Z)
- Robust Ultra-wideband Range Error Mitigation with Deep Learning at the Edge [0.0]
Multipath effects, reflections, refractions, and the complexity of the indoor radio environment can introduce a positive bias in the ranging measurement.
This article proposes an efficient representation learning methodology that exploits the latest advancements in deep learning and graph optimization techniques.
Channel Impulse Response (CIR) signals are directly exploited to extract high-level semantic features that estimate corrections in both NLoS and LoS conditions; a compact network of this kind is sketched below.
arXiv Detail & Related papers (2020-11-30T10:52:21Z)
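As a rough illustration of the CIR-to-correction mapping described in the entry above, here is a compact 1-D CNN sized with edge deployment in mind; the architecture, layer sizes, and 157-tap input are assumptions, not the published network.

```python
# Illustrative sketch only: a small 1-D CNN that regresses a ranging-bias
# correction directly from the raw CIR. All sizes are assumed, not published.
import torch
import torch.nn as nn

class EdgeCIRNet(nn.Module):
    def __init__(self, cir_len: int = 157):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # keeps the parameter count edge-friendly
        )
        self.head = nn.Linear(32, 1)  # predicted positive ranging bias (meters)

    def forward(self, cir: torch.Tensor) -> torch.Tensor:
        # cir: (batch, cir_len) -> add a channel dimension for Conv1d
        z = self.features(cir.unsqueeze(1)).squeeze(-1)
        return self.head(z).squeeze(-1)
```

A corrected range is then simply the measured range minus the predicted bias, applied identically in LoS and NLoS conditions.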
- Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as the source domain and TED-LIUM 3 as well as SWITCHBOARD as target domains show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
arXiv Detail & Related papers (2020-11-26T18:51:26Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to this optimal distribution and uses it to re-weight the transitions used for training; a hedged sketch of the re-weighting follows below.
arXiv Detail & Related papers (2020-03-16T16:18:52Z)
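To ground the re-weighting idea referenced above, here is a hedged sketch of a DisCor-style weighted TD loss; the error estimator delta_net, the exponential weighting form, and the temperature are simplified assumptions based on the summary, not the exact published algorithm.

```python
# Hedged sketch of DisCor-style re-weighting: transitions whose bootstrapped
# targets are estimated to carry large accumulated error get down-weighted.
import torch

def discor_weights(delta_next: torch.Tensor, gamma: float = 0.99,
                   temperature: float = 10.0) -> torch.Tensor:
    """Weights ~ exp(-gamma * estimated target error at s' / temperature)."""
    w = torch.exp(-gamma * delta_next / temperature)
    return w / w.mean()  # normalize so the effective batch size is preserved

def weighted_td_loss(q_net, target_net, delta_net, batch, gamma: float = 0.99):
    s, a, r, s_next, done = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * q_next
        # delta_net approximates the accumulated target error at the next state
        delta_next = delta_net(s_next).max(dim=1).values
        w = discor_weights(delta_next, gamma)
    return torch.mean(w * (q - target) ** 2)
```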
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.