Towards Ubiquitous Indoor Positioning: Comparing Systems across
Heterogeneous Datasets
- URL: http://arxiv.org/abs/2109.09436v1
- Date: Mon, 20 Sep 2021 11:37:36 GMT
- Title: Towards Ubiquitous Indoor Positioning: Comparing Systems across
Heterogeneous Datasets
- Authors: Joaquín Torres-Sospedra, Ivo Silva, Lucie Klus, Darwin
Quezada-Gaibor, Antonino Crivello, Paolo Barsocchi, Cristiano Pendão, Elena
Simona Lohan, Jari Nurmi and Adriano Moreira
- Abstract summary: The evaluation of Indoor Positioning Systems (IPS) mostly relies on local deployments in the researchers' or partners' facilities.
The growing availability of public datasets is pushing IPS evaluation to a similar level as machine-learning models.
This paper proposes a way to evaluate IPSs in multiple scenarios, which is validated with three use cases.
- Score: 1.3814679165245243
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The evaluation of Indoor Positioning Systems (IPS) mostly relies on local
deployments in the researchers' or partners' facilities. The complexity of
preparing comprehensive experiments, collecting data, and considering multiple
scenarios usually limits the evaluation area and, therefore, the assessment of
the proposed systems. The requirements and features of controlled experiments
cannot be generalized, since the use of the same sensors or the same anchor
density cannot be guaranteed. The growing availability of public datasets is
pushing IPS evaluation to a similar level as machine-learning models, where new
proposals are evaluated over many heterogeneous datasets. This paper proposes a
way to evaluate IPSs in multiple scenarios, which is validated with three use
cases. The results prove
that the proposed aggregation of the evaluation metric values is a useful tool
for high-level comparison of IPSs.
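The abstract does not detail the aggregation procedure itself, so the following is only a minimal sketch, assuming mean positioning error as the per-dataset metric and min-max normalization before averaging (both hypothetical choices, with made-up system and dataset names), of how metric values from heterogeneous datasets could be combined into a single high-level comparison of IPSs:

```python
from statistics import mean

# Hypothetical per-dataset results: mean positioning error in metres for each
# IPS on each dataset. System names, dataset names, and numbers are made up.
results = {
    "ips_knn_fingerprint": {"dataset_A": 2.1, "dataset_B": 5.4, "dataset_C": 1.8},
    "ips_weighted_knn":    {"dataset_A": 1.9, "dataset_B": 6.0, "dataset_C": 2.3},
    "ips_centroid":        {"dataset_A": 3.5, "dataset_B": 4.8, "dataset_C": 2.0},
}

datasets = sorted({d for scores in results.values() for d in scores})

def normalised_errors(dataset):
    """Min-max normalise the error on one dataset, so 0.0 marks the best
    system and 1.0 the worst; this makes values comparable across datasets
    with very different difficulty and error scales."""
    errors = {ips: scores[dataset] for ips, scores in results.items()}
    lo, hi = min(errors.values()), max(errors.values())
    span = (hi - lo) or 1.0  # guard against all systems tying on a dataset
    return {ips: (err - lo) / span for ips, err in errors.items()}

# Aggregate: average each system's normalised error over all datasets.
aggregated = {
    ips: mean(normalised_errors(d)[ips] for d in datasets)
    for ips in results
}

for ips, score in sorted(aggregated.items(), key=lambda kv: kv[1]):
    print(f"{ips}: aggregated normalised error = {score:.3f}")
```

A rank-based aggregation over datasets (see the Borda-count sketch after the related-papers list) is a common alternative that is less sensitive to metric scale differences between scenarios; the procedure actually validated in the paper's three use cases may differ from either.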
Related papers
- RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [69.4501863547618]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.
With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance.
Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z)
- PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines [86.36060279469304]
We introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks.
This benchmark integrates 12 widely adopted methods with diverse datasets across multiple application domains.
Its multi-dimensional evaluation framework broadens the analysis with a comprehensive set of metrics.
arXiv Detail & Related papers (2024-07-11T11:51:36Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy [9.755020926517291]
We propose an evaluation methodology for synthetic data that avoids assumptions about the representativeness of proxy tasks.
We measure the likelihood that published conclusions would change had the authors used synthetic data.
We advocate for a new class of mechanisms that favor stronger utility guarantees and offer privacy protection.
arXiv Detail & Related papers (2022-08-26T14:57:21Z)
- What are the best systems? New perspectives on NLP Benchmarking [10.27421161397197]
We propose a new procedure to rank systems based on their performance across different tasks.
Motivated by social choice theory, the final system ordering is obtained by aggregating the rankings induced by each task (see the Borda-count sketch after this list).
We show that our method yields different conclusions on state-of-the-art systems than the mean-aggregation procedure.
arXiv Detail & Related papers (2022-02-08T11:44:20Z)
- Better than Average: Paired Evaluation of NLP Systems [31.311553903738798]
We show the importance of taking the instance-level pairing of evaluation scores into account.
We release a practical tool for performing the full analysis of evaluation scores with the mean, median, Bradley-Terry (BT), and two variants of BT (Elo and TrueSkill).
arXiv Detail & Related papers (2021-10-20T19:40:31Z)
- Fairness and underspecification in acoustic scene classification: The case for disaggregated evaluations [6.186191586944725]
Underspecification and fairness in machine learning (ML) applications have recently become two prominent issues in the ML community.
We argue for the need for a more holistic evaluation process for acoustic scene classification (ASC) models through disaggregated evaluations.
We demonstrate the effectiveness of the proposed evaluation process in uncovering underspecification and fairness problems when models are trained on two widely-used ASC datasets.
arXiv Detail & Related papers (2021-10-04T15:23:01Z)
- Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We achieve new state-of-the-art results in both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems [121.78477833009671]
We investigate the performance of different summarization models under a cross-dataset setting.
A comprehensive study of 11 representative summarization systems on 5 datasets from different domains reveals the effect of model architectures and generation strategies.
arXiv Detail & Related papers (2020-10-11T02:19:15Z)
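The social-choice-inspired NLP benchmarking entry above describes aggregating the rankings induced by each task into a final system ordering. As an illustration only, here is a minimal Borda-count sketch (one standard social-choice rule; the cited paper may use a different rule, and all task names, system names, and scores below are made up):

```python
# Borda-count rank aggregation: each task (or dataset) ranks the systems by
# its own score; a system earns (n_systems - 1 - position) points per task
# and the points are summed. All names and scores here are illustrative.
per_task_scores = {
    "task_1": {"sys_a": 0.81, "sys_b": 0.77, "sys_c": 0.90},
    "task_2": {"sys_a": 0.52, "sys_b": 0.61, "sys_c": 0.49},
    "task_3": {"sys_a": 0.70, "sys_b": 0.68, "sys_c": 0.66},
}

systems = sorted(next(iter(per_task_scores.values())))
borda_points = {name: 0 for name in systems}

for scores in per_task_scores.values():
    # Rank the systems on this task (higher score = better = more points).
    ranked = sorted(systems, key=lambda name: scores[name], reverse=True)
    for position, name in enumerate(ranked):
        borda_points[name] += len(systems) - 1 - position

final_order = sorted(systems, key=lambda name: borda_points[name], reverse=True)
print("Aggregated ordering:", final_order)
print("Borda points:", borda_points)
```

Because rank aggregation only uses the ordering induced by each task, it is insensitive to the scale of the underlying metrics, which is one reason it is attractive when comparing systems across heterogeneous datasets.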