Related papers: rWISDM: Repaired WISDM, a Public Dataset for Human Activity Recognition

rWISDM: Repaired WISDM, a Public Dataset for Human Activity Recognition

URL: http://arxiv.org/abs/2305.10222v1
Date: Wed, 17 May 2023 13:55:50 GMT
Title: rWISDM: Repaired WISDM, a Public Dataset for Human Activity Recognition
Authors: Mohammadreza Heydarian and Thomas E. Doyle
Abstract summary: Human Activity Recognition (HAR) has become a spotlight in recent scientific research because of its applications in various domains such as healthcare, athletic competitions, smart cities, and smart home. This paper presents the methods by which other researchers may identify and correct similar problems in public datasets.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Human Activity Recognition (HAR) has become a spotlight in recent scientific research because of its applications in various domains such as healthcare, athletic competitions, smart cities, and smart home. While researchers focus on the methodology of processing data, users wonder if the Artificial Intelligence (AI) methods used for HAR can be trusted. Trust depends mainly on the reliability or robustness of the system. To investigate the robustness of HAR systems, we analyzed several suitable current public datasets and selected WISDM for our investigation of Deep Learning approaches. While the published specification of WISDM matched our fundamental requirements (e.g., large, balanced, multi-hardware), several hidden issues were found in the course of our analysis. These issues reduce the performance and the overall trust of the classifier. By identifying the problems and repairing the dataset, the performance of the classifier was increased. This paper presents the methods by which other researchers may identify and correct similar problems in public datasets. By fixing the issues dataset veracity is improved, which increases the overall trust in the trained HAR system.

Related papers

Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs [58.24692529185971]
We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods.<n>We evaluate the effectiveness and robustness of different unlearning strategies.
arXiv Detail & Related papers (2025-05-29T09:19:07Z)
AssistedDS: Benchmarking How External Domain Knowledge Assists LLMs in Automated Data Science [44.18533574465929]
We introduce AssistedDS, a benchmark designed to evaluate how large language models handle domain knowledge.<n>We assess state-of-the-art LLMs on their ability to discern and apply beneficial versus harmful domain knowledge.<n>Our results demonstrate a substantial gap in current models' ability to critically evaluate and leverage expert knowledge.
arXiv Detail & Related papers (2025-05-25T05:50:21Z)
DCA-Bench: A Benchmark for Dataset Curation Agents [9.60250892491588]
Data quality issues, such as incomplete documentation, inaccurate labels, ethical concerns, and outdated information, remain common in widely used datasets.<n>With the surging ability of large language models (LLM), it's promising to streamline the discovery of hidden dataset issues with LLM agents.<n>In this work, we establish a benchmark to measure LLM agent's ability to tackle this challenge.
arXiv Detail & Related papers (2024-06-11T14:02:23Z)
A Comprehensive Survey on Underwater Image Enhancement Based on Deep Learning [51.7818820745221]
Underwater image enhancement (UIE) presents a significant challenge within computer vision research. Despite the development of numerous UIE algorithms, a thorough and systematic review is still absent.
arXiv Detail & Related papers (2024-05-30T04:46:40Z)
Deep Learning-Based Out-of-distribution Source Code Data Identification: How Far Have We Gone? [23.962076093344166]
We propose an innovative deep learning-based approach addressing the OOD source code data identification problem. Our method is derived from an information-theoretic perspective with the use of innovative cluster-contrastive learning. Our method achieves a significantly higher performance from around 15.27%, 7.39%, and 4.93% on the FPR, AUROC, and AUPR measures, respectively.
arXiv Detail & Related papers (2024-04-09T02:52:55Z)
On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies. Machine and deep learning algorithms depend heavily on the data used during their development. We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
Collect, Measure, Repeat: Reliability Factors for Responsible AI Data Collection [8.12993269922936]
We argue that data collection for AI should be performed in a responsible manner. We propose a Responsible AI (RAI) methodology designed to guide the data collection with a set of metrics.
arXiv Detail & Related papers (2023-08-22T18:01:27Z)
Data Quality in Imitation Learning [15.939363481618738]
In offline learning for robotics, we simply lack internet scale data, and so high quality datasets are a necessity. This is especially true in imitation learning (IL), a sample efficient paradigm for robot learning using expert demonstrations. In this work, we take the first step toward formalizing data quality for imitation learning through the lens of distribution shift.
arXiv Detail & Related papers (2023-06-04T18:48:32Z)
Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach. Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes. We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
Dataset Bias in Human Activity Recognition [57.91018542715725]
This contribution statistically curates the training data to assess to what degree the physical characteristics of humans influence HAR performance. We evaluate the performance of a state-of-the-art convolutional neural network on two HAR datasets that vary in the sensors, activities, and recording for time-series HAR.
arXiv Detail & Related papers (2023-01-19T12:33:50Z)
Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being. A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations. Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented. Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity)
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
Deep Learning for Person Re-identification: A Survey and Outlook [233.36948173686602]
Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. By dissecting the involved components in developing a person Re-ID system, we categorize it into the closed-world and open-world settings.
arXiv Detail & Related papers (2020-01-13T12:49:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.