The Full-scale Assembly Simulation Testbed (FAST) Dataset
- URL: http://arxiv.org/abs/2403.08969v1
- Date: Wed, 13 Mar 2024 21:30:01 GMT
- Title: The Full-scale Assembly Simulation Testbed (FAST) Dataset
- Authors: Alec G. Moore, Tiffany D. Do, Nayan N. Chawla, Antonia Jimenez Iriarte, Ryan P. McMahan,
- Abstract summary: We present a new open dataset captured with our VR-based Full-scale Assembly Simulation Testbed (FAST)
This dataset consists of data collected from 108 participants learning how to assemble two distinct full-scale structures in VR.
- Score: 3.483595743063401
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In recent years, numerous researchers have begun investigating how virtual reality (VR) tracking and interaction data can be used for a variety of machine learning purposes, including user identification, predicting cybersickness, and estimating learning gains. One constraint for this research area is the dearth of open datasets. In this paper, we present a new open dataset captured with our VR-based Full-scale Assembly Simulation Testbed (FAST). This dataset consists of data collected from 108 participants (50 females, 56 males, 2 non-binary) learning how to assemble two distinct full-scale structures in VR. In addition to explaining how the dataset was collected and describing the data included, we discuss how the dataset may be used by future researchers.
Related papers
- SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection [79.23689506129733]
We establish a new benchmark dataset and an open-source method for large-scale SAR object detection.
Our dataset, SARDet-100K, is a result of intense surveying, collecting, and standardizing 10 existing SAR detection datasets.
To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created.
arXiv Detail & Related papers (2024-03-11T09:20:40Z) - Imitation Learning Datasets: A Toolkit For Creating Datasets, Training
Agents and Benchmarking [0.9944647907864256]
Imitation learning field requires expert data to train agents in a task.
Most often, this learning approach suffers from the absence of available data.
This work aims to address these issues by creating Imitation Learning datasets.
arXiv Detail & Related papers (2024-03-01T14:18:46Z) - LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset [75.9621305227523]
We introduce LMSYS-Chat-1M, a large-scale dataset containing one million real-world conversations with 25 state-of-the-art large language models (LLMs)
This dataset is collected from 210K IP addresses in the wild on our Vicuna demo and Arena website.
We demonstrate its versatility through four use cases: developing content moderation models that perform similarly to GPT-4, building a safety benchmark, training instruction-following models that perform similarly to Vicuna, and creating challenging benchmark questions.
arXiv Detail & Related papers (2023-09-21T12:13:55Z) - DataFinder: Scientific Dataset Recommendation from Natural Language
Descriptions [100.52917027038369]
We operationalize the task of recommending datasets given a short natural language description.
To facilitate this task, we build the DataFinder dataset which consists of a larger automatically-constructed training set and a smaller expert-annotated evaluation set.
This system, trained on the DataFinder dataset, finds more relevant search results than existing third-party dataset search engines.
arXiv Detail & Related papers (2023-05-26T05:22:36Z) - Argoverse 2: Next Generation Datasets for Self-Driving Perception and
Forecasting [64.7364925689825]
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain.
The Lidar dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose.
The Motion Forecasting dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene.
arXiv Detail & Related papers (2023-01-02T00:36:22Z) - Ensemble Machine Learning Model Trained on a New Synthesized Dataset
Generalizes Well for Stress Prediction Using Wearable Devices [3.006016887654771]
We investigate the generalization ability of models built on datasets containing a small number of subjects, recorded in single study protocols.
We propose and evaluate the use of ensemble techniques by combining gradient boosting with an artificial neural network to measure predictive power on new, unseen data.
arXiv Detail & Related papers (2022-09-30T00:20:57Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z) - Can Population-based Engagement Improve Personalisation? A Novel Dataset
and Experiments [21.12546768556595]
VLE is a novel dataset that consists of content and video based features extracted from publicly available scientific video lectures.
Our experimental results indicate that the newly proposed VLE dataset leads to building context-agnostic engagement prediction models.
Experiments in combining the built model with a personalising algorithm show promising improvements in addressing the cold-start problem encountered in educational recommenders.
arXiv Detail & Related papers (2022-06-22T15:53:24Z) - Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset
Evaluation for Text Classification [39.01740345482624]
In this paper, we ask the research question of whether all the datasets in the benchmark are necessary.
Experiments on 9 datasets and 36 systems show that several existing benchmark datasets contribute little to discriminating top-scoring systems, while those less used datasets exhibit impressive discriminative power.
arXiv Detail & Related papers (2022-05-04T15:33:00Z) - dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z) - ETRI-Activity3D: A Large-Scale RGB-D Dataset for Robots to Recognize
Daily Activities of the Elderly [6.597705088139007]
We introduce a new dataset called ETRI-Activity3D, focusing on the daily activities of the elderly in robot-view.
The proposed dataset contains 112,620 samples including RGB videos, depth maps, and skeleton sequences.
We also propose a novel network called four-stream adaptive CNN (FSA-CNN)
arXiv Detail & Related papers (2020-03-04T07:30:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.