LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting
- URL: http://arxiv.org/abs/2306.08259v2
- Date: Sat, 28 Oct 2023 08:38:00 GMT
- Title: LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting
- Authors: Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai,
Chao Huang, Zhenguang Liu, Bryan Hooi, Roger Zimmermann
- Abstract summary: Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
- Score: 65.71129509623587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Road traffic forecasting plays a critical role in smart city initiatives and
has experienced significant advancements thanks to the power of deep learning
in capturing non-linear patterns of traffic data. However, the promising
results achieved on current public datasets may not be applicable to practical
scenarios due to limitations within these datasets. First, the limited sizes of
them may not reflect the real-world scale of traffic networks. Second, the
temporal coverage of these datasets is typically short, posing hurdles in
studying long-term patterns and acquiring sufficient samples for training deep
models. Third, these datasets often lack adequate metadata for sensors, which
compromises the reliability and interpretability of the data. To mitigate these
limitations, we introduce the LargeST benchmark dataset. It encompasses a total
number of 8,600 sensors in California with a 5-year time coverage and includes
comprehensive metadata. Using LargeST, we perform in-depth data analysis to
extract data insights, benchmark well-known baselines in terms of their
performance and efficiency, and identify challenges as well as opportunities
for future research. We release the datasets and baseline implementations at:
https://github.com/liuxu77/LargeST.
Related papers
- Enabling Advanced Land Cover Analytics: An Integrated Data Extraction Pipeline for Predictive Modeling with the Dynamic World Dataset [1.3757956340051605]
We present a flexible and efficient end to end pipeline for working with the Dynamic World dataset.
This includes a pre-processing and representation framework which tackles noise removal, efficient extraction of large amounts of data, and re-representation of LULC data.
To demonstrate the power of our pipeline, we use it to extract data for an urbanization prediction problem and build a suite of machine learning models with excellent performance.
arXiv Detail & Related papers (2024-10-11T16:13:01Z) - Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning [3.623224034411137]
Offline multi-agent reinforcement learning (MARL) is an exciting direction of research that uses static datasets to find optimal control policies for multi-agent systems.
Though the field is by definition data-driven, efforts have thus far neglected data in their drive to achieve state-of-the-art results.
We show how the majority of works generate their own datasets without consistent methodology and provide sparse information about the characteristics of these datasets.
arXiv Detail & Related papers (2024-09-18T14:13:24Z) - Reconsidering utility: unveiling the limitations of synthetic mobility data generation algorithms in real-life scenarios [49.1574468325115]
We evaluate the utility of five state-of-the-art synthesis approaches in terms of real-world applicability.
We focus on so-called trip data that encode fine granular urban movements such as GPS-tracked taxi rides.
One model fails to produce data within reasonable time and another generates too many jumps to meet the requirements for map matching.
arXiv Detail & Related papers (2024-07-03T16:08:05Z) - XXLTraffic: Expanding and Extremely Long Traffic Dataset for Ultra-Dynamic Forecasting Challenges [3.7509821052818118]
XXLTraffic is the largest available public traffic dataset with the longest timespan and an increasing number of sensor nodes.
Our dataset supplements existing spatio-temporal data resources and leads to new research directions in this domain.
arXiv Detail & Related papers (2024-06-18T15:06:22Z) - Computationally and Memory-Efficient Robust Predictive Analytics Using Big Data [0.0]
This study navigates through the challenges of data uncertainties, storage limitations, and predictive data-driven modeling using big data.
We utilize Robust Principal Component Analysis (RPCA) for effective noise reduction and outlier elimination, and Optimal Sensor Placement (OSP) for efficient data compression and storage.
arXiv Detail & Related papers (2024-03-27T22:39:08Z) - UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction [93.77809355002591]
We introduce UniTraj, a comprehensive framework that unifies various datasets, models, and evaluation criteria.
We conduct extensive experiments and find that model performance significantly drops when transferred to other datasets.
We provide insights into dataset characteristics to explain these findings.
arXiv Detail & Related papers (2024-03-22T10:36:50Z) - Distil the informative essence of loop detector data set: Is network-level traffic forecasting hungry for more data? [0.8002196839441036]
We propose an uncertainty-aware traffic forecasting framework to explore how many samples of loop data are truly effective for training forecasting models.
The proposed methodology proves valuable in evaluating large traffic datasets' true information content.
arXiv Detail & Related papers (2023-10-31T11:23:10Z) - Large Scale Real-World Multi-Person Tracking [68.27438015329807]
This paper presents a new large-scale multi-person tracking dataset, PersonPath22.
It is over an order of magnitude larger than currently available high quality multi-object tracking datasets such as MOT17, HiEve, and MOT20.
arXiv Detail & Related papers (2022-11-03T23:03:13Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z) - Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges [52.624157840253204]
We present an urban-scale photogrammetric point cloud dataset with nearly three billion richly annotated points.
Our dataset consists of large areas from three UK cities, covering about 7.6 km² of the city landscape.
We evaluate the performance of state-of-the-art algorithms on our dataset and provide a comprehensive analysis of the results.
arXiv Detail & Related papers (2020-09-07T14:47:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.