Related papers: LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting

URL: http://arxiv.org/abs/2306.08259v2
Date: Sat, 28 Oct 2023 08:38:00 GMT
Title: LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting
Authors: Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai, Chao Huang, Zhenguang Liu, Bryan Hooi, Roger Zimmermann
Abstract summary: Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning. However, the promising results achieved on current public datasets may not be applicable to practical scenarios. We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
Score: 65.71129509623587
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning in capturing non-linear patterns of traffic data. However, the promising results achieved on current public datasets may not be applicable to practical scenarios due to limitations within these datasets. First, the limited sizes of them may not reflect the real-world scale of traffic networks. Second, the temporal coverage of these datasets is typically short, posing hurdles in studying long-term patterns and acquiring sufficient samples for training deep models. Third, these datasets often lack adequate metadata for sensors, which compromises the reliability and interpretability of the data. To mitigate these limitations, we introduce the LargeST benchmark dataset. It encompasses a total number of 8,600 sensors in California with a 5-year time coverage and includes comprehensive metadata. Using LargeST, we perform in-depth data analysis to extract data insights, benchmark well-known baselines in terms of their performance and efficiency, and identify challenges as well as opportunities for future research. We release the datasets and baseline implementations at: https://github.com/liuxu77/LargeST.

Related papers

Fine-Grained Urban Traffic Forecasting on Metropolis-Scale Road Networks [14.684896571014747]
We release datasets representing the road networks of two major cities with the largest containing almost 100,000 road segments. Our datasets contain rich road features and provide fine-grained data about both traffic volume and traffic speed.
arXiv Detail & Related papers (2025-10-02T17:53:51Z)
Core-Set Selection for Data-efficient Land Cover Segmentation [16.89537279044251]
We propose six novel core-set selection methods for selecting important subsets of samples from remote sensing image segmentation datasets. We benchmark these approaches against a random-selection baseline on three commonly used land cover classification datasets. This result shows the importance and potential of data-centric learning for the remote sensing domain.
arXiv Detail & Related papers (2025-05-02T12:22:08Z)
Prior-Fitted Networks Scale to Larger Datasets When Treated as Weak Learners [82.72552644267724]
BoostPFN can outperform standard PFNs with the same size of training samples in large datasets. High performance is maintained for up to 50x of the pre-training size of PFNs.
arXiv Detail & Related papers (2025-03-03T07:31:40Z)
Enabling Advanced Land Cover Analytics: An Integrated Data Extraction Pipeline for Predictive Modeling with the Dynamic World Dataset [1.3757956340051605]
We present a flexible and efficient end to end pipeline for working with the Dynamic World dataset. This includes a pre-processing and representation framework which tackles noise removal, efficient extraction of large amounts of data, and re-representation of LULC data. To demonstrate the power of our pipeline, we use it to extract data for an urbanization prediction problem and build a suite of machine learning models with excellent performance.
arXiv Detail & Related papers (2024-10-11T16:13:01Z)
Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning [3.623224034411137]
Offline multi-agent reinforcement learning (MARL) is an exciting direction of research that uses static datasets to find optimal control policies for multi-agent systems. Though the field is by definition data-driven, efforts have thus far neglected data in their drive to achieve state-of-the-art results. We show how the majority of works generate their own datasets without consistent methodology and provide sparse information about the characteristics of these datasets.
arXiv Detail & Related papers (2024-09-18T14:13:24Z)
Reconsidering utility: unveiling the limitations of synthetic mobility data generation algorithms in real-life scenarios [49.1574468325115]
We evaluate the utility of five state-of-the-art synthesis approaches in terms of real-world applicability. We focus on so-called trip data that encode fine granular urban movements such as GPS-tracked taxi rides. One model fails to produce data within reasonable time and another generates too many jumps to meet the requirements for map matching.
arXiv Detail & Related papers (2024-07-03T16:08:05Z)
XXLTraffic: Expanding and Extremely Long Traffic Dataset for Ultra-Dynamic Forecasting Challenges [3.7509821052818118]
XXLTraffic is the largest available public traffic dataset with the longest timespan and increasing number of sensor nodes. Our dataset supplements existing spatio-temporal data resources and leads to new research directions in this domain.
arXiv Detail & Related papers (2024-06-18T15:06:22Z)
Computationally and Memory-Efficient Robust Predictive Analytics Using Big Data [0.0]
This study navigates through the challenges of data uncertainties, storage limitations, and predictive data-driven modeling using big data. We utilize Robust Principal Component Analysis (RPCA) for effective noise reduction and outlier elimination, and Optimal Sensor Placement (OSP) for efficient data compression and storage.
arXiv Detail & Related papers (2024-03-27T22:39:08Z)
UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction [93.77809355002591]
We introduce UniTraj, a comprehensive framework that unifies various datasets, models, and evaluation criteria. We conduct extensive experiments and find that model performance significantly drops when transferred to other datasets. We provide insights into dataset characteristics to explain these findings.
arXiv Detail & Related papers (2024-03-22T10:36:50Z)
Distil the informative essence of loop detector data set: Is network-level traffic forecasting hungry for more data? [0.8002196839441036]
We propose an uncertainty-aware traffic forecasting framework to explore how many samples of loop data are truly effective for training forecasting models. The proposed methodology proves valuable in evaluating large traffic datasets' true information content.
arXiv Detail & Related papers (2023-10-31T11:23:10Z)
Large Scale Real-World Multi-Person Tracking [68.27438015329807]
This paper presents a new large-scale multi-person tracking dataset, PersonPath22. It is over an order of magnitude larger than currently available high-quality multi-object tracking datasets such as MOT17, HiEve, and MOT20.
arXiv Detail & Related papers (2022-11-03T23:03:13Z)
TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets. We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
The Stanford Drone Dataset is More Complex than We Think: An Analysis of Key Characteristics [2.064612766965483]
We discuss the characteristics of the Stanford Drone dataset (SDD). We demonstrate how this insufficiency reduces the information available to users and can impact performance. Our intention is to increase the performance and methods applied to this dataset going forward, while also clearly detailing less obvious features of the dataset for new users.
arXiv Detail & Related papers (2022-03-22T13:58:14Z)
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges [52.624157840253204]
We present an urban-scale photogrammetric point cloud dataset with nearly three billion richly annotated points. Our dataset consists of large areas from three UK cities, covering about 7.6 km² of the city landscape. We evaluate the performance of state-of-the-art algorithms on our dataset and provide a comprehensive analysis of the results.
arXiv Detail & Related papers (2020-09-07T14:47:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.