LaDe: The First Comprehensive Last-mile Delivery Dataset from Industry
- URL: http://arxiv.org/abs/2306.10675v2
- Date: Wed, 3 Jan 2024 02:16:30 GMT
- Title: LaDe: The First Comprehensive Last-mile Delivery Dataset from Industry
- Authors: Lixia Wu, Haomin Wen, Haoyuan Hu, Xiaowei Mao, Yutong Xia, Ergang
Shan, Jianbin Zhen, Junhong Lou, Yuxuan Liang, Liuqing Yang, Roger
Zimmermann, Youfang Lin, Huaiyu Wan
- Abstract summary: LaDe is the first publicly available last-mile delivery dataset with millions of packages from the industry.
It involves 10,677k packages of 21k couriers over 6 months of real-world operation.
LaDe has three unique characteristics: Large-scale, comprehensive, diverse.
- Score: 44.573471568516915
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Real-world last-mile delivery datasets are crucial for research in logistics,
supply chain management, and spatio-temporal data mining. Despite a plethora of
algorithms developed to date, no widely accepted, publicly available last-mile
delivery dataset exists to support research in this field. In this paper, we
introduce LaDe, the first publicly available last-mile delivery
dataset with millions of packages from the industry. LaDe has three unique
characteristics: (1) Large-scale. It involves 10,677k packages of 21k couriers
over 6 months of real-world operation. (2) Comprehensive information. It offers
original package information, such as its location and time requirements, as
well as task-event information, which records when and where the courier is
while events such as task-accept and task-finish events happen. (3) Diversity.
The dataset includes data from various scenarios, including package pick-up and
delivery, and from multiple cities, each with its unique spatio-temporal
patterns due to their distinct characteristics such as populations. We verify
LaDe on three tasks by running several classical baseline models per task. We
believe that the large-scale, comprehensive, and diverse features of LaDe can offer
unparalleled opportunities to researchers in the supply chain community, data
mining community, and beyond. The dataset homepage is publicly available at
https://huggingface.co/datasets/Cainiao-AI/LaDe.
Related papers
- Capturing research literature attitude towards Sustainable Development Goals: an LLM-based topic modeling approach [0.7806050661713976]
The Sustainable Development Goals were formulated by the United Nations in 2015 to address these global challenges by 2030.
Natural language processing techniques can help uncover discussions on SDGs within research literature.
We propose a completely automated pipeline to fetch content from the Scopus database and prepare datasets dedicated to five groups of SDGs.
arXiv Detail & Related papers (2024-11-05T09:37:23Z)
- Multimodal Banking Dataset: Understanding Client Needs through Event Sequences [41.470088044942756]
We present the industrial-scale publicly available multimodal banking dataset, MBD, that contains more than 1.5M corporate clients.
All entries are properly anonymized from real proprietary bank data.
We provide numerical results that demonstrate the superiority of our multi-modal baselines over single-modal techniques for each task.
arXiv Detail & Related papers (2024-09-26T07:07:08Z)
- MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens [113.9621845919304]
We release MINT-1T, the most extensive and diverse open-source Multimodal INTerleaved dataset to date.
MINT-1T comprises one trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets.
Our experiments show that LMMs trained on MINT-1T rival the performance of models trained on the previous leading dataset, OBELICS.
arXiv Detail & Related papers (2024-06-17T07:21:36Z)
- IITP-VDLand: A Comprehensive Dataset on Decentraland Parcels [1.83621951969607]
IITP-VDLand offers a rich array of attributes, encompassing parcel characteristics, trading history, past activities, transactions, and social media interactions.
We introduce a key in the dataset, namely Rarity score, which measures the uniqueness of each parcel within the virtual world.
arXiv Detail & Related papers (2024-04-11T07:54:14Z)
- LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset [75.9621305227523]
We introduce LMSYS-Chat-1M, a large-scale dataset containing one million real-world conversations with 25 state-of-the-art large language models (LLMs).
This dataset is collected from 210K IP addresses in the wild on our Vicuna demo and Arena website.
We demonstrate its versatility through four use cases: developing content moderation models that perform similarly to GPT-4, building a safety benchmark, training instruction-following models that perform similarly to Vicuna, and creating challenging benchmark questions.
arXiv Detail & Related papers (2023-09-21T12:13:55Z)
- Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation [127.35910314813854]
We present the Amazon Multi-locale Shopping Session dataset, namely Amazon-M2.
It is the first multilingual dataset consisting of millions of user sessions from six different locales.
Remarkably, the dataset can help us enhance personalization and understanding of user preferences.
arXiv Detail & Related papers (2023-07-19T00:08:49Z)
- MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos [106.06278332186106]
Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction.
Numerous limitations exist within existing public MSMO datasets.
We have meticulously curated the MMSum dataset.
arXiv Detail & Related papers (2023-06-07T07:43:11Z)
- Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting [64.7364925689825]
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain.
The Lidar dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose.
The Motion Forecasting dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene.
arXiv Detail & Related papers (2023-01-02T00:36:22Z)
- Towards Rich, Portable, and Large-Scale Pedestrian Data Collection [6.250018240133604]
We propose a data collection system that is portable, which facilitates accessible large-scale data collection in diverse environments.
We introduce the first batch of data from the ongoing data collection effort -- the TBD pedestrian dataset.
Compared with existing pedestrian datasets, our dataset contains three components: human verified labels grounded in the metric space, a combination of top-down and perspective views, and naturalistic human behavior in the presence of a socially appropriate "robot".
arXiv Detail & Related papers (2022-03-03T19:28:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.