Related papers: IITP-VDLand: A Comprehensive Dataset on Decentraland Parcels

IITP-VDLand: A Comprehensive Dataset on Decentraland Parcels

URL: http://arxiv.org/abs/2404.07533v1
Date: Thu, 11 Apr 2024 07:54:14 GMT
Title: IITP-VDLand: A Comprehensive Dataset on Decentraland Parcels
Authors: Ankit K. Bhagat, Dipika Jha, Raju Halder, Rajendra N. Paramanik, Chandra M. Kumar,
Abstract summary: IITP-VDLand offers a rich array of attributes, encompassing parcel characteristics, trading history, past activities, transactions, and social media interactions. We introduce a key in the dataset, namely Rarity score, which measures the uniqueness of each parcel within the virtual world.
Score: 1.83621951969607
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: This paper presents IITP-VDLand, a comprehensive dataset of Decentraland parcels sourced from diverse platforms. Unlike existing datasets which have limited attributes and records, IITP-VDLand offers a rich array of attributes, encompassing parcel characteristics, trading history, past activities, transactions, and social media interactions. Alongside, we introduce a key attribute in the dataset, namely Rarity score, which measures the uniqueness of each parcel within the virtual world. Addressing the significant challenge posed by the dispersed nature of this data across various sources, we employ a systematic approach, utilizing both available APIs and custom scripts, to gather it. Subsequently, we meticulously curate and organize the information into four distinct segments: (1) Characteristics Data-Fragment, (2) OpenSea Trading History Data-Fragment, (3) Ethereum Activity Transactions Data-Fragment, and (4) Social Media Data-Fragment. We envisage that this dataset would serve as a robust resource for training machine- and deep-learning models specifically designed to address real-world challenges within the domain of Decentraland parcels. The performance benchmarking of more than 20 state-of-the-art price prediction models on our dataset yields promising results, achieving a maximum R2 score of 0.8251 and an accuracy of 74.23% in case of Extra Trees Regressor and Classifier. The key findings reveal that the ensemble models performs better than both deep learning and linear models for our dataset. We observe a significant impact of coordinates, geographical proximity, rarity score, and few other economic indicators on the prediction of parcel prices.

Related papers

Beyond Features: How Dataset Design Influences Multi-Agent Trajectory Prediction Performance [37.850085364753845]
This work examines how feature selection, cross-dataset transfer, and geographic diversity influence trajectory prediction accuracy in multi-agent settings.<n>We evaluate a state-of-the-art model using our novel L4 Motion Forecasting dataset based on our own data recordings in Germany and the US.
arXiv Detail & Related papers (2025-07-07T15:18:51Z)
Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning [53.527506374566485]
We propose a novel Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning cluster framework, namely AR-DBSCAN.<n>We show that AR-DBSCAN not only improves clustering accuracy by up to 144.1% and 175.3% in the NMI and ARI metrics, respectively, but also is capable of robustly finding dominant parameters.
arXiv Detail & Related papers (2025-05-07T11:37:23Z)
Core-Set Selection for Data-efficient Land Cover Segmentation [16.89537279044251]
We propose six novel core-set selection methods for selecting important subsets of samples from remote sensing image segmentation datasets.<n>We benchmark these approaches against a random-selection baseline on three commonly used land cover classification datasets.<n>This result shows the importance and potential of data-centric learning for the remote sensing domain.
arXiv Detail & Related papers (2025-05-02T12:22:08Z)
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
arXiv Detail & Related papers (2025-01-14T13:42:22Z)
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models [64.28420991770382]
Data-Juicer 2.0 is a data processing system backed by data processing operators spanning text, image, video, and audio modalities.<n>It supports more critical tasks including data analysis, annotation, and foundation model post-training.<n>It has been widely adopted in diverse research fields and real-world products such as Alibaba Cloud PAI.
arXiv Detail & Related papers (2024-12-23T08:29:57Z)
AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities [5.767156832161819]
We propose AnySat, a multimodal model based on joint embedding predictive architecture (JEPA) and scale-adaptive spatial encoders. To demonstrate the advantages of this unified approach, we compile GeoPlex, a collection of $5$ multimodal datasets. We then train a single powerful model on these diverse datasets simultaneously.
arXiv Detail & Related papers (2024-12-18T18:11:53Z)
Self-consistent Deep Geometric Learning for Heterogeneous Multi-source Spatial Point Data Prediction [10.646376827353551]
Multi-source spatial point data prediction is crucial in fields like environmental monitoring and natural resource management. Existing models in this area often fall short due to their domain-specific nature and lack a strategy for integrating information from various sources. We introduce an innovative multi-source spatial point data prediction framework that adeptly aligns information from varied sources without relying on ground truth labels.
arXiv Detail & Related papers (2024-06-30T16:13:13Z)
Paved2Paradise: Cost-Effective and Scalable LiDAR Simulation by Factoring the Real World [0.2630859234884723]
We describe Paved2Paradise, a simple, cost-effective approach for generating lidar datasets from scratch. We show that a model trained exclusively on Paved2Paradise synthetic data is highly effective at detecting humans in orchards.
arXiv Detail & Related papers (2023-12-02T12:23:07Z)
infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization. infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information. In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories. We first contribute the Product1M datasets, and define two real practical instance-level retrieval tasks. We exploit to train a more effective cross-modal model which is adaptively capable of incorporating key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data. We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
Masked Transformer for Neighhourhood-aware Click-Through Rate Prediction [74.52904110197004]
We propose Neighbor-Interaction based CTR prediction, which put this task into a Heterogeneous Information Network (HIN) setting. In order to enhance the representation of the local neighbourhood, we consider four types of topological interaction among the nodes. We conduct comprehensive experiments on two real world datasets and the experimental results show that our proposed method outperforms state-of-the-art CTR models significantly.
arXiv Detail & Related papers (2022-01-25T12:44:23Z)
MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains. We reconcile the generalization and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images. A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
Global-Local Item Embedding for Temporal Set Prediction [5.697778174290439]
GLOIE learns to utilize the temporal properties of sets across whole users as well as within a user. GLOIE uses Variational Autoencoder (VAE) and dynamic graph-based model to capture global and local information.
arXiv Detail & Related papers (2021-09-05T13:36:57Z)
Assessing Data Efficiency in Task-Oriented Semantic Parsing [54.87705549021248]
We introduce a four-stage protocol which gives an approximate measure of how much in-domain "target" data a requires to achieve a certain quality bar. We apply our protocol in two real-world case studies illustrating its flexibility and applicability to practitioners in task-oriented semantic parsing.
arXiv Detail & Related papers (2021-07-10T02:43:16Z)
Cross-Dataset Collaborative Learning for Semantic Segmentation [17.55660581677053]
We present a simple, flexible, and general method for semantic segmentation, termed Cross-Dataset Collaborative Learning (CDCL) Given multiple labeled datasets, we aim to improve the generalization and discrimination of feature representations on each dataset. We conduct extensive evaluations on four diverse datasets, i.e., Cityscapes, BDD100K, CamVid, and COCO Stuff, with single-dataset and cross-dataset settings.
arXiv Detail & Related papers (2021-03-21T09:59:47Z)
Domain Adaptative Causality Encoder [52.779274858332656]
We leverage the characteristics of dependency trees and adversarial learning to address the tasks of adaptive causality identification and localisation. We present a new causality dataset, namely MedCaus, which integrates all types of causality in the text.
arXiv Detail & Related papers (2020-11-27T04:14:55Z)
Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [118.75207687144817]
We introduce Data Maps, a model-based tool to characterize and diagnose datasets. We leverage a largely ignored source of information: the behavior of the model on individual instances during training. Our results indicate that a shift in focus from quantity to quality of data could lead to robust models and improved out-of-distribution generalization.
arXiv Detail & Related papers (2020-09-22T20:19:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.