Variable-Based Network Analysis of Datasets on Data Exchange Platforms
- URL: http://arxiv.org/abs/2003.05109v1
- Date: Wed, 11 Mar 2020 04:42:30 GMT
- Title: Variable-Based Network Analysis of Datasets on Data Exchange Platforms
- Authors: Teruaki Hayashi, Yukio Ohsawa
- Abstract summary: We apply a network approach with a novel variable-based structural analysis to the metadata of datasets on two data platform services.
It was noted that the structures of the data networks are locally dense and highly assortative, similar to human-related networks.
- Score: 0.15229257192293197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, data exchange platforms have emerged in the digital economy to
enable better resource allocation in a data-driven society, which requires
cross-organizational data collaborations. Understanding the characteristics of
the data on these platforms is important for their application; however, the
structures of such platforms have not been extensively investigated. In this
study, we apply a network approach with a novel variable-based structural
analysis to the metadata of datasets on two data platform services. It was
noted that the structures of the data networks are locally dense and highly
assortative, similar to human-related networks. Even though the data on these
platforms are designed and collected differently, depending on the use
objectives, the variables of heterogeneous data exhibit a power distribution,
and the data networks exhibit multi-scaling behavior. Furthermore, we found
that the data collection strategies of the platforms are related to the variety
of variables, density of the networks, and their robustness from the viewpoint
of sustainability and social acceptability of the data platforms.
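
The abstract does not spell out how the data networks are built, so the following is only a minimal sketch under stated assumptions: two datasets are linked when their metadata share at least one variable name, "locally dense" is read as a high average clustering coefficient, and "assortative" as a positive degree correlation across edges. The toy `metadata` dictionary and all function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def build_dataset_network(metadata):
    """Assumed construction: link two datasets if they share a variable name."""
    adj = {name: set() for name in metadata}
    for a, b in combinations(metadata, 2):
        if metadata[a] & metadata[b]:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def variable_frequencies(metadata):
    """Count how many datasets use each variable (input to a distribution fit)."""
    return Counter(v for variables in metadata.values() for v in variables)


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def degree_assortativity(adj):
    """Pearson correlation of endpoint degrees, each edge counted both ways."""
    xs, ys = [], []
    for u, neighbours in adj.items():
        for v in neighbours:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    if n == 0:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # undefined when all endpoint degrees are equal
    return cov / sqrt(var_x * var_y)


# Hypothetical toy metadata: dataset name -> set of variable names.
metadata = {
    "weather": {"date", "city", "temperature"},
    "traffic": {"date", "city", "vehicle_count"},
    "retail": {"date", "store_id", "sales"},
    "census": {"city", "population"},
    "genome": {"gene_id", "expression"},
}

network = build_dataset_network(metadata)
clustering = average_clustering(network)
assortativity = degree_assortativity(network)
frequencies = variable_frequencies(metadata)
```

On real platform metadata, `frequencies` would be the quantity to inspect for a heavy-tailed distribution, while `clustering` and `assortativity` correspond to the local density and assortativity the abstract reports. A five-dataset toy graph is far too small to reproduce those findings; it only illustrates the computation.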
Related papers
- Leveraging GPT for the Generation of Multi-Platform Social Media Datasets for Research [0.0]
Social media datasets are essential for research on disinformation, influence operations, social sensing, hate speech detection, cyberbullying, and other significant topics.
Access to these datasets is often restricted due to costs and platform regulations.
This paper explores the potential of large language models to create lexically and semantically relevant social media datasets across multiple platforms.
arXiv Detail & Related papers (2024-07-11T09:12:39Z)
- UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction [93.77809355002591]
We introduce UniTraj, a comprehensive framework that unifies various datasets, models, and evaluation criteria.
We conduct extensive experiments and find that model performance significantly drops when transferred to other datasets.
We provide insights into dataset characteristics to explain these findings.
arXiv Detail & Related papers (2024-03-22T10:36:50Z)
- On the Cross-Dataset Generalization of Machine Learning for Network Intrusion Detection [50.38534263407915]
Network Intrusion Detection Systems (NIDS) are a fundamental tool in cybersecurity.
Their ability to generalize across diverse networks is a critical factor in their effectiveness and a prerequisite for real-world applications.
In this study, we conduct a comprehensive analysis of the generalization of machine-learning-based NIDS through extensive experimentation in a cross-dataset framework.
arXiv Detail & Related papers (2024-02-15T14:39:58Z)
- Decentralized Data Governance as Part of a Data Mesh Platform: Concepts and Approaches [0.0]
Data mesh is a socio-technical approach to decentralized analytics data management.
This paper presents a conceptual model of key data mesh concepts and discusses different approaches to drive governance through platform means.
arXiv Detail & Related papers (2023-07-05T15:18:15Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- Federated Learning over Harmonized Data Silos [0.7106986689736825]
Federated Learning is a distributed machine learning approach that enables geographically distributed data silos to collaboratively learn a joint machine learning model without sharing data.
We propose an architectural vision for an end-to-end Federated Learning and Integration system, incorporating the critical steps of data harmonization and data imputation.
arXiv Detail & Related papers (2023-05-15T19:55:51Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Towards Federated Bayesian Network Structure Learning with Continuous Optimization [14.779035801521717]
We present a cross-silo federated learning approach to estimate the structure of a Bayesian network.
We develop a distributed structure learning method based on continuous optimization.
arXiv Detail & Related papers (2021-10-18T14:36:05Z)
- Collaborative Problem Solving on a Data Platform Kaggle [0.4511923587827301]
The data exchange ecosystem is developed by platform services that facilitate data and knowledge exchange.
In this study, we investigate Kaggle, a data analysis competition platform.
arXiv Detail & Related papers (2021-07-26T02:28:01Z)
- Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data [85.43008636875345]
We show that diverse representation in training data is key to increasing subgroup performances and achieving population level objectives.
Our analysis and experiments describe how dataset compositions influence performance and provide constructive results for using trends in existing data, alongside domain knowledge, to help guide intentional, objective-aware dataset design.
arXiv Detail & Related papers (2021-03-05T00:27:08Z)
- Multi-level Graph Convolutional Networks for Cross-platform Anchor Link Prediction [47.047999403900775]
Cross-platform account matching plays a significant role in social network analytics.
We propose a novel framework that considers multi-level graph convolutions on both local network structure and hypergraph structure.
The proposed method overcomes the data insufficiency problem of existing work and does not necessarily rely on user demographic information.
arXiv Detail & Related papers (2020-06-02T22:01:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.