Graph integration of structured, semistructured and unstructured data for data journalism
- URL: http://arxiv.org/abs/2012.08830v1
- Date: Wed, 16 Dec 2020 09:59:27 GMT
- Title: Graph integration of structured, semistructured and unstructured data for data journalism
- Authors: Angelos-Christos Anadiotis, Oana Balalau, Catarina Conceicao, Helena Galhardas, Mhd Yamen Haddad, Ioana Manolescu, Tayeb Merabti, Jingmao You
- Abstract summary: We describe a complete approach for integrating dynamic sets of heterogeneous datasets.
Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments.
- Score: 4.508924138721326
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Digital data is a gold mine for modern journalism. However, datasets that
interest journalists are extremely heterogeneous, ranging from highly
structured (relational databases) and semi-structured (JSON, XML, HTML) data to
graphs (e.g., RDF) and text. Journalists (and other classes of users lacking
advanced IT expertise, such as most non-governmental organizations or small
public administrations) need to be able to make sense of such heterogeneous
corpora, even if they lack the ability to define and deploy custom
extract-transform-load workflows, especially for dynamically varying sets of
data sources.
We describe a complete approach for integrating dynamic sets of heterogeneous
datasets into graphs along the lines described above: the challenges we faced
to make such graphs useful and to allow their integration to scale, and the
solutions we proposed for these problems. Our approach is implemented within
the ConnectionLens system; we validate it through a set of experiments.
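The abstract stays at a high level, so the following is only a rough illustration of the idea in the title, not ConnectionLens's actual code, data model or API. This minimal Python sketch, under stated assumptions, loads a relational table, a JSON document and a free text into one labeled graph and lets them meet through shared entity nodes. Assumptions: the dictionary lookup `KNOWN_ENTITIES` is a hypothetical stand-in for a real named-entity recognizer, and every function, label and file name (`add_table`, `add_json`, `donations.csv`, ...) is made up for illustration.

```python
from collections import defaultdict
from itertools import count

# Hypothetical stand-in for a named-entity recognizer (case-insensitive substring match).
KNOWN_ENTITIES = {"Acme Corp", "Jane Doe"}

def extract_entities(text):
    """Return the known entity names mentioned in `text`, sorted."""
    lowered = text.lower()
    return sorted(e for e in KNOWN_ENTITIES if e.lower() in lowered)

class Graph:
    """A tiny directed graph with labeled nodes and labeled edges."""

    def __init__(self):
        self._ids = count()
        self.labels = {}    # node id -> node label
        self.edges = []     # (source id, edge label, target id)
        self.entities = {}  # lower-cased entity name -> shared node id

    def add_node(self, label):
        node = next(self._ids)
        self.labels[node] = label
        return node

    def add_edge(self, src, label, dst):
        self.edges.append((src, label, dst))

    def entity(self, name):
        # One node per distinct entity, reused by every dataset mentioning it;
        # these shared nodes are what interconnect the separate sources.
        key = name.lower()
        if key not in self.entities:
            self.entities[key] = self.add_node(name)
        return self.entities[key]

def add_value(g, parent, label, value):
    """Attach a leaf value and link it to every entity it mentions."""
    node = g.add_node(str(value))
    g.add_edge(parent, label, node)
    for name in extract_entities(str(value)):
        g.add_edge(node, "mentions", g.entity(name))

def add_table(g, name, columns, rows):
    """Relational data: one node per row, one labeled edge per column."""
    root = g.add_node(name)
    for row in rows:
        row_node = g.add_node(f"{name} row")
        g.add_edge(root, "row", row_node)
        for column, value in zip(columns, row):
            add_value(g, row_node, column, value)
    return root

def add_json(g, parent, label, obj):
    """Semi-structured data: walk the JSON tree; keys become edge labels."""
    if isinstance(obj, (dict, list)):
        node = g.add_node("{}" if isinstance(obj, dict) else "[]")
        g.add_edge(parent, label, node)
        items = obj.items() if isinstance(obj, dict) else (("item", v) for v in obj)
        for key, value in items:
            add_json(g, node, key, value)
    else:
        add_value(g, parent, label, obj)

def add_text(g, name, text):
    """Unstructured data: one text node, linked to the entities it mentions."""
    root = g.add_node(name)
    add_value(g, root, "text", text)
    return root

def connected(g, a, b):
    """True if nodes a and b are linked by some path, ignoring edge direction."""
    adjacent = defaultdict(set)
    for src, _, dst in g.edges:
        adjacent[src].add(dst)
        adjacent[dst].add(src)
    seen, stack = {a}, [a]
    while stack:
        node = stack.pop()
        if node == b:
            return True
        for nxt in adjacent[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return False

if __name__ == "__main__":
    g = Graph()
    table = add_table(g, "donations.csv", ["donor", "amount"], [["Acme Corp", 5000]])
    doc = g.add_node("contracts.json")
    add_json(g, doc, "root", {"contract": {"vendor": "Jane Doe", "value": 120}})
    print(connected(g, table, doc))  # False: no entity in common yet
    add_text(g, "article.txt", "Jane Doe met Acme Corp executives.")
    print(connected(g, table, doc))  # True: the article mentions both entities
```

In this sketch each source keeps its own shape (rows and columns, a JSON tree, a text node), and only the shared entity nodes tie them together, so adding a new source can link previously unrelated ones. A real system would also have to handle the usefulness and scaling challenges the abstract mentions, which this toy version ignores.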
Related papers
- DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts [70.21017141742763]
Graph neural networks (GNNs) are gaining popularity for processing graph-structured data.
Existing methods generally use a fixed number of GNN layers to generate representations for all graphs.
We propose the depth-adaptive mixture of experts (DA-MoE) method, which incorporates two main improvements to GNNs.
Date: 2024-11-05T11:46:27Z
- Capturing and Anticipating User Intents in Data Analytics via Knowledge Graphs [0.061446808540639365]
This work explores the use of Knowledge Graphs (KGs) as a basic framework for capturing complex analytics in a human-centered manner.
The data stored in the generated KG can then be exploited to provide assistance (e.g., recommendations) to the users interacting with these systems.
Date: 2024-11-01T20:45:23Z
- Multi-Modal Dataset Creation for Federated Learning with DICOM Structured Reports [26.2463670182172]
Federated training is often hindered by heterogeneous datasets due to divergent data storage options, inconsistent naming schemes, varied annotation procedures, and disparities in label quality.
This is particularly evident in the emerging multi-modal learning paradigms, where dataset harmonization, including a uniform data representation and filtering options, is of paramount importance.
We developed an open platform for data integration and interactive filtering capabilities that simplifies the process of assembling multi-modal datasets.
Date: 2024-07-12T07:34:10Z
- Federated Neural Graph Databases [53.03085605769093]
We propose Federated Neural Graph Database (FedNGDB), a novel framework that enables reasoning over multi-source graph-based data while preserving privacy.
Unlike existing methods, FedNGDB can handle complex graph structures and relationships, making it suitable for various downstream tasks.
Date: 2024-02-22T14:57:44Z
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
Date: 2022-08-16T20:46:08Z
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
Date: 2021-04-17T21:34:10Z
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
Date: 2020-10-03T03:18:52Z
- Graph integration of structured, semistructured and unstructured data for data journalism [0.0]
We describe a complete approach for integrating dynamic sets of heterogeneous data sources.
Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments.
Date: 2020-07-23T08:55:09Z
- ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge could be more than enough, since the output description may only cover the most significant knowledge.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
Date: 2020-04-30T14:16:19Z
- Siamese Graph Neural Networks for Data Integration [11.41207739004894]
We propose a general approach to modeling and integrating entities from structured data, such as relational databases, as well as unstructured sources, such as free text from news articles.
Our approach is designed to explicitly model and leverage relations between entities, thereby using all available information and preserving as much context as possible.
We evaluate our method on the task of integrating data about business entities, and we demonstrate that it outperforms standard rule-based systems, as well as other deep learning approaches that do not use graph-based representations.
Date: 2020-01-17T21:51:55Z