Brazil Data Commons: A Platform for Unifying and Integrating Brazil's Public Data
- URL: http://arxiv.org/abs/2511.11755v1
- Date: Thu, 13 Nov 2025 20:18:20 GMT
- Title: Brazil Data Commons: A Platform for Unifying and Integrating Brazil's Public Data
- Authors: Isadora Cristina, Ramon Gonze, Jônatas Santos, Julio Reis, Mário Alvim, Bernardo Queiroz, Fabrício Benevenuto
- Abstract summary: Brazil Data Commons is a platform that unifies various Brazilian datasets under a common semantic framework. By adopting globally recognized and interoperable data standards, Brazil Data Commons aligns with the principles of the broader Data Commons ecosystem.
- Score: 0.3322570886790747
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The fragmentation of public data in Brazil, coupled with inconsistent standards and limited interoperability, hinders effective research, evidence-based policymaking and access to data-driven insights. To address these issues, we introduce Brazil Data Commons, a platform that unifies various Brazilian datasets under a common semantic framework, enabling the seamless discovery, integration and visualization of information from different domains. By adopting globally recognized ontologies and interoperable data standards, Brazil Data Commons aligns with the principles of the broader Data Commons ecosystem and places Brazilian data in a global context. Through user-friendly interfaces, straightforward query mechanisms and flexible data access options, the platform democratizes data use and enables researchers, policy makers, and the public to gain meaningful insights and make informed decisions. This paper illustrates how Brazil Data Commons transforms scattered datasets into an integrated and easily navigable resource that allows a deeper understanding of Brazil's complex social, economic and environmental landscape.
Related papers
- Harnessing Rich Multi-Modal Data for Spatial-Temporal Homophily-Embedded Graph Learning Across Domains and Localities [2.5065738436850835]
This research proposes a heterogeneous data pipeline that performs cross-domain data fusion. We aim to address complex urban problems across multiple domains and localities by harnessing the rich information over 50 data sources.
arXiv Detail & Related papers (2025-12-11T23:51:54Z) - How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy [52.00934156883483]
Differential Privacy (DP) is a framework for reasoning about and limiting information leakage. Differentially Private Synthetic data refers to synthetic data that preserves the overall trends of source data.
arXiv Detail & Related papers (2025-12-02T21:14:39Z) - Amplify Initiative: Building A Localized Data Platform for Globalized AI [3.045104054104307]
Current AI models often fail to account for local context and language, given the predominance of English and Western internet content in their training data. Amplify Initiative, a data platform and methodology, leverages expert communities to collect diverse, high-quality data to address the limitations of these models. The platform is designed to enable co-creation of datasets, provide access to high-quality multilingual datasets, and offer recognition to data authors.
arXiv Detail & Related papers (2025-04-18T23:20:52Z) - From Community Network to Community Data: Towards Combining Data Pool and Data Cooperative for Data Justice in Rural Areas [0.0]
This study explores the shift from community networks (CNs) to community data in rural areas. It focuses on combining data pools and data cooperatives to achieve data justice and foster a just AI ecosystem.
arXiv Detail & Related papers (2025-03-07T21:41:01Z) - Multi-Platform Aggregated Dataset of Online Communities (MADOC) [64.45797970830233]
MADOC aggregates and standardizes data from Bluesky, Koo, Reddit, and Voat (2012-2024), containing 18.9 million posts, 236 million comments, and 23.1 million unique users. The dataset enables comparative studies of toxic behavior evolution across platforms through standardized interaction records and sentiment analysis.
arXiv Detail & Related papers (2025-01-22T14:02:11Z) - KamerRaad: Enhancing Information Retrieval in Belgian National Politics through Hierarchical Summarization and Conversational Interfaces [55.00702535694059]
KamerRaad is an AI tool that leverages large language models to help citizens interactively engage with Belgian political information.
The tool extracts and concisely summarizes key excerpts from parliamentary proceedings, followed by the potential for interaction based on generative AI.
arXiv Detail & Related papers (2024-04-22T15:01:39Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - Data Commons [4.568270630281101]
Data Commons is a distributed network of sites that publish data in a common schema and interoperate using the Data Commons APIs.
This paper describes the architecture of Data Commons, some of the major deployments and highlights directions for future work.
arXiv Detail & Related papers (2023-09-08T00:14:09Z) - Diverse Community Data for Benchmarking Data Privacy Algorithms [0.2999888908665658]
The Collaborative Research Cycle (CRC) is a National Institute of Standards and Technology (NIST) benchmarking program.
Deidentification algorithms are vulnerable to the same bias and privacy issues that impact other data analytics and machine learning applications.
This paper summarizes four CRC contributions on the relationship between diverse populations and challenges for equitable deidentification.
arXiv Detail & Related papers (2023-06-20T17:18:51Z) - Robust Self-Tuning Data Association for Geo-Referencing Using Lane Markings [44.4879068879732]
This paper presents a complete pipeline for resolving ambiguities during the data association.
Its core is a robust self-tuning data association that adapts the search area depending on the entropy of the measurements.
We evaluate our method on real data from urban and rural scenarios around the city of Karlsruhe in Germany.
arXiv Detail & Related papers (2022-07-28T12:29:39Z) - RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval [66.2075707179047]
We propose a novel mixture-of-expert transformer RoME that disentangles the text and the video into three levels.
We utilize a transformer-based attention mechanism to fully exploit visual and text embeddings at both global and local levels.
Our method outperforms the state-of-the-art methods on the YouCook2 and MSR-VTT datasets.
arXiv Detail & Related papers (2022-06-26T11:12:49Z) - INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision] [30.411996388471817]
INODE is an end-to-end data exploration system.
We demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics.
arXiv Detail & Related papers (2021-04-09T05:04:04Z) - Explainable Patterns: Going from Findings to Insights to Support Data Analytics Democratization [60.18814584837969]
We present Explainable Patterns (ExPatt), a new framework to support lay users in exploring and creating data storytellings.
ExPatt automatically generates plausible explanations for observed or selected findings using an external (textual) source of information.
arXiv Detail & Related papers (2021-01-19T16:13:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.