Related papers: From Community Network to Community Data: Towards Combining Data Pool and Data Cooperative for Data Justice in Rural Areas

From Community Network to Community Data: Towards Combining Data Pool and Data Cooperative for Data Justice in Rural Areas

URL: http://arxiv.org/abs/2503.05950v1
Date: Fri, 07 Mar 2025 21:41:01 GMT
Title: From Community Network to Community Data: Towards Combining Data Pool and Data Cooperative for Data Justice in Rural Areas
Authors: Jean Louis Fendji Kedieng Ebongue,
Abstract summary: This study explores the shift from community networks (CNs) to community data in rural areas.<n>It focuses on combining data pools and data cooperatives to achieve data justice and foster and a just AI ecosystem.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This study explores the shift from community networks (CNs) to community data in rural areas, focusing on combining data pools and data cooperatives to achieve data justice and foster and a just AI ecosystem. With 2.7 billion people still offline, especially in the Global South, addressing data justice is critical. While discussions related to data justice have evolved to include economic dimensions, rural areas still struggle with the challenge of being adequately represented in the datasets. This study investigates a Community Data Model (CDM) that integrates the simplicity of data pools with the structured organization of data cooperatives to generate local data for AI for good. CDM leverages CNs, which have proven effective in promoting digital inclusion, to establish a centralized data repository, ensuring accessibility through open data principles. The model emphasizes community needs, prioritizing local knowledge, education, and traditional practices, with an iterative approach starting from pilot projects. Capacity building is a core component of digital literacy training and partnership with educational institutions and NGOs. The legal and regulatory dimension ensures compliance with data privacy laws. By empowering rural communities to control and manage their data, the CDM fosters equitable access and participation and sustains local identity and knowledge. This approach can mitigate the challenges of data creation in rural areas and enhance data justice. CDM can contribute to AI by improving data quality and relevance, enabling rural areas to benefit from AI advancements.

Related papers

The Case for Strategic Data Stewardship: Re-imagining Data Governance to Make Responsible Data Re-use Possible [0.0]
This paper proposes strategic data stewardship as a complementary institutional function designed to activate data for public value.<n>Unlike traditional stewardship, which tends to be inwardlooking, strategic data stewardship focuses on enabling cross sector reuse.<n>It outlines core principles, functions, and competencies, and introduces a practical Data Stewardship Canvas to support adoption across contexts.
arXiv Detail & Related papers (2026-01-10T21:22:50Z)
Knowledge Augmentation in Federation: Rethinking What Collaborative Learning Can Bring Back to Decentralized Data [11.521866946305986]
We present a knowledge-centric paradigm termed Knowledge Augmentation in Federation (KAF)<n>We provide the suggested system architecture, formulate the prototypical optimization objective, and review emerging studies that employ methodologies suitable for KAF.<n>We highlight several challenges and open questions that deserve more attention from the community.
arXiv Detail & Related papers (2025-03-05T03:26:54Z)
Towards Human-Guided, Data-Centric LLM Co-Pilots [53.35493881390917]
CliMB-DC is a human-guided, data-centric framework for machine learning co-pilots.<n>It combines advanced data-centric tools with LLM-driven reasoning to enable robust, context-aware data processing.<n>We show how CliMB-DC can transform uncurated datasets into ML-ready formats.
arXiv Detail & Related papers (2025-01-17T17:51:22Z)
Predicting Internet Connectivity in Schools: A Feasibility Study Leveraging Multi-modal Data and Location Encoders in Low-Resource Settings [1.7111728540102638]
We present our work on school internet connectivity prediction using EO and ML. We find that ML with EO and ground-based auxiliary data yields the best performance in both countries, for accuracy, F1 score, and False Positive rates. Our work showcases a practical approach to support data-driven digital infrastructure development in low-resource settings.
arXiv Detail & Related papers (2024-12-13T20:20:29Z)
Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs) We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources [5.898893619901382]
We propose a framework for the collaborative and private generation of synthetic data from distributed data holders. We replace the trusted aggregator with secure multi-party computation protocols and output privacy via differential privacy (DP) We demonstrate the applicability and scalability of our approach for the state-of-the-art select-measure-generate algorithms MWEM+PGM and AIM.
arXiv Detail & Related papers (2024-02-13T17:26:32Z)
Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing lack of platforms offering detailed information about datasets. We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers. Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z)
On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies. Machine and deep learning algorithms depend heavily on the data used during their development. We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network. Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv) have already been proposed. As a side product of this work, we release the non-IID version of the datasets we used so to facilitate further comparisons from the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z)
Data-centric AI: Perspectives and Challenges [51.70828802140165]
Data-centric AI (DCAI) advocates a fundamental shift from model advancements to ensuring data quality and reliability. We bring together three general missions: training data development, inference data development, and data maintenance.
arXiv Detail & Related papers (2023-01-12T05:28:59Z)
Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process. We generate a representative as well as fair version of the UCI Adult census data set. We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
Privacy Preservation in Federated Learning: An insightful survey from the GDPR Perspective [10.901568085406753]
Article is dedicated to surveying on the state-of-the-art privacy techniques, which can be employed in Federated learning. Recent research has demonstrated that retaining data and on computation in FL is not enough for privacy-guarantee. This is because ML model parameters exchanged between parties in an FL system, which can be exploited in some privacy attacks.
arXiv Detail & Related papers (2020-11-10T21:41:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.