Statistical Collusion by Collectives on Learning Platforms
- URL: http://arxiv.org/abs/2502.04879v1
- Date: Fri, 07 Feb 2025 12:36:23 GMT
- Title: Statistical Collusion by Collectives on Learning Platforms
- Authors: Etienne Gauthier, Francis Bach, Michael I. Jordan
- Abstract summary: Collectives may seek to influence platforms to align with their own interests.
It is essential to understand the computations that collectives must perform to impact platforms in this way.
We develop a framework that provides a theoretical and algorithmic treatment of these issues.
- Score: 49.1574468325115
- Abstract: As platforms increasingly rely on learning algorithms, collectives may form and seek ways to influence these platforms to align with their own interests. This can be achieved by coordinated submission of altered data. To evaluate the potential impact of such behavior, it is essential to understand the computations that collectives must perform to impact platforms in this way. In particular, collectives need to make a priori assessments of the effect of the collective before taking action, as they may face potential risks when modifying their data. Moreover, they need to develop implementable coordination algorithms based on quantities that can be inferred from observed data. We develop a framework that provides a theoretical and algorithmic treatment of these issues and present experimental results in a product evaluation domain.
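To make the coordination problem concrete, here is a minimal, hypothetical sketch of a collective that controls an alpha-fraction of the training data, relabels it toward a target class, and estimates the effect on the platform's classifier before acting. The relabeling strategy, names, and toy data are illustrative assumptions, not the paper's actual algorithm:
```python
# Hypothetical sketch (not the paper's method): a collective controlling an
# alpha-fraction of the training data relabels its points toward a target
# class, and assesses the effect a priori by retraining on simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def simulate_collective_effect(X, y, alpha, target_class=1):
    """Estimate how relabeling an alpha-fraction shifts the platform's predictions."""
    n = len(y)
    idx = rng.choice(n, size=int(alpha * n), replace=False)  # the collective
    y_altered = y.copy()
    y_altered[idx] = target_class                 # coordinated data alteration

    base = LogisticRegression().fit(X, y)
    shifted = LogisticRegression().fit(X, y_altered)

    # Effect: change in the rate at which the platform predicts the target class.
    before = (base.predict(X) == target_class).mean()
    after = (shifted.predict(X) == target_class).mean()
    return after - before

# Toy product-evaluation data: features -> binary rating.
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)
for alpha in (0.01, 0.05, 0.1):
    print(alpha, round(simulate_collective_effect(X, y, alpha), 3))
```
The retraining step stands in for the a priori assessment the abstract describes; the paper's point is that such assessments must be computable from quantities the collective can actually observe.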
Related papers
- Data-Efficient Pretraining with Group-Level Data Influence Modeling [49.18903821780051]
Group-Level Data Influence Modeling (Group-MATES) is a novel data-efficient pretraining method.
Group-MATES collects oracle group-level influences by locally probing the pretraining model with data sets.
It then fine-tunes a relational data influence model to approximate oracles as relationship-weighted aggregations of individual influences.
arXiv Detail & Related papers (2025-02-20T16:34:46Z)
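A minimal, hypothetical sketch of the relationship-weighted aggregation idea in the Group-MATES entry above; the additive form, names, and random inputs are illustrative assumptions, not the paper's actual model:
```python
# Hypothetical sketch: approximate a group's influence from individual
# influences plus fitted pairwise relation weights (illustrative form only).
import numpy as np

def group_influence(indiv, W, group):
    """indiv: (n,) individual influences; W: (n, n) pairwise relation weights."""
    g = np.asarray(group)
    pair = W[np.ix_(g, g)]
    np.fill_diagonal(pair, 0.0)
    # Group influence = sum of individual terms + relational corrections.
    return indiv[g].sum() + pair.sum()

rng = np.random.default_rng(1)
n = 100
indiv = rng.normal(size=n)            # locally probed individual influences
W = 0.01 * rng.normal(size=(n, n))    # fitted relational influence model
print(group_influence(indiv, W, [3, 17, 42]))
```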
- Capturing the Temporal Dependence of Training Data Influence [100.91355498124527]
We formalize the concept of trajectory-specific leave-one-out (LOO) influence, which quantifies the impact of removing a data point during training.
We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO.
As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
arXiv Detail & Related papers (2024-12-12T18:28:55Z)
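A minimal, hypothetical sketch of trajectory-specific LOO influence from the entry above: per-example gradient contributions recorded along one SGD pass serve as a crude stand-in for data value embeddings, and each example is scored by alignment with a test-loss gradient. All details are illustrative, not the paper's estimator:
```python
# Hypothetical sketch: record each example's per-step gradient contribution
# during one SGD pass, then score the example by how its accumulated updates
# align with a test-loss gradient.
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
lr = 0.01
contrib = np.zeros((n, d))            # per-example accumulated updates
for i in rng.permutation(n):          # one SGD pass; ordering matters here
    g = 2 * (X[i] @ w - y[i]) * X[i]  # squared-loss gradient at this step
    contrib[i] += lr * g
    w -= lr * g

x_test, y_test = rng.normal(size=d), 0.0
g_test = 2 * (x_test @ w - y_test) * x_test
# Positive alignment: the example's updates moved w against the test-loss
# gradient, so removing it would have left the test loss higher.
helpfulness = contrib @ g_test
print("most helpful example:", int(np.argmax(helpfulness)))
```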
- Collaborative Learning via Prediction Consensus [38.89001892487472]
We consider a collaborative learning setting where each agent aims to improve its own model by leveraging the expertise of collaborators.
We propose a distillation-based method leveraging shared unlabeled auxiliary data, which is pseudo-labeled by the collective.
We demonstrate empirically that our collaboration scheme is able to significantly boost the performance of individual models.
arXiv Detail & Related papers (2023-05-29T14:12:03Z)
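A minimal, hypothetical sketch of the prediction-consensus scheme above: agent models vote on shared unlabeled data, and each agent retrains on its own data plus the consensus pseudo-labels. The majority-vote rule and toy setup are illustrative assumptions, not the paper's exact method:
```python
# Hypothetical sketch: consensus pseudo-labeling of a shared unlabeled pool,
# followed by per-agent retraining (a simple form of distillation).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

def make_data(n):
    X = rng.normal(size=(n, 4))
    y = (X[:, :2].sum(axis=1) > 0).astype(int)
    return X, y

agents = [LogisticRegression().fit(*make_data(50)) for _ in range(5)]
X_pool, _ = make_data(1000)                       # shared unlabeled data

votes = np.stack([m.predict(X_pool) for m in agents])
pseudo = (votes.mean(axis=0) > 0.5).astype(int)   # consensus pseudo-labels

# One agent distills from the collective by retraining on the pooled set.
X_own, y_own = make_data(50)
improved = LogisticRegression().fit(
    np.vstack([X_own, X_pool]), np.concatenate([y_own, pseudo])
)
print("agreement with consensus:", (improved.predict(X_pool) == pseudo).mean())
```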
- Striving for data-model efficiency: Identifying data externalities on group performance [75.17591306911015]
Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance.
We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population.
Our results indicate that data-efficiency is a key component of both accurate and trustworthy machine learning.
arXiv Detail & Related papers (2022-11-11T16:48:27Z)
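A minimal, hypothetical sketch of checking for the data externality described above: compare a sub-group's held-out accuracy before and after adding training data from a large, distribution-shifted source. The setup is illustrative, not the paper's experimental design:
```python
# Hypothetical sketch: adding data from a shifted source B may lower
# accuracy on sub-group A, i.e. a negative data externality.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

def source(n, shift):
    X = rng.normal(size=(n, 3)) + shift
    y = (X[:, 0] > shift[0]).astype(int)         # source-specific label rule
    return X, y

X_a, y_a = source(300, np.zeros(3))              # sub-group A (target)
X_b, y_b = source(3000, np.array([2.0, 0, 0]))   # large, shifted source B

base = LogisticRegression().fit(X_a, y_a)
mixed = LogisticRegression().fit(np.vstack([X_a, X_b]),
                                 np.concatenate([y_a, y_b]))

X_eval, y_eval = source(1000, np.zeros(3))       # held-out data for A
print("A-only:", base.score(X_eval, y_eval))
print("A + B :", mixed.score(X_eval, y_eval))    # may be lower: an externality
```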
- Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
arXiv Detail & Related papers (2021-09-17T13:45:46Z)
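A minimal, hypothetical sketch of the quantification idea above, using adjusted classify-and-count (a standard quantification method) to estimate group prevalence among positive decisions when the sensitive attribute is unobserved at decision time; the data and names are illustrative:
```python
# Hypothetical sketch: estimate the *prevalence* of a sensitive group among
# positive decisions via adjusted classify-and-count (ACC), rather than
# classifying individuals whose attribute is unobserved.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

# Auxiliary data where the sensitive attribute s IS observed.
X_aux = rng.normal(size=(2000, 4))
s_aux = (X_aux[:, 0] + 0.8 * rng.normal(size=2000) > 0).astype(int)
clf = LogisticRegression().fit(X_aux, s_aux)

tpr = clf.predict(X_aux[s_aux == 1]).mean()      # attribute-classifier TPR
fpr = clf.predict(X_aux[s_aux == 0]).mean()      # and FPR, on auxiliary data

# Deployment data: positive decisions, attribute unobserved.
X_pos = rng.normal(size=(500, 4)) + np.array([0.3, 0, 0, 0])
cc = clf.predict(X_pos).mean()                   # naive classify-and-count
acc_est = (cc - fpr) / max(tpr - fpr, 1e-9)      # ACC-corrected prevalence
print("estimated share of group s=1 among positives:", round(acc_est, 3))
```
The correction matters because the naive count inherits the attribute classifier's bias; ACC removes it using error rates measured where the attribute is known.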
- Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data [85.43008636875345]
We show that diverse representation in training data is key to improving subgroup performance and achieving population-level objectives.
Our analysis and experiments describe how dataset compositions influence performance and provide constructive results for using trends in existing data, alongside domain knowledge, to help guide intentional, objective-aware dataset design.
arXiv Detail & Related papers (2021-03-05T00:27:08Z)
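A minimal, hypothetical sketch of an allocation experiment in the spirit of the entry above: hold the training-set size fixed, vary the fraction drawn from a minority sub-group, and track that sub-group's test accuracy. All details are illustrative, not the paper's protocol:
```python
# Hypothetical sketch: dataset composition vs. minority sub-group accuracy
# at a fixed total training budget.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)

def subgroup(n, mean):
    X = rng.normal(size=(n, 3)) + mean
    y = (X[:, 1] > mean[1]).astype(int)          # group-specific label rule
    return X, y

X_test, y_test = subgroup(1000, np.array([1.5, 1.0, 0]))  # minority test set

for frac in (0.05, 0.2, 0.5):
    n_min = int(frac * 1000)
    X0, y0 = subgroup(1000 - n_min, np.zeros(3))           # majority
    X1, y1 = subgroup(n_min, np.array([1.5, 1.0, 0]))      # minority
    model = LogisticRegression().fit(np.vstack([X0, X1]),
                                     np.concatenate([y0, y1]))
    print(f"minority fraction {frac:.2f}: "
          f"accuracy {model.score(X_test, y_test):.3f}")
```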
- Deep Goal-Oriented Clustering [25.383738675621505]
Clustering and prediction are the primary tasks of unsupervised and supervised learning, respectively.
We introduce Deep Goal-Oriented Clustering (DGC), a probabilistic framework that clusters the data while jointly incorporating supervision from side information.
We show the effectiveness of our model on a range of datasets by achieving prediction accuracies comparable to the state-of-the-art.
arXiv Detail & Related papers (2020-06-07T20:41:08Z)
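A minimal, hypothetical sketch of the goal-oriented clustering idea above, as a shallow EM-style loop in which cluster responsibilities are shaped jointly by a Gaussian data term and by how well each cluster's simple predictor explains the side-information label. DGC itself is a deep probabilistic model, so this is only an illustration of the joint objective:
```python
# Hypothetical sketch: responsibilities combine a data-fit term and a
# supervised term, so clusters form where they also predict y well.
import numpy as np

rng = np.random.default_rng(7)
n, k = 300, 2
X = np.concatenate([rng.normal(-2, 1, n // 2), rng.normal(2, 1, n // 2)])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])  # side information

mu = rng.normal(size=k)        # cluster means
p = np.full(k, 0.5)            # per-cluster P(y=1): a tiny "predictor"

for _ in range(20):
    # E-step: responsibilities from the data term AND the supervised term.
    data_ll = -0.5 * (X[:, None] - mu[None, :]) ** 2
    sup_ll = (y[:, None] * np.log(p + 1e-9)
              + (1 - y)[:, None] * np.log(1 - p + 1e-9))
    r = np.exp(data_ll + sup_ll)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: refit means and per-cluster predictors under responsibilities.
    mu = (r * X[:, None]).sum(axis=0) / r.sum(axis=0)
    p = (r * y[:, None]).sum(axis=0) / r.sum(axis=0)

pred = p[np.argmax(r, axis=1)] > 0.5
print("prediction accuracy via clusters:", (pred == y).mean())
```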