Worker Discretion Advised: Co-designing Risk Disclosure in Crowdsourced Responsible AI (RAI) Content Work
- URL: http://arxiv.org/abs/2509.12140v2
- Date: Tue, 30 Sep 2025 15:57:47 GMT
- Title: Worker Discretion Advised: Co-designing Risk Disclosure in Crowdsourced Responsible AI (RAI) Content Work
- Authors: Alice Qian, Ziqi Yang, Ryland Shaw, Jina Suh, Laura Dabbish, Hong Shen
- Abstract summary: Responsible AI (RAI) content work often exposes crowd workers to potentially harmful content. We conduct co-design sessions with 29 task designers, workers, and platform representatives. We identify design tensions and map the sociotechnical tradeoffs that shape disclosure practices.
- Score: 12.492380198885295
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Responsible AI (RAI) content work, such as annotation, moderation, or red teaming for AI safety, often exposes crowd workers to potentially harmful content. While prior work has underscored the importance of communicating well-being risk to employed content moderators, designing effective disclosure mechanisms for crowd workers while balancing worker protection with the needs of task designers and platforms remains largely unexamined. To address this gap, we conducted co-design sessions with 29 task designers, workers, and platform representatives. We investigated task designer preferences for support in disclosing tasks, worker preferences for receiving risk disclosure warnings, and how platform stakeholders envision their role in shaping risk disclosure practices. We identify design tensions and map the sociotechnical tradeoffs that shape disclosure practices. We contribute design recommendations and feature concepts for risk disclosure mechanisms in the context of RAI content work.
Related papers
- Capability-Oriented Training Induced Alignment Risk [101.37328448441208]
We investigate whether language models, when trained with reinforcement learning, will spontaneously learn to exploit flaws to maximize their reward. Our experiments show that models consistently learn to exploit these vulnerabilities, discovering opportunistic strategies that significantly increase their reward at the expense of task correctness or safety. Our findings suggest that future AI safety work must extend beyond content moderation to rigorously auditing and securing the training environments and reward mechanisms themselves.
arXiv Detail & Related papers (2026-02-12T16:13:14Z) - Oyster-I: Beyond Refusal - Constructive Safety Alignment for Responsible Language Models [92.8572422396691]
Constructive Safety Alignment (CSA) protects against malicious misuse while actively guiding vulnerable users toward safe and helpful results. Oy1 achieves state-of-the-art safety among open models while retaining high general capabilities. We release Oy1, code, and the benchmark to support responsible, user-centered AI.
arXiv Detail & Related papers (2025-09-02T03:04:27Z) - Locating Risk: Task Designers and the Challenge of Risk Disclosure in RAI Content Work [8.740145195086205]
Crowd workers are often tasked with responsible AI (RAI) content work. While prior efforts have highlighted the risks to worker well-being associated with RAI content work, far less attention has been paid to how these risks are communicated to workers. This study investigates how task designers approach risk disclosure in crowdsourced RAI tasks.
arXiv Detail & Related papers (2025-05-30T06:08:50Z) - LLM Agents Should Employ Security Principles [60.03651084139836]
This paper argues that the well-established design principles in information security should be employed when deploying Large Language Model (LLM) agents at scale. We introduce AgentSandbox, a conceptual framework embedding these security principles to provide safeguards throughout an agent's life-cycle.
arXiv Detail & Related papers (2025-05-29T21:39:08Z) - AURA: Amplifying Understanding, Resilience, and Awareness for Responsible AI Content Work [9.15754890995565]
This study investigates the nature and challenges of content work that supports responsible AI (RAI) efforts.
We develop a conceptualization of RAI content work and a framework of recommendations for providing holistic support for content workers.
We discuss how our framework may guide future innovation to support the well-being and professional development of the RAI content workforce.
arXiv Detail & Related papers (2024-11-03T03:27:02Z) - Risks and NLP Design: A Case Study on Procedural Document QA [52.557503571760215]
We argue that clearer assessments of risks and harms to users will be possible when we specialize the analysis to more concrete applications and their plausible users.
We conduct a risk-oriented error analysis that could then inform the design of a future system to be deployed with lower risk of harm and better performance.
arXiv Detail & Related papers (2024-08-16T17:23:43Z) - Rideshare Transparency: Translating Gig Worker Insights on AI Platform Design to Policy [8.936861276568006]
We use a novel mixed-methods study combining an LLM-based analysis of over 1 million comments posted to online platform worker communities. We identify transparency-related harms, mitigation strategies, and worker needs. We argue that new regulations requiring platforms to publish public transparency reports may be a more effective solution to improve worker well-being.
arXiv Detail & Related papers (2024-06-16T00:46:49Z) - Designing Sousveillance Tools for Gig Workers [10.31597350024712]
Because they are independent contractors, gig workers disproportionately suffer the consequences of workplace surveillance.
Some critical theorists have proposed sousveillance as a potential means of countering such abuses of power.
We conducted semi-structured interviews and led co-design activities with gig workers.
We identify gig workers' attitudes towards and past experiences with sousveillance.
arXiv Detail & Related papers (2024-03-15T03:08:26Z) - On the Societal Impact of Open Foundation Models [93.67389739906561]
We focus on open foundation models, defined here as those with broadly available model weights.
We identify five distinctive properties of open foundation models that lead to both their benefits and risks.
arXiv Detail & Related papers (2024-02-27T16:49:53Z) - Red-Teaming for Generative AI: Silver Bullet or Security Theater? [42.35800543892003]
We argue that while red-teaming may be a valuable big-tent idea for characterizing GenAI harm mitigations, and while industry may effectively apply red-teaming and other strategies behind closed doors to safeguard AI, red-teaming alone risks amounting to security theater.
To move toward a more robust toolbox of evaluations for generative AI, we synthesize our recommendations into a question bank meant to guide and scaffold future AI red-teaming practices.
arXiv Detail & Related papers (2024-01-29T05:46:14Z) - Justice in interaction design: preventing manipulation in interfaces [0.5524804393257919]
Designers incorporate values into the design process that can raise risks for vulnerable groups.
Persuasion in user interfaces can quickly turn into manipulation and become potentially harmful to groups marked by intellectual disability, class, or health.
We explain how justice can be used proactively to inform designers' decisions when evaluating their designs, preventing the risk of manipulation.
arXiv Detail & Related papers (2022-04-14T08:45:06Z) - Overcoming Failures of Imagination in AI Infused System Development and Deployment [71.9309995623067]
NeurIPS 2020 requested that research paper submissions include impact statements on "potential nefarious uses and the consequences of failure".
We argue that frameworks of harms must be context-aware and consider a wider range of potential stakeholders, system affordances, as well as viable proxies for assessing harms in the widest sense.
arXiv Detail & Related papers (2020-11-26T18:09:52Z)