Beyond One-Size-Fits-All: Personalized Harmful Content Detection with In-Context Learning
- URL: http://arxiv.org/abs/2511.05532v1
- Date: Wed, 29 Oct 2025 09:11:20 GMT
- Title: Beyond One-Size-Fits-All: Personalized Harmful Content Detection with In-Context Learning
- Authors: Rufan Zhang, Lin Zhang, Xianghang Mi
- Abstract summary: We propose a novel framework that unifies the detection of toxicity, spam, and negative sentiment across binary, multi-class, and multi-label settings. Our approach enables lightweight personalization, allowing users to easily block new categories, unblock existing ones, or extend detection to semantic variations.
- Score: 4.559454504442884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of harmful online content--e.g., toxicity, spam, and negative sentiment--demands robust and adaptable moderation systems. However, prevailing moderation systems are centralized and task-specific, offering limited transparency and neglecting diverse user preferences--an approach ill-suited for privacy-sensitive or decentralized environments. We propose a novel framework that leverages in-context learning (ICL) with foundation models to unify the detection of toxicity, spam, and negative sentiment across binary, multi-class, and multi-label settings. Crucially, our approach enables lightweight personalization, allowing users to easily block new categories, unblock existing ones, or extend detection to semantic variations through simple prompt-based interventions--all without model retraining. Extensive experiments on public benchmarks (TextDetox, UCI SMS, SST2) and a new, annotated Mastodon dataset reveal that: (i) foundation models achieve strong cross-task generalization, often matching or surpassing task-specific fine-tuned models; (ii) effective personalization is achievable with as few as one user-provided example or definition; and (iii) augmenting prompts with label definitions or rationales significantly enhances robustness to noisy, real-world data. Our work demonstrates a definitive shift beyond one-size-fits-all moderation, establishing ICL as a practical, privacy-preserving, and highly adaptable pathway for the next generation of user-centric content safety systems. To foster reproducibility and facilitate future research, we publicly release our code on GitHub and the annotated Mastodon dataset on Hugging Face.
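To make the prompt-based personalization concrete, the following is a minimal Python sketch of how an ICL moderation prompt along these lines could be assembled. The category names, definitions, the few-shot example, and the `complete` callable are illustrative assumptions for exposition, not the authors' released implementation (see their GitHub code for the actual method).

```python
# Minimal sketch of personalized moderation via in-context learning (ICL).
# All names below (BLOCKED_CATEGORIES, FEW_SHOT_EXAMPLES, `complete`) are
# hypothetical placeholders, not the paper's released implementation.

# A user "blocks" a category by adding a name + definition here, and
# "unblocks" one by deleting its entry -- no model retraining needed.
BLOCKED_CATEGORIES = {
    "toxicity": "Content that insults, harasses, or demeans a person or group.",
    "spam": "Unsolicited promotional, repetitive, or scam-like content.",
}

# A single user-provided example per category can suffice for
# personalization, per finding (ii) in the abstract.
FEW_SHOT_EXAMPLES = [
    ("Buy 10k followers now!!! Click the link!!!",
     {"toxicity": False, "spam": True}),
]

def build_prompt(post: str) -> str:
    """Assemble an ICL prompt with label definitions and one-shot examples."""
    lines = [
        "You are a content moderation assistant.",
        "For each category below, decide whether the post matches it.",
        "Categories and definitions:",
    ]
    for name, definition in BLOCKED_CATEGORIES.items():
        lines.append(f"- {name}: {definition}")
    for text, labels in FEW_SHOT_EXAMPLES:
        answer = ", ".join(f"{k}={v}" for k, v in labels.items())
        lines.append(f'Example post: "{text}" -> {answer}')
    lines.append(f'Post: "{post}"')
    lines.append("Answer with category=True/False pairs only.")
    return "\n".join(lines)

def moderate(post: str, complete) -> str:
    """`complete` is any text-completion callable wrapping a foundation
    model (local or hosted); its interface is an assumption here."""
    return complete(build_prompt(post))
```

Under this sketch, extending detection to a semantic variation (e.g., a euphemism for a blocked category) amounts to editing a definition or adding an example, mirroring the paper's claim that personalization needs only prompt-level changes rather than retraining.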
Related papers
- Permutation-Invariant Representation Learning for Robust and Privacy-Preserving Feature Selection [28.951637174740203]
Existing methods often struggle to capture intricate feature interactions and adapt across diverse application scenarios. We introduce a novel framework that integrates permutation-invariant embedding with policy-guided search. In practice, data across local clients is highly imbalanced, heterogeneous and constrained by strict privacy regulations.
arXiv Detail & Related papers (2025-10-07T02:53:32Z) - Personalized Vision via Visual In-Context Learning [62.85784251383279]
We present a visual in-context learning framework for personalized vision. PICO infers the underlying transformation and applies it to new inputs without retraining. We also propose an attention-guided seed scorer that improves reliability via efficient inference scaling.
arXiv Detail & Related papers (2025-09-29T17:58:45Z) - What Makes You Unique? Attribute Prompt Composition for Object Re-Identification [70.67907354506278]
Object Re-IDentification aims to recognize individuals across non-overlapping camera views. Single-domain models tend to overfit to domain-specific features, whereas cross-domain models often rely on diverse normalization strategies. We propose an Attribute Prompt Composition framework, which exploits textual semantics to jointly enhance discrimination and generalization.
arXiv Detail & Related papers (2025-09-23T07:03:08Z) - RL-Finetuned LLMs for Privacy-Preserving Synthetic Rewriting [17.294176570269]
We propose a reinforcement learning framework that fine-tunes a large language model (LLM) using a composite reward function. The privacy reward combines semantic cues with structural patterns derived from a minimum spanning tree (MST) over latent representations. Empirical results show that the proposed method significantly enhances author obfuscation and privacy metrics without degrading semantic quality.
arXiv Detail & Related papers (2025-08-25T04:38:19Z) - Personalized Query Auto-Completion for Long and Short-Term Interests with Adaptive Detoxification Generation [18.762185355073008]
We propose a novel model (LaD) that captures personalized information from both long-term and short-term interests. In LaD, personalized information is captured hierarchically at both coarse-grained and fine-grained levels. Our model has been deployed on Kuaishou search, driving the primary traffic for hundreds of millions of active users.
arXiv Detail & Related papers (2025-05-27T09:58:42Z) - Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning [58.16354555208417]
PAD and FFD are proposed to protect face data from physical media-based Presentation Attacks and digital editing-based DeepFakes, respectively. The lack of a Unified Face Attack Detection model to simultaneously handle attacks in these two categories is mainly attributed to two factors. We present a novel Visual-Language Model-based Hierarchical Prompt Tuning Framework that adaptively explores multiple classification criteria from different semantic spaces.
arXiv Detail & Related papers (2025-05-19T16:35:45Z) - Self-Regularization with Sparse Autoencoders for Controllable LLM-based Classification [29.74457390987092]
We propose a novel framework to identify and regularize unintended features in the latent spaces of large language models (LLMs). We evaluate the proposed framework on three real-world tasks, including toxic chat detection, reward modeling, and disease diagnosis.
arXiv Detail & Related papers (2025-02-19T22:27:59Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - ToVo: Toxicity Taxonomy via Voting [25.22398575368979]
We propose a dataset creation mechanism that integrates voting and chain-of-thought processes. Our methodology ensures diverse classification metrics for each sample. We utilize the dataset created through our proposed mechanism to train our model.
arXiv Detail & Related papers (2024-06-21T02:35:30Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, using digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may stem from biases in data acquisition rather than from generalizable patterns.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.