Towards Low-Resource Alignment to Diverse Perspectives with Sparse Feedback
- URL: http://arxiv.org/abs/2510.16257v1
- Date: Fri, 17 Oct 2025 23:06:21 GMT
- Title: Towards Low-Resource Alignment to Diverse Perspectives with Sparse Feedback
- Authors: Chu Fei Luo, Samuel Dahan, Xiaodan Zhu
- Abstract summary: We aim to enhance pluralistic alignment of language models in a low-resource setting with two methods: pluralistic decoding and model steering. Our proposed methods decrease false positives in several high-stakes tasks such as hate speech detection and misinformation detection. We hope our work highlights the importance of diversity and how language models can be adapted to consider nuanced perspectives.
- Score: 13.065059683491958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As language models have a greater impact on society, it is important to ensure they are aligned to a diverse range of perspectives and are able to reflect nuance in human values. However, the most popular training paradigms for modern language models often assume there is one optimal answer for every query, leading to generic responses and poor alignment. In this work, we aim to enhance pluralistic alignment of language models in a low-resource setting with two methods: pluralistic decoding and model steering. We empirically demonstrate that model steering offers consistent improvement over zero-shot and few-shot baselines with only 50 annotated samples. Our proposed methods decrease false positives in several high-stakes tasks such as hate speech detection and misinformation detection, and improve distributional alignment to human values in GlobalOpinionQA. We hope our work highlights the importance of diversity and how language models can be adapted to consider nuanced perspectives.
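The abstract does not detail how model steering is implemented. A common technique matching its description is activation steering: estimate a direction from a small set of annotated samples (here, 50, as in the paper's low-resource setting) and add it to hidden states at inference. The sketch below is a hypothetical illustration with toy numpy arrays; the hidden dimension, variable names, and scaling factor `alpha` are assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden dimension

# Hidden activations collected on 50 annotated samples reflecting the target
# perspective, and on 50 contrasting samples (both simulated here).
h_target = rng.normal(1.0, 0.1, (50, d))
h_base = rng.normal(0.0, 0.1, (50, d))

# Steering vector: difference of mean activations between the two groups.
steer = h_target.mean(axis=0) - h_base.mean(axis=0)

def apply_steering(hidden, alpha=1.0):
    """Shift a hidden state toward the target perspective by alpha * steer."""
    return hidden + alpha * steer

h = rng.normal(0.0, 0.1, d)       # a hidden state at inference time
h_steered = apply_steering(h, 0.5)
```

In practice the vector is added at one or more chosen transformer layers during the forward pass, and `alpha` trades off steering strength against fluency.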
Related papers
- On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation [88.77441715819366]
Generative spoken language models pretrained on large-scale raw audio can continue a speech prompt with appropriate content. We propose a variety of likelihood- and generative-based evaluation methods that serve in place of naive global token perplexity.
arXiv Detail & Related papers (2026-01-09T22:01:56Z) - Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems [3.011820285006942]
This study proposes a new multi-perspective approach using soft labels to encourage the development of perspective-aware models. We conduct an analysis across diverse subjective text classification tasks, including hate speech, irony, abusive language, and stance detection. Results show that the multi-perspective approach better approximates human label distributions, as measured by Jensen-Shannon Divergence (JSD). Our approach exhibits lower confidence in tasks like irony and stance detection, likely due to the inherent subjectivity present in the texts.
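Jensen-Shannon Divergence, the alignment metric cited above, can be computed directly from two soft-label distributions. A minimal self-contained implementation (base 2, so values lie in [0, 1]); the example distributions are invented, not from the paper:

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability terms
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return (kl(p, m) + kl(q, m)) / 2

# Soft labels aggregated from annotator votes vs. a model's predicted distribution
human = [0.6, 0.3, 0.1]
model = [0.5, 0.4, 0.1]
print(round(jsd(human, model), 4))
```

JSD is symmetric and zero only when the two distributions match, which makes it a natural score for how well model predictions approximate human label distributions.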
arXiv Detail & Related papers (2025-06-25T07:53:36Z) - Distilling Large Vision-Language Model with Out-of-Distribution Generalizability [43.984177729641615]
This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models.
We propose several metrics and conduct extensive experiments to investigate these distillation techniques.
The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification.
arXiv Detail & Related papers (2023-07-06T17:05:26Z) - Multilingual Conceptual Coverage in Text-to-Image Models [98.80343331645626]
"Conceptual Coverage Across Languages" (CoCo-CroLa) is a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns.
For each model we can assess "conceptual coverage" of a given target language relative to a source language by comparing the population of images generated for a series of tangible nouns in the source language to the population of images generated for each noun under translation in the target language.
arXiv Detail & Related papers (2023-06-02T17:59:09Z) - Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
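The feedback-to-sequence conversion described above can be sketched as simple string templating: each training example pairs model outputs with verbalized feedback, so fine-tuning conditions generation on the feedback phrase. The templates and example data below are illustrative assumptions, not the paper's exact prompts:

```python
# Hypothetical Chain-of-Hindsight data formatting: feedback of either polarity
# is verbalized and attached to the corresponding output, producing one
# fine-tuning sequence that exposes the model to both good and bad answers.

def format_example(prompt, good_answer, bad_answer):
    """Build a training string that conditions generation on feedback phrases."""
    return (
        f"{prompt} A helpful answer: {good_answer} "
        f"An unhelpful answer: {bad_answer}"
    )

sample = format_example(
    "Explain photosynthesis.",
    "Plants convert light, water, and CO2 into glucose and oxygen.",
    "Plants eat sunlight.",
)
print(sample)
```

At inference time, prompting with the positive feedback phrase ("A helpful answer:") then elicits the desired behavior.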
arXiv Detail & Related papers (2023-02-06T10:28:16Z) - Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models [57.08925810659545]
We conduct a comparative analysis of the visual representations in existing vision-and-language models and vision-only models.
Our empirical observations suggest that vision-and-language models are better at label prediction tasks.
We hope our study sheds light on the role of language in visual learning, and serves as an empirical guide for various pretrained models.
arXiv Detail & Related papers (2022-12-01T05:00:18Z) - Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies? [0.06299766708197882]
We create a new task targeted at evaluating understanding of predicate-noun dependencies in a controlled setup.
We evaluate a range of state-of-the-art models and find that their performance on the task varies considerably.
This study highlights that targeted and controlled evaluations are a crucial step for a precise and rigorous test of the multimodal knowledge of vision-and-language models.
arXiv Detail & Related papers (2022-10-21T16:07:00Z) - A General Language Assistant as a Laboratory for Alignment [3.3598752405752106]
We study simple baseline techniques and evaluations, such as prompting.
We find that the benefits from modest interventions increase with model size, generalize to a variety of alignment evaluations, and do not compromise the performance of large models.
We study a 'preference model pre-training' stage of training, with the goal of improving sample efficiency when fine-tuning on human preferences.
arXiv Detail & Related papers (2021-12-01T22:24:34Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z) - Specializing Multilingual Language Models: An Empirical Study [50.7526245872855]
Contextualized word representations from pretrained multilingual language models have become the de facto standard for addressing natural language tasks.
For languages rarely or never seen by these models, directly using such models often results in suboptimal representation or use of data.
arXiv Detail & Related papers (2021-06-16T18:13:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.