Value Profiles for Encoding Human Variation
- URL: http://arxiv.org/abs/2503.15484v2
- Date: Tue, 30 Sep 2025 14:52:22 GMT
- Title: Value Profiles for Encoding Human Variation
- Authors: Taylor Sorensen, Pushkar Mishra, Roma Patel, Michael Henry Tessler, Michiel Bakker, Georgina Evans, Iason Gabriel, Noah Goodman, Verena Rieser
- Abstract summary: Value profiles are descriptions of underlying values compressed from in-context demonstrations. Value profiles offer advantages in terms of scrutability, interpretability, and steerability. We show that the decoder predictions change in line with semantic profile differences, are well-calibrated, and can help explain instance-level disagreement.
- Score: 12.302443348395117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modelling human variation in rating tasks is crucial for personalization, pluralistic model alignment, and computational social science. We propose representing individuals using natural language value profiles -- descriptions of underlying values compressed from in-context demonstrations -- along with a steerable decoder model that estimates individual ratings from a rater representation. To measure the predictive information in a rater representation, we introduce an information-theoretic methodology and find that demonstrations contain the most information, followed by value profiles, then demographics. However, value profiles effectively compress the useful information from demonstrations (>70% information preservation) and offer advantages in terms of scrutability, interpretability, and steerability. Furthermore, clustering value profiles to identify similarly behaving individuals better explains rater variation than the most predictive demographic groupings. Going beyond test set performance, we show that the decoder predictions change in line with semantic profile differences, are well-calibrated, and can help explain instance-level disagreement by simulating an annotator population. These results demonstrate that value profiles offer novel, predictive ways to describe individual variation beyond demographics or group information.
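The abstract's information-theoretic comparison of rater representations can be sketched as a small toy calculation. This is an illustrative sketch, not the paper's code: the NLL values and function names below are hypothetical, assuming predictive information is measured as the decoder's reduction in average negative log-likelihood (in bits) relative to a representation-free baseline.

```python
# Illustrative sketch (hypothetical numbers): how much of the predictive
# information in demonstrations a compressed value profile preserves.
# Predictive information of a rater representation r is taken here as
#   I(r) = NLL(baseline) - NLL(r)
# i.e. the decoder's NLL reduction over a no-representation baseline, and
#   preservation = I(profile) / I(demonstrations).

def predictive_information(nll_baseline: float, nll_with_rep: float) -> float:
    """Bits of predictive information a representation adds over the baseline."""
    return nll_baseline - nll_with_rep

# Hypothetical per-rating NLLs (bits) from a steerable decoder:
nll_baseline = 1.60       # decoder with no rater representation
nll_demos = 1.10          # conditioned on in-context demonstrations
nll_profile = 1.22        # conditioned on a compressed value profile
nll_demographics = 1.48   # conditioned on demographics only

info_demos = predictive_information(nll_baseline, nll_demos)
info_profile = predictive_information(nll_baseline, nll_profile)
info_demographics = predictive_information(nll_baseline, nll_demographics)

preservation = info_profile / info_demos
print(f"information preserved by profile: {preservation:.0%}")  # prints 76%
```

With these placeholder numbers the profile preserves 76% of the demonstrations' predictive information, matching the ordering reported in the abstract (demonstrations > value profiles > demographics, with >70% preservation).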
Related papers
- Reading Between the Tokens: Improving Preference Predictions through Mechanistic Forecasting [8.075670640219784]
We investigate how demographic and ideological information activates latent party-encoding components within large language models. We find that leveraging this internal knowledge via mechanistic forecasting can improve prediction accuracy.
arXiv Detail & Related papers (2026-02-02T22:39:06Z)
- A Framework for Personalized Persuasiveness Prediction via Context-Aware User Profiling [21.531813748944383]
Estimating the persuasiveness of messages is critical in various applications. However, there is no established framework for leveraging a persuadee's past activities to benefit a persuasiveness prediction model. We propose a context-aware user profiling framework with two trainable components.
arXiv Detail & Related papers (2026-01-09T09:22:31Z)
- Personalized Image Descriptions from Attention Sequences [55.65023709100682]
People can view the same image differently: they focus on different regions, objects, and details in varying orders and describe them in distinct linguistic styles. Existing models for personalized image description focus on linguistic style alone, with no prior work leveraging individual viewing patterns. We address this gap by explicitly modeling personalized viewing behavior as a core factor in description generation. Our method, DEPER, learns a subject embedding that captures both linguistic style and viewing behavior, guided by an auxiliary attention-prediction task. A lightweight adapter aligns these embeddings with a frozen vision-language model, enabling few-shot personalization without retraining.
arXiv Detail & Related papers (2025-12-07T05:23:18Z)
- Taking a SEAT: Predicting Value Interpretations from Sentiment, Emotion, Argument, and Topic Annotations [2.5617827156681625]
We investigate whether a language model can predict individual value interpretations by leveraging multi-dimensional subjective annotations as a proxy for their interpretive lens. That is, we evaluate whether providing examples of how an individual annotates Sentiment, Emotion, Argument, and Topics (SEAT dimensions) helps a language model in predicting their value interpretations. Our experiments across different zero- and few-shot settings demonstrate that providing all SEAT dimensions simultaneously yields superior performance compared to individual dimensions and a baseline where no information about the individual is provided.
arXiv Detail & Related papers (2025-10-02T12:51:33Z) - Individualised Counterfactual Examples Using Conformal Prediction Intervals [12.895240620484572]
High-dimensional feature spaces that are typical of machine learning classification models admit many possible counterfactual examples to a decision.<n>We explicitly model the knowledge of the individual, and assess the uncertainty of predictions which the individual makes by the width of a conformal prediction interval.<n>We present a synthetic data set on a hypercube which allows us to fully visualise the decision boundary.<n>Second, in this synthetic data set we explore the impact of a single CPICF on the knowledge of an individual locally around the original query.
arXiv Detail & Related papers (2025-05-28T13:13:52Z) - Accurate and Data-Efficient Toxicity Prediction when Annotators Disagree [1.3749490831384268]
When annotators disagree, predicting the labels given by individual annotators can capture nuances overlooked by traditional label aggregation.
We introduce three approaches to predicting individual annotator ratings on the toxicity of text.
We study the utility of demographic information for rating prediction.
arXiv Detail & Related papers (2024-10-16T04:26:40Z) - Can Language Models Reason about Individualistic Human Values and Preferences? [44.249817353449146]
We study language models (LMs) on the challenge of individualistic value reasoning.<n>We find critical limitations in frontier LMs, which achieve only 55 % to 65% accuracy in predicting individualistic values.<n>We also identify a partiality of LMs in reasoning about global individualistic values, as measured by our proposed Value Inequity Index (sigmaInequity)
arXiv Detail & Related papers (2024-10-04T19:03:41Z) - On the Properties and Estimation of Pointwise Mutual Information Profiles [49.877314063833296]
The pointwise mutual information profile, or simply profile, is the distribution of pointwise mutual information for a given pair of random variables.
We introduce a novel family of distributions, Bend and Mix Models, for which the profile can be accurately estimated using Monte Carlo methods.
arXiv Detail & Related papers (2023-10-16T10:02:24Z)
- TIDE: Textual Identity Detection for Evaluating and Augmenting Classification and Language Models [0.0]
Machine learning models can perpetuate unintended biases from unfair and imbalanced datasets.
We present a dataset coupled with an approach to improve text fairness in classifiers and language models.
We leverage TIDAL to develop an identity annotation and augmentation tool that can be used to improve the availability of identity context.
arXiv Detail & Related papers (2023-09-07T21:44:42Z)
- Distribution Aware Metrics for Conditional Natural Language Generation [3.6350564275444173]
We argue that existing metrics are not appropriate for domains such as visual description or summarization where ground truths are semantically diverse.
We propose a novel paradigm for multi-candidate evaluation of conditional language generation models.
arXiv Detail & Related papers (2022-09-15T17:58:13Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z)
- Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
- Conditional Contrastive Learning: Removing Undesirable Information in Self-Supervised Representations [108.29288034509305]
We develop conditional contrastive learning to remove undesirable information in self-supervised representations.
We demonstrate empirically that our methods can successfully learn self-supervised representations for downstream tasks.
arXiv Detail & Related papers (2021-06-05T10:51:26Z)
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and imposter sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- On Predicting Personal Values of Social Media Users using Community-Specific Language Features and Personal Value Correlation [14.12186042953335]
This work focuses on analyzing Singapore users' personal values and developing effective models to predict their personal values using their Facebook data.
We incorporate the correlations among personal values into our proposed Stack Model consisting of a task-specific layer of base models and a cross-stitch layer model.
arXiv Detail & Related papers (2020-07-16T04:36:13Z)
- Adversarial Infidelity Learning for Model Interpretation [43.37354056251584]
We propose a Model-agnostic Effective Efficient Direct (MEED) IFS framework for model interpretation.
Our framework mitigates concerns about sanity, shortcuts, model identifiability, and information transmission.
Our AIL mechanism can help learn the desired conditional distribution between selected features and targets.
arXiv Detail & Related papers (2020-06-09T16:27:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.