Contextual Moral Value Alignment Through Context-Based Aggregation
- URL: http://arxiv.org/abs/2403.12805v1
- Date: Tue, 19 Mar 2024 15:06:53 GMT
- Title: Contextual Moral Value Alignment Through Context-Based Aggregation
- Authors: Pierre Dognin, Jesus Rios, Ronny Luss, Inkit Padhi, Matthew D Riemer, Miao Liu, Prasanna Sattigeri, Manish Nagireddy, Kush R. Varshney, Djallel Bouneffouf
- Abstract summary: We propose a system that performs contextual moral value alignment based on contextual aggregation.
The proposed system shows better alignment to human values than the state of the art.
- Score: 34.23730699280263
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing value-aligned AI agents is a complex undertaking and an ongoing challenge in the field of AI. Specifically within the domain of Large Language Models (LLMs), the capability to consolidate multiple independently trained dialogue agents, each aligned with a distinct moral value, into a unified system that can adapt to and be aligned with multiple moral values is of paramount importance. In this paper, we propose a system that performs contextual moral value alignment based on contextual aggregation. Here, aggregation is defined as the process of integrating a subset of LLM responses that are best suited to respond to a user input, taking into account features extracted from the user's input. The proposed system shows better results in terms of alignment to human values compared to the state of the art.
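The abstract does not spell out the aggregation mechanics; below is a minimal Python sketch of what context-based aggregation could look like, assuming a context scorer over moral values and one dialogue agent per value. The stubs and top-k selection are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable

def contextual_aggregate(
    user_input: str,
    agents: dict[str, Callable[[str], str]],            # moral value -> aligned dialogue agent
    context_scorer: Callable[[str], dict[str, float]],  # user input -> per-value relevance
    top_k: int = 2,
) -> list[str]:
    """Return responses from the agents whose moral value best fits the input."""
    scores = context_scorer(user_input)
    # Keep the top-k values most relevant to the features extracted from this input.
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [agents[value](user_input) for value in chosen]

# Purely illustrative stubs: two value-aligned agents and a keyword-based scorer.
agents = {
    "care": lambda x: f"[care-aligned reply to: {x}]",
    "fairness": lambda x: f"[fairness-aligned reply to: {x}]",
}
scorer = lambda x: {"care": float("hurt" in x), "fairness": float("unfair" in x)}
print(contextual_aggregate("That rule seems unfair to me.", agents, scorer, top_k=1))
```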
Related papers
- GrandJury: A Collaborative Machine Learning Model Evaluation Protocol for Dynamic Quality Rubrics [0.0]
Generative Machine Learning models have become central to modern systems, powering applications in creative writing, summarization, multi-hop reasoning, and context-aware dialogue.
Standard evaluation regimes still rely on static, benchmark-style tests, incentivizing optimization toward leaderboard scores rather than alignment with dynamic user needs or evolving realities.
GrandJury introduces a formal evaluation protocol combining time-decayed aggregation, complete traceability, support for dynamic and transparent task attribution, and human judgment.
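The summary does not specify GrandJury's weighting scheme; here is a minimal sketch of time-decayed aggregation of (timestamp, score) verdicts, assuming an exponential decay with a chosen half-life.

```python
import math
import time

def time_decayed_score(verdicts: list[tuple[float, float]],
                       half_life_days: float = 30.0,
                       now: float | None = None) -> float:
    """Aggregate (timestamp, score) verdicts, downweighting older ones."""
    now = time.time() if now is None else now
    rate = math.log(2) / (half_life_days * 86400.0)  # per-second decay rate
    weights = [math.exp(-rate * (now - t)) for t, _ in verdicts]
    return sum(w * s for w, (_, s) in zip(weights, verdicts)) / sum(weights)

# A verdict from today counts fully; one from 30 days ago counts half as much.
now = time.time()
print(time_decayed_score([(now, 4.0), (now - 30 * 86400, 2.0)], now=now))
```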
arXiv Detail & Related papers (2025-08-04T22:00:44Z)
- Learning the Value Systems of Societies from Preferences [1.3836987591220347]
Aligning AI systems with human values and the value-based preferences of various stakeholders is key in ethical AI.
In value-aware AI systems, decision-making draws upon explicit computational representations of individual values.
We propose a method to address the problem of learning the value systems of societies.
arXiv Detail & Related papers (2025-07-28T11:25:55Z)
- A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations [112.81207927088117]
PersonaConvBench is a benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs).
We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements.
arXiv Detail & Related papers (2025-05-20T09:13:22Z)
- IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification [60.38841251693781]
We propose a novel framework for robust multi-modal object re-identification (ReID).
Our framework uses Modal Prefixes and InverseNet to integrate multi-modal information with semantic guidance from inverted text.
Experiments on three multi-modal object ReID benchmarks demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2025-03-13T13:00:31Z)
- Transparent NLP: Using RAG and LLM Alignment for Privacy Q&A [15.86510147965235]
The General Data Protection Regulation (GDPR) requires precise processing information to be clear and accessible.
This paper examines state-of-the-art Retrieval-Augmented Generation (RAG) systems enhanced with alignment techniques to fulfill these obligations.
arXiv Detail & Related papers (2025-02-10T16:42:00Z)
- Democratizing Reward Design for Personal and Representative Value-Alignment [10.1630183955549]
We introduce Interactive-Reflective Dialogue Alignment, a method that iteratively engages users in reflecting on and specifying their subjective value definitions.
This system learns individual value definitions through language-model-based preference elicitation and constructs personalized reward models.
Our findings demonstrate diverse definitions of value-aligned behaviour and show that our system can accurately capture each person's unique understanding.
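A hedged sketch of the reward-modeling step: the snippet fits a toy linear reward model to pairwise preferences with a Bradley-Terry loss. The feature space and random tensors are stand-ins for preferences elicited through the paper's reflective dialogue, not its actual pipeline.

```python
import torch

class RewardModel(torch.nn.Module):
    """Toy personalized reward model over fixed-size behaviour features."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = torch.nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x).squeeze(-1)

def preference_loss(model, preferred, rejected):
    # Bradley-Terry: maximize P(preferred > rejected) = sigmoid(r_p - r_r).
    return -torch.nn.functional.logsigmoid(model(preferred) - model(rejected)).mean()

model = RewardModel(dim=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
preferred, rejected = torch.randn(64, 16), torch.randn(64, 16)  # stand-ins for elicited pairs
for _ in range(200):
    opt.zero_grad()
    preference_loss(model, preferred, rejected).backward()
    opt.step()
```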
arXiv Detail & Related papers (2024-10-29T16:37:01Z)
- CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses [34.77031649891843]
We introduce CLAVE, a novel framework which integrates two complementary Large Language Models (LLMs).
This dual-model approach enables calibration with any value system using 100 human-labeled samples per value type.
We present ValEval, a comprehensive dataset comprising 13k+ (text, value, label) tuples across diverse domains, covering three major value systems.
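A loose sketch of the dual-model idea under stated assumptions: one LLM call extracts value-relevant concepts, and a second judges adherence, few-shot calibrated on a small labeled set. `call_llm` is a placeholder for any chat-completion client, not CLAVE's actual interface.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: wire in any chat-completion client; returns a canned reply here.
    return "yes"

def evaluate_value(response: str, value: str,
                   calibration: list[tuple[str, str]]) -> str:
    """Judge whether `response` adheres to `value`, few-shot calibrated."""
    concepts = call_llm(
        f"List the concepts in this response relevant to the value '{value}':\n{response}"
    )
    shots = "\n".join(f"Response: {r}\nLabel: {l}" for r, l in calibration)
    return call_llm(
        f"Calibrated examples:\n{shots}\n\nConcepts: {concepts}\n"
        f"Does the response adhere to '{value}'? Answer yes or no."
    )

print(evaluate_value("Everyone deserves a fair hearing.", "fairness",
                     [("All voices were heard.", "yes")]))
```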
arXiv Detail & Related papers (2024-07-15T13:51:37Z)
- TokenSHAP: Interpreting Large Language Models with Monte Carlo Shapley Value Estimation [0.0]
TokenSHAP is a novel method for interpreting large language models.
It adapts Shapley values from cooperative game theory to natural language processing.
It provides interpretable, quantitative measures of token importance.
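A compact Monte Carlo estimate of per-token Shapley values in this spirit, using random permutations and marginal contributions; the toy scoring function is a stand-in for the paper's model-response similarity measure.

```python
import random

def mc_shapley(tokens: list[str], model_score, n_samples: int = 200) -> list[float]:
    """Estimate each token's Shapley value under `model_score` (subset -> float)."""
    contrib = [0.0] * len(tokens)
    for _ in range(n_samples):
        order = random.sample(range(len(tokens)), len(tokens))  # random permutation
        included: list[int] = []
        prev = model_score([])
        for i in order:
            included.append(i)
            cur = model_score([tokens[j] for j in sorted(included)])
            contrib[i] += cur - prev  # marginal contribution of token i
            prev = cur
    return [c / n_samples for c in contrib]

# Toy scorer: the word "not" carries all the importance.
score = lambda toks: float(toks.count("not"))
print(mc_shapley(["this", "is", "not", "fine"], score, n_samples=50))
```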
arXiv Detail & Related papers (2024-07-14T08:07:50Z)
- Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
- Emphasising Structured Information: Integrating Abstract Meaning Representation into LLMs for Enhanced Open-Domain Dialogue Evaluation [26.330012489735456]
This paper proposes an effective framework for open-domain dialogue evaluation.
It combines domain-specific language models (SLMs), enhanced with Abstract Meaning Representation (AMR) knowledge, with Large Language Models (LLMs).
Experimental results on open-domain dialogue evaluation tasks demonstrate the superiority of our method compared to a wide range of state-of-the-art baselines.
arXiv Detail & Related papers (2024-04-01T14:11:45Z)
- Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties [68.66719970507273]
Value pluralism is the view that multiple correct values may be held in tension with one another.
As statistical learners, AI systems fit to averages by default, washing out potentially irreducible value conflicts.
We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations.
arXiv Detail & Related papers (2023-09-02T01:24:59Z)
- Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
The rapid progress of Large Language Models (LLMs) has made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)
- Large Language Models are Diverse Role-Players for Summarization Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most automatic evaluation methods like BLEU/ROUGE may not be able to adequately capture the above dimensions.
We propose a new LLM-based framework that provides comprehensive evaluation by comparing generated text and reference text from both objective and subjective aspects.
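A hedged sketch of such role-conditioned evaluation: one judge model is prompted under different roles covering objective and subjective criteria, and the scores are averaged. The roles, rubric, and `judge` callable are illustrative, not the paper's exact prompts.

```python
ROLES = {
    "grammarian": "Rate the summary's grammar and factual correctness from 1 to 5.",
    "general reader": "Rate the summary's informativeness, succinctness, and appeal from 1 to 5.",
}

def role_play_eval(summary: str, reference: str, judge) -> float:
    """Average role-conditioned scores from an LLM judge (prompt -> numeric string)."""
    scores = []
    for role, rubric in ROLES.items():
        prompt = (f"You are a {role}. {rubric}\n"
                  f"Reference:\n{reference}\n\nSummary:\n{summary}\nScore:")
        scores.append(float(judge(prompt)))
    return sum(scores) / len(scores)

# Toy judge that always answers "4" (stand-in for a real LLM call).
print(role_play_eval("A short summary.", "The reference text.", lambda p: "4"))
```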
arXiv Detail & Related papers (2023-03-27T10:40:59Z)
- Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values [73.82043713141142]
Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values.
We introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.
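As a minimal illustration, a value-conditioned command of the kind described might be assembled like this; the template and `classify` callable are assumptions, not the paper's exact format.

```python
def value_conditioned_predict(text: str, value: str, classify) -> str:
    """Classify `text` conditioned on an explicitly written human value."""
    command = (f"Value: {value}\n"
               f"Decide whether the following text violates this value. "
               f"Answer 'yes' or 'no'.\nText: {text}")
    return classify(command)

# Toy stand-in for a value-aligned classifier.
print(value_conditioned_predict(
    "Example input.",
    "No one should be disadvantaged because of their gender.",
    lambda cmd: "no"))
```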
arXiv Detail & Related papers (2022-10-14T09:10:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.