Epicurus at SemEval-2023 Task 4: Improving Prediction of Human Values
behind Arguments by Leveraging Their Definitions
- URL: http://arxiv.org/abs/2302.13925v2
- Date: Thu, 18 May 2023 20:43:55 GMT
- Authors: Christian Fang, Qixiang Fang, Dong Nguyen
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe our experiments for SemEval-2023 Task 4 on the identification of
human values behind arguments (ValueEval). Because human values are subjective
concepts which require precise definitions, we hypothesize that incorporating
the definitions of human values (in the form of annotation instructions and
validated survey items) during model training can yield better prediction
performance. We explore this idea and show that our proposed models perform
better than the challenge organizers' baselines, with improvements in macro F1
scores of up to 18%.
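The abstract's core idea, pairing each argument with a value's definition so the model can use that definition at training time, can be sketched as an NLI-style input construction. This is a hypothetical illustration only: the value names, definition strings, and `[SEP]` pairing below are placeholders, not the authors' exact pipeline or the official ValueEval annotation instructions.

```python
# Sketch (assumed setup, not the paper's code): pair an argument with each
# value definition so a binary classifier can decide, per value, whether the
# argument expresses that value. Definitions here are illustrative.
VALUE_DEFINITIONS = {
    "Self-direction": "It is important to form one's own opinions and act independently.",
    "Security": "It is important to live in safe and stable surroundings.",
}

def build_inputs(argument, definitions):
    """Create (value, model_input) pairs in a 'premise [SEP] hypothesis' format,
    as an entailment-style model might consume them."""
    return [
        (value, f"{argument} [SEP] {definition}")
        for value, definition in definitions.items()
    ]

pairs = build_inputs(
    "We should raise the minimum wage so workers can plan their lives.",
    VALUE_DEFINITIONS,
)
for value, text in pairs:
    print(value, "->", text)
```

Each pair would then be labeled positive or negative depending on whether the annotated argument expresses that value, turning a multi-label task into per-value binary decisions informed by the definition text.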
Related papers
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution about the usefulness of saliency-based explanations and their potential to be misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation [15.8765167340819]
Human annotator simulation (HAS) serves as a cost-effective substitute for human evaluation such as data annotation and system assessment.
Human perception and behaviour during human evaluation exhibit inherent variability due to diverse cognitive processes and subjective interpretations.
This paper introduces a novel meta-learning framework that treats HAS as a zero-shot density estimation problem.
arXiv Detail & Related papers (2023-09-30T20:54:59Z)
- SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation [78.23119125463964]
We develop SocREval, a novel approach for prompt design in reference-free reasoning evaluation.
SocREval significantly improves GPT-4's performance, surpassing existing reference-free and reference-based reasoning evaluation metrics.
arXiv Detail & Related papers (2023-09-29T18:25:46Z)
- Human Feedback is not Gold Standard [28.63384327791185]
We critically analyse the use of human feedback for both training and evaluation.
We find that while preference scores have fairly good coverage, they under-represent important aspects like factuality.
arXiv Detail & Related papers (2023-09-28T11:18:20Z)
- Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties [68.66719970507273]
Value pluralism is the view that multiple correct values may be held in tension with one another.
As statistical learners, AI systems fit to averages by default, washing out potentially irreducible value conflicts.
We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations.
arXiv Detail & Related papers (2023-09-02T01:24:59Z)
- Rudolf Christoph Eucken at SemEval-2023 Task 4: An Ensemble Approach for Identifying Human Values from Arguments [0.0]
We present an ensemble approach for detecting human values from argument text.
Our ensemble comprises three models, including (i) an entailment-based model that determines human values based on their descriptions, and (ii) a RoBERTa-based classifier that predicts the set of human values from an argument.
arXiv Detail & Related papers (2023-05-09T10:54:34Z)
- Are Human Explanations Always Helpful? Towards Objective Evaluation of Human Natural Language Explanations [27.624182544486334]
We build on the view that the quality of a human-annotated explanation can be measured based on its helpfulness.
We define a new metric that can take into consideration the helpfulness of an explanation for model performance.
arXiv Detail & Related papers (2023-05-04T19:31:50Z)
- Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation [68.9440575276396]
This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation.
First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization.
Second, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models.
Third, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention.
arXiv Detail & Related papers (2023-05-01T17:36:06Z) - Revisiting the Gold Standard: Grounding Summarization Evaluation with
Robust Human Evaluation [136.16507050034755]
Existing human evaluation studies for summarization either exhibit a low inter-annotator agreement or have insufficient scale.
We propose a modified summarization salience protocol, Atomic Content Units (ACUs), which is based on fine-grained semantic units.
We curate the Robust Summarization Evaluation (RoSE) benchmark, a large human evaluation dataset consisting of 22,000 summary-level annotations over 28 top-performing systems.
arXiv Detail & Related papers (2022-12-15T17:26:05Z) - Enabling Classifiers to Make Judgements Explicitly Aligned with Human
Values [73.82043713141142]
Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values.
We introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.
arXiv Detail & Related papers (2022-10-14T09:10:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.