Personal Attribute Prediction from Conversations
- URL: http://arxiv.org/abs/2209.09619v1
- Date: Mon, 29 Aug 2022 15:21:53 GMT
- Title: Personal Attribute Prediction from Conversations
- Authors: Yinan Liu and Hu Chen and Wei Shen
- Abstract summary: We aim to predict the personal attribute value for the user, which is helpful for the enrichment of personal knowledge bases (PKBs).
We propose a framework based on the pre-trained language model with a noise-robust loss function to predict personal attributes from conversations without requiring any labeled utterances.
Our framework outperforms all twelve baselines in terms of nDCG and MRR.
- Score: 9.208339833472051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personal knowledge bases (PKBs) are critical to many applications, such as
Web-based chatbots and personalized recommendation. Conversations containing
rich personal knowledge can be regarded as a main source to populate the PKB.
Given a user, a user attribute, and user utterances from a conversational
system, we aim to predict the personal attribute value for the user, which is
helpful for the enrichment of PKBs. However, three issues exist in previous
studies: (1) manually labeled utterances are required for model
training; (2) personal attribute knowledge embedded in both utterances and
external resources is underutilized; (3) the performance on predicting some
difficult personal attributes is unsatisfactory. In this paper, we propose a
framework DSCGN based on the pre-trained language model with a noise-robust
loss function to predict personal attributes from conversations without
requiring any labeled utterances. We yield two categories of supervision, i.e.,
document-level supervision via a distant supervision strategy and
contextualized word-level supervision via a label guessing method, by mining
the personal attribute knowledge embedded in both unlabeled utterances and
external resources to fine-tune the language model. Extensive experiments over
two real-world data sets (i.e., a profession data set and a hobby data set)
show that our framework outperforms all twelve baselines in terms of nDCG and
MRR.
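The ranking metrics used above can be sketched as follows; this is a minimal illustration with binary relevance, where `ranked_lists` holds each user's predicted attribute values in ranked order and `gold` holds the true values (the function names and the binary-relevance simplification are assumptions for illustration, not the paper's implementation):

```python
import math

def mrr(ranked_lists, gold):
    """Mean Reciprocal Rank: average of 1/rank of the first
    correct attribute value in each ranked prediction list."""
    total = 0.0
    for ranking, answers in zip(ranked_lists, gold):
        for rank, value in enumerate(ranking, start=1):
            if value in answers:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

def ndcg(ranked_lists, gold):
    """Normalized Discounted Cumulative Gain with binary relevance:
    each correct value at position r contributes 1/log2(r + 1),
    normalized by the best achievable (ideal) DCG."""
    total = 0.0
    for ranking, answers in zip(ranked_lists, gold):
        dcg = sum(1.0 / math.log2(rank + 1)
                  for rank, value in enumerate(ranking, start=1)
                  if value in answers)
        ideal = sum(1.0 / math.log2(rank + 1)
                    for rank in range(1, min(len(answers), len(ranking)) + 1))
        total += dcg / ideal if ideal > 0 else 0.0
    return total / len(ranked_lists)
```

For example, with predictions `[["doctor", "nurse", "teacher"], ["chef", "pilot"]]` and gold values `[{"nurse"}, {"chef"}]`, MRR is (1/2 + 1/1)/2 = 0.75.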
Related papers
- PersoBench: Benchmarking Personalized Response Generation in Large Language Models [6.8046587254152735]
We present a new benchmark, PersoBench, to evaluate the personalization ability of large language models (LLMs) in persona-aware dialogue generation.
Our analysis, conducted on three well-known persona-aware datasets, evaluates multiple dimensions of response quality, including fluency, diversity, coherence, and personalization.
arXiv Detail & Related papers (2024-10-04T07:29:41Z)
- OPSD: an Offensive Persian Social media Dataset and its baseline evaluations [2.356562319390226]
This paper introduces two offensive-language datasets for Persian.
The first dataset comprises annotations provided by domain experts, while the second consists of a large collection of unlabeled data obtained through web crawling.
The obtained F1-scores for the three-class and two-class versions of the dataset were 76.9% and 89.9% for XLM-RoBERTa, respectively.
arXiv Detail & Related papers (2024-04-08T14:08:56Z) - QuRating: Selecting High-Quality Data for Training Language Models [64.83332850645074]
We introduce QuRating, a method for selecting pre-training data that can capture human intuitions about data quality.
In this paper, we investigate four qualities - writing style, required expertise, facts & trivia, and educational value.
We train a QuRater model to learn scalar ratings from pairwise judgments, and use it to annotate a 260B-token training corpus with quality ratings for each of the four criteria.
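Learning a scalar rating from pairwise judgments, as this summary describes, is commonly done with a Bradley-Terry model; a minimal sketch under that assumption (the function name, training loop, and hyperparameters are illustrative, not the paper's code):

```python
import math

def fit_bradley_terry(pairs, n_docs, lr=0.1, epochs=200):
    """Fit scalar quality scores s_i from pairwise judgments.

    pairs: list of (winner, loser) index pairs, meaning the judge
    rated doc `winner` higher-quality than doc `loser`.
    Bradley-Terry model: P(i beats j) = sigmoid(s_i - s_j).
    Trained by gradient ascent on the pairwise log-likelihood.
    """
    scores = [0.0] * n_docs
    for _ in range(epochs):
        for winner, loser in pairs:
            # predicted probability that the winner beats the loser
            p = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            # gradient of the log-likelihood w.r.t. the winner's score is (1 - p)
            scores[winner] += lr * (1.0 - p)
            scores[loser] -= lr * (1.0 - p)
    return scores
```

Given judgments (0 beats 1), (0 beats 2), (1 beats 2), the fitted scores recover the ordering s_0 > s_1 > s_2, and each score can then be used to annotate unlabeled documents via a regression head in the full pipeline.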
arXiv Detail & Related papers (2024-02-15T06:36:07Z)
- AUGUST: an Automatic Generation Understudy for Synthesizing Conversational Recommendation Datasets [56.052803235932686]
We propose a novel automatic dataset synthesis approach that can generate both large-scale and high-quality recommendation dialogues.
In doing so, we exploit: (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability contained in human-to-human conversational recommendation datasets.
arXiv Detail & Related papers (2023-06-16T05:27:14Z)
- Speaker Profiling in Multiparty Conversations [31.518453682472575]
This research paper explores the task of Speaker Profiling in Conversations (SPC).
The primary objective of SPC is to produce a summary of persona characteristics for each individual speaker present in a dialogue.
To address the task of SPC, we have curated a new dataset named SPICE, which comes with specific labels.
arXiv Detail & Related papers (2023-04-18T08:04:46Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
- Low-resource Personal Attribute Prediction from Conversation [20.873276038560057]
We propose a novel framework PEARL to predict personal attributes from conversations.
PEARL combines the biterm semantic information with the word co-occurrence information seamlessly via employing the updated prior attribute knowledge.
arXiv Detail & Related papers (2022-11-28T14:04:51Z)
- Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters [66.17039929803933]
We propose a novel transfer learning framework which updates only 0.3% of model parameters to learn style-specific attributes for response generation.
We learn style specific attributes from the PERSONALITY-CAPTIONS dataset.
arXiv Detail & Related papers (2022-10-07T00:09:22Z)
- Improving Personality Consistency in Conversation by Persona Extending [22.124187337032946]
We propose a novel retrieval-to-prediction paradigm consisting of two subcomponents, namely, a Persona Retrieval Model (PRM) and a Posterior-scored Transformer (PS-Transformer).
Our proposed model yields considerable improvements in both automatic metrics and human evaluations.
arXiv Detail & Related papers (2022-08-23T09:00:58Z)
- CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning [78.3857991931479]
We present GROLLA, an evaluation framework for Grounded Language Learning with Attributes.
We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations.
arXiv Detail & Related papers (2020-06-03T11:21:42Z)
- A Neural Topical Expansion Framework for Unstructured Persona-oriented Dialogue Generation [52.743311026230714]
Persona Exploration and Exploitation (PEE) is able to extend the predefined user persona description with semantically correlated content.
PEE consists of two main modules: persona exploration and persona exploitation.
Our approach outperforms state-of-the-art baselines in terms of both automatic and human evaluations.
arXiv Detail & Related papers (2020-02-06T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.