Related papers: POSSCORE: A Simple Yet Effective Evaluation of Conversational Search with Part of Speech Labelling

URL: http://arxiv.org/abs/2109.03039v1
Date: Tue, 7 Sep 2021 12:31:29 GMT
Title: POSSCORE: A Simple Yet Effective Evaluation of Conversational Search with Part of Speech Labelling
Authors: Zeyang Liu, Ke Zhou, Jiaxin Mao, Max L. Wilson
Abstract summary: Conversational search systems, such as Google Assistant and Microsoft Cortana, provide a new search paradigm where users are allowed, via natural language dialogues, to communicate with search systems. We propose POSSCORE, a simple yet effective automatic evaluation method for conversational search. We show that our metrics can correlate with human preference, achieving significant improvements over state-of-the-art baseline metrics.
Score: 25.477834359694473
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Conversational search systems, such as Google Assistant and Microsoft Cortana, provide a new search paradigm where users are allowed, via natural language dialogues, to communicate with search systems. Evaluating such systems is very challenging since search results are presented in the format of natural language sentences. Given the unlimited number of possible responses, collecting relevance assessments for all the possible responses is infeasible. In this paper, we propose POSSCORE, a simple yet effective automatic evaluation method for conversational search. The proposed embedding-based metric takes the influence of part of speech (POS) of the terms in the response into account. To the best of our knowledge, our work is the first to systematically demonstrate the importance of incorporating syntactic information, such as POS labels, for conversational search evaluation. Experimental results demonstrate that our metrics can correlate with human preference, achieving significant improvements over state-of-the-art baseline metrics.

Related papers

Commonsense Generation and Evaluation for Dialogue Systems using Large Language Models [8.556799193001341]
This paper explores the task of performing turn-level data augmentation for dialogue systems based on different types of commonsense relationships. The proposed methodology takes advantage of the extended knowledge and zero-shot capabilities of pretrained Large Language Models (LLMs) to follow instructions. Preliminary results suggest that our approach effectively harnesses LLMs' capabilities for commonsense reasoning and evaluation in dialogue systems.
arXiv Detail & Related papers (2025-06-24T10:18:05Z)
ProCIS: A Benchmark for Proactive Retrieval in Conversations [21.23826888841565]
We introduce a large-scale dataset for proactive document retrieval that consists of over 2.8 million conversations. We conduct crowdsourcing experiments to obtain high-quality and relatively complete relevance judgments. We also collect annotations related to the parts of the conversation that are related to each document, enabling us to evaluate proactive retrieval systems.
arXiv Detail & Related papers (2024-05-10T13:11:07Z)
Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries [48.243879779374836]
Few-shot dialogue state tracking (DST) with Large Language Models (LLMs) relies on an effective and efficient conversation retriever to find similar in-context examples for prompt learning. Previous works use raw dialogue context as search keys and queries, and a retriever is fine-tuned with annotated dialogues to achieve superior performance. We handle the task of conversation retrieval based on text summaries of the conversations. An LLM-based conversation summarizer is adopted for query and key generation, which enables effective maximum inner product search.
arXiv Detail & Related papers (2024-02-20T14:31:17Z)
FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels. We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
Improve Retrieval-based Dialogue System via Syntax-Informed Attention [46.79601705850277]
We propose SIA, Syntax-Informed Attention, considering both intra- and inter-sentence syntax information. We evaluate our method on three widely used benchmarks and experimental results demonstrate the general superiority of our method on dialogue response selection.
arXiv Detail & Related papers (2023-03-12T08:14:16Z)
End-to-end Spoken Conversational Question Answering: Task, Dataset and Model [92.18621726802726]
In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts. We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows. Our main objective is to build the system to deal with conversational questions based on the audio recordings, and to explore the plausibility of providing more cues from different modalities with systems in information gathering.
arXiv Detail & Related papers (2022-04-29T17:56:59Z)
Meta-evaluation of Conversational Search Evaluation Metrics [15.942419892035124]
We systematically meta-evaluate a variety of conversational search metrics. We find that METEOR is the best existing single-turn metric considering all three perspectives. We also demonstrate that adapted session-based evaluation metrics can be used to measure multi-turn conversational search.
arXiv Detail & Related papers (2021-04-27T20:01:03Z)
Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology [68.8836704199096]
Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents. With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses. Previous studies on the topic of recognizing and classifying inappropriate content are mostly focused on a certain category of malevolence.
arXiv Detail & Related papers (2020-08-21T22:43:27Z)
Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting [56.268862325167575]
We tackle conversational passage retrieval (ConvPR) with query reformulation integrated into a multi-stage ad-hoc IR system. We propose two conversational query reformulation (CQR) methods: (1) term importance estimation and (2) neural query rewriting. For the former, we expand conversational queries using important terms extracted from the conversational context with frequency-based signals. For the latter, we reformulate conversational queries into natural, standalone, human-understandable queries with a pretrained sequence-to-sequence model.
arXiv Detail & Related papers (2020-05-05T14:30:20Z)
Topic Propagation in Conversational Search [0.0]
In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing: (i) topic-aware utterance rewriting, (ii) retrieval of candidate passages for the rewritten utterances, and (iii) neural-based re-ranking of candidate passages.
arXiv Detail & Related papers (2020-04-29T10:06:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.