TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants
- URL: http://arxiv.org/abs/2405.02637v1
- Date: Sat, 4 May 2024 11:22:16 GMT
- Title: TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants
- Authors: Mohammad Aliannejadi, Zahra Abbasiantaeb, Shubham Chatterjee, Jeffery Dalton, Leif Azzopardi,
- Abstract summary: The extended TREC Interactive Knowledge Assistance Track (iKAT) collection aims to enable researchers to test and evaluate Conversational Search Agents (CSA)
The collection contains a set of 36 personalized dialogues over 20 different topics each coupled with a Personal Text Knowledge Base (PTKB) that defines the bespoke user personas.
A total of 344 turns with approximately 26,000 passages are provided as assessments on relevance, as well as additional assessments on generated responses over four key dimensions: relevance, completeness, groundedness, and naturalness.
- Score: 10.511277428023305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conversational information seeking has evolved rapidly in the last few years with the development of Large Language Models (LLMs), providing the basis for interpreting and responding in a naturalistic manner to user requests. The extended TREC Interactive Knowledge Assistance Track (iKAT) collection aims to enable researchers to test and evaluate their Conversational Search Agents (CSA). The collection contains a set of 36 personalized dialogues over 20 different topics each coupled with a Personal Text Knowledge Base (PTKB) that defines the bespoke user personas. A total of 344 turns with approximately 26,000 passages are provided as assessments on relevance, as well as additional assessments on generated responses over four key dimensions: relevance, completeness, groundedness, and naturalness. The collection challenges CSA to efficiently navigate diverse personal contexts, elicit pertinent persona information, and employ context for relevant conversations. The integration of a PTKB and the emphasis on decisional search tasks contribute to the uniqueness of this test collection, making it an essential benchmark for advancing research in conversational and interactive knowledge assistants.
Related papers
- TREC iKAT 2023: The Interactive Knowledge Assistance Track Overview [11.276981461219515]
iKAT emphasizes the creation and research of conversational search agents that adapt responses based on the user's prior interactions and present context.
Most of the runs leveraged Large Language Models (LLMs) in their pipelines, with a few focusing on a generate-then-retrieve approach.
arXiv Detail & Related papers (2024-01-02T18:40:03Z) - ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal,
Causal, and Discourse Relations [52.26802326949116]
We quantitatively evaluate the performance of ChatGPT, an interactive large language model, on inter-sentential relations.
ChatGPT exhibits exceptional proficiency in detecting and reasoning about causal relations.
It is capable of identifying the majority of discourse relations with existing explicit discourse connectives, but the implicit discourse relation remains a formidable challenge.
arXiv Detail & Related papers (2023-04-28T13:14:36Z) - FCC: Fusing Conversation History and Candidate Provenance for Contextual
Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z) - End-to-end Spoken Conversational Question Answering: Task, Dataset and
Model [92.18621726802726]
In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows.
Our main objective is to build the system to deal with conversational questions based on the audio recordings, and to explore the plausibility of providing more cues from different modalities with systems in information gathering.
arXiv Detail & Related papers (2022-04-29T17:56:59Z) - Analysing Mixed Initiatives and Search Strategies during Conversational
Search [31.63357369175702]
We present a model for conversational search -- from which we instantiate different observed conversational search strategies, where the agent elicits: (i) Feedback-First, or (ii) Feedback-After.
Our analysis reveals that there is no superior or dominant combination, instead it shows that query clarifications are better when asked first, while query suggestions are better when asked after presenting results.
arXiv Detail & Related papers (2021-09-13T13:30:10Z) - QAConv: Question Answering on Informative Conversations [85.2923607672282]
We focus on informative conversations including business emails, panel discussions, and work channels.
In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions.
arXiv Detail & Related papers (2021-05-14T15:53:05Z) - DDRel: A New Dataset for Interpersonal Relation Classification in Dyadic
Dialogues [11.531187569461489]
This paper proposes the task of relation classification of interlocutors based on their dialogues.
We crawled movie scripts from IMSDb, and annotated the relation labels for each session according to 13 pre-defined relationships.
The annotated dataset DDRel consists of 6300 dyadic dialogue sessions between 694 pair of speakers with 53,126 utterances in total.
arXiv Detail & Related papers (2020-12-04T12:30:31Z) - Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data
and Methodology [68.8836704199096]
Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents.
With their increased generative capacity of corpusbased conversational agents comes the need to classify and filter out malevolent responses.
Previous studies on the topic of recognizing and classifying inappropriate content are mostly focused on a certain category of malevolence.
arXiv Detail & Related papers (2020-08-21T22:43:27Z) - Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term
Importance Estimation and Neural Query Rewriting [56.268862325167575]
We tackle conversational passage retrieval (ConvPR) with query reformulation integrated into a multi-stage ad-hoc IR system.
We propose two conversational query reformulation (CQR) methods: (1) term importance estimation and (2) neural query rewriting.
For the former, we expand conversational queries using important terms extracted from the conversational context with frequency-based signals.
For the latter, we reformulate conversational queries into natural, standalone, human-understandable queries with a pretrained sequence-tosequence model.
arXiv Detail & Related papers (2020-05-05T14:30:20Z) - Topic Propagation in Conversational Search [0.0]
In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions.
We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing: (i) topic-aware utterance rewriting, (ii) retrieval of candidate passages for the rewritten utterances, and (iii) neural-based re-ranking of candidate passages.
arXiv Detail & Related papers (2020-04-29T10:06:00Z) - TREC CAsT 2019: The Conversational Assistance Track Overview [34.65827453762031]
The Conversational Assistance Track (CAsT) is a new track for TREC 2019 to facilitate Conversational Information Seeking (CIS) research.
The document corpus is 38,426,252 passages from the TREC Complex Answer Retrieval (CAR) and Microsoft MAchine Reading COmprehension (MARCO) datasets.
This year 21 groups submitted a total of 65 runs using varying methods for conversational query understanding and ranking.
arXiv Detail & Related papers (2020-03-30T16:58:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.