MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents
- URL: http://arxiv.org/abs/2502.05887v1
- Date: Sun, 09 Feb 2025 13:00:53 GMT
- Title: MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents
- Authors: Wanqi Yang, Yanda Li, Meng Fang, Ling Chen
- Abstract summary: MTPChat is a time-aware persona dialogue dataset that integrates linguistic, visual, and temporal elements within dialogue and persona memory.
We propose two time-sensitive tasks: Temporal Next Response Prediction (TNRP) and Temporal Grounding Memory Prediction (TGMP).
We present an innovative framework featuring an adaptive temporal module to effectively integrate multimodal streams and capture temporal dependencies.
- Abstract: Understanding temporal dynamics is critical for conversational agents, enabling effective content analysis and informed decision-making. However, time-aware datasets, particularly for persona-grounded conversations, are still limited, which narrows their scope and diminishes their complexity. To address this gap, we introduce MTPChat, a multimodal, time-aware persona dialogue dataset that integrates linguistic, visual, and temporal elements within dialogue and persona memory. Leveraging MTPChat, we propose two time-sensitive tasks: Temporal Next Response Prediction (TNRP) and Temporal Grounding Memory Prediction (TGMP), both designed to assess a model's ability to understand implicit temporal cues and dynamic interactions. Additionally, we present an innovative framework featuring an adaptive temporal module to effectively integrate multimodal streams and capture temporal dependencies. Experimental results validate the challenges posed by MTPChat and demonstrate the effectiveness of our framework in multimodal time-sensitive scenarios.
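The abstract describes the adaptive temporal module only at a high level. As a rough, hypothetical sketch (not the authors' implementation), a time-aware fusion layer for this setting might gate visual memory features by the time elapsed since a memory was formed; the name AdaptiveTemporalFusion, the time_gap input, and all dimensions below are assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveTemporalFusion(nn.Module):
    """Hypothetical sketch: fuse text and image features, gated by time gaps.

    This is NOT the paper's implementation; it only illustrates the idea of
    an adaptive temporal module that weights multimodal memory by recency.
    """

    def __init__(self, dim: int = 512):
        super().__init__()
        self.time_mlp = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.gate = nn.Linear(2 * dim, dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, text_feat, image_feat, time_gap):
        # time_gap: (batch, 1) elapsed time between a memory and the current turn
        t = self.time_mlp(torch.log1p(time_gap))                    # smooth temporal embedding
        g = torch.sigmoid(self.gate(torch.cat([text_feat, t], dim=-1)))
        fused = self.fuse(torch.cat([text_feat, g * image_feat], dim=-1))
        return fused

# Toy usage: batch of 4, feature dim 512
m = AdaptiveTemporalFusion(512)
out = m(torch.randn(4, 512), torch.randn(4, 512), torch.rand(4, 1) * 100.0)
print(out.shape)  # torch.Size([4, 512])
```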
Related papers
- TReMu: Towards Neuro-Symbolic Temporal Reasoning for LLM-Agents with Memory in Multi-Session Dialogues
Temporal reasoning in multi-session dialogues presents a significant challenge that remains under-studied.
We introduce an approach to construct a new benchmark by augmenting dialogues from LoCoMo and creating multi-choice QAs.
We also present TReMu, a new framework aimed at enhancing the temporal reasoning capabilities of LLM-agents.
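The abstract does not detail how the multi-choice QAs are built from LoCoMo dialogues; the toy function below is a speculative illustration of one way a temporal multi-choice item could be derived from timestamped events (make_temporal_mcq and the distractor scheme are invented for illustration).

```python
from datetime import date

def make_temporal_mcq(events: dict[str, date]) -> dict:
    """Speculative sketch: turn two timestamped dialogue events into a
    multi-choice temporal QA item. Not the paper's actual procedure."""
    (e1, d1), (e2, d2) = list(events.items())[:2]
    gap = abs((d2 - d1).days)
    distractors = [gap + 7, max(gap - 7, 0), gap + 30]  # plausible wrong answers
    options = sorted({gap, *distractors})
    return {
        "question": f"How many days passed between '{e1}' and '{e2}'?",
        "options": options,
        "answer": options.index(gap),
    }

item = make_temporal_mcq({"adopted a puppy": date(2024, 3, 1),
                          "first vet visit": date(2024, 3, 15)})
print(item)  # gap of 14 days, with three distractor options
```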
arXiv Detail & Related papers (2025-02-03T18:58:19Z)
- TempoGPT: Enhancing Temporal Reasoning via Quantizing Embedding
We propose a multi-modal time series data construction approach and a multi-modal time series language model (TLM), TempoGPT.
We construct multi-modal data for complex reasoning tasks by analyzing the variable-system relationships within a white-box system.
Extensive experiments demonstrate that TempoGPT accurately perceives temporal information, logically infers conclusions, and achieves state-of-the-art in the constructed complex time series reasoning tasks.
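As a hedged illustration of "quantizing embedding" (the paper's exact tokenizer is not specified in this summary), the sketch below snaps continuous time-series embeddings to their nearest codebook entries, producing discrete tokens a language model could consume; TemporalQuantizer and its sizes are assumptions.

```python
import torch
import torch.nn as nn

class TemporalQuantizer(nn.Module):
    """Illustrative-only sketch of quantizing time-series embeddings into
    discrete codebook tokens; details are assumptions, not TempoGPT's design."""

    def __init__(self, num_codes: int = 256, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):
        # z: (batch, seq, dim) continuous time-series embeddings
        d = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)  # squared distances
        idx = d.argmin(dim=-1)                                       # nearest code per step
        return self.codebook(idx), idx                               # quantized embeddings + token ids

q = TemporalQuantizer()
zq, ids = q(torch.randn(2, 16, 64))
print(zq.shape, ids.shape)  # torch.Size([2, 16, 64]) torch.Size([2, 16])
```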
arXiv Detail & Related papers (2025-01-13T13:47:05Z)
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
This project introduces disentangled streaming perception, reasoning, and memory mechanisms, enabling real-time interaction with streaming video and audio input.
The system simulates human-like cognition, allowing multimodal large language models to provide continuous and adaptive service over time.
arXiv Detail & Related papers (2024-12-12T18:58:30Z)
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent).
It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation.
The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
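A minimal, assumption-laden skeleton of the three-module split described above (module internals such as retrievers and tuned models are not reproduced; LongTermDialogueAgent and its method names are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class LongTermDialogueAgent:
    """Hypothetical skeleton of LD-Agent's three-module design."""
    events: list = field(default_factory=list)    # long-term event memory
    persona: dict = field(default_factory=dict)   # extracted user traits

    def perceive_event(self, session_summary: str) -> None:
        self.events.append(session_summary)       # event perception module

    def extract_persona(self, key: str, value: str) -> None:
        self.persona[key] = value                  # persona extraction module

    def respond(self, utterance: str) -> str:
        # response generation conditioned on memory + persona (placeholder)
        context = f"events={self.events[-3:]}, persona={self.persona}"
        return f"[reply to '{utterance}' given {context}]"

agent = LongTermDialogueAgent()
agent.perceive_event("User planned a hiking trip last week.")
agent.extract_persona("hobby", "hiking")
print(agent.respond("How should I pack?"))
```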
arXiv Detail & Related papers (2024-06-09T21:58:32Z)
- Evaluating Very Long-Term Conversational Memory of LLM Agents
We introduce a machine-human pipeline to generate high-quality, very long-term dialogues.
We equip each agent with the capability of sharing and reacting to images.
The generated conversations are verified and edited by human annotators for long-range consistency.
arXiv Detail & Related papers (2024-02-27T18:42:31Z)
- On the Multi-turn Instruction Following for Conversational Web Agents
We introduce the new task of Conversational Web Navigation, which requires sophisticated interactions spanning multiple turns with both users and the environment.
We propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques.
arXiv Detail & Related papers (2024-02-23T02:18:12Z)
- DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval.
Our approach introduces a multi-modal context generator to learn context features that are distilled into prompts within the pre-trained vision-language model CLIP.
To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space.
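A rough sketch of the parameter-efficient recipe described above, with a stand-in frozen Transformer in place of the real CLIP backbone; PromptTunedRetriever, the pooling choice, and n_prompts=8 are all assumptions, not DialCLIP's actual design.

```python
import torch
import torch.nn as nn

class PromptTunedRetriever(nn.Module):
    """Rough sketch: a small context generator produces soft prompts that
    steer a frozen encoder. The encoder is a stand-in, not real CLIP."""

    def __init__(self, dim: int = 512, n_prompts: int = 8):
        super().__init__()
        self.frozen_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        for p in self.frozen_encoder.parameters():
            p.requires_grad = False                         # backbone stays frozen
        self.context_gen = nn.Linear(dim, n_prompts * dim)  # only this part trains
        self.n_prompts, self.dim = n_prompts, dim

    def forward(self, dialog_feats):
        # dialog_feats: (batch, seq, dim) multimodal dialog context features
        ctx = dialog_feats.mean(dim=1)                      # pool the dialog context
        prompts = self.context_gen(ctx).view(-1, self.n_prompts, self.dim)
        x = torch.cat([prompts, dialog_feats], dim=1)       # prepend soft prompts
        return self.frozen_encoder(x)[:, 0]                 # retrieval embedding

m = PromptTunedRetriever()
emb = m(torch.randn(4, 10, 512))
print(emb.shape)  # torch.Size([4, 512])
```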
arXiv Detail & Related papers (2024-01-02T07:40:12Z)
- Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction
Emotion recognition is a crucial task for human conversation understanding.
We propose CORECT, a relational temporal graph neural network with auxiliary cross-modality interaction.
CORECT effectively captures conversation-level cross-modality interactions and utterance-level temporal dependencies.
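As a loose illustration of the utterance-level temporal dependencies mentioned above (relation types and cross-modality edges are omitted), a windowed temporal adjacency over utterances might look like this; temporal_graph_adjacency and window=2 are invented for the sketch.

```python
import torch

def temporal_graph_adjacency(n_utt: int, window: int = 2) -> torch.Tensor:
    """Toy temporal skeleton: each utterance connects to itself and its
    `window` predecessors. CORECT's real graph is considerably richer."""
    adj = torch.zeros(n_utt, n_utt)
    for i in range(n_utt):
        for j in range(max(0, i - window), i + 1):
            adj[i, j] = 1.0                       # past-within-window edge
    return adj / adj.sum(dim=1, keepdim=True)     # row-normalize for averaging

feats = torch.randn(6, 32)                        # 6 utterances, 32-dim features
adj = temporal_graph_adjacency(6)
print((adj @ feats).shape)                        # one round of message passing
```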
arXiv Detail & Related papers (2023-11-08T07:46:25Z)
- Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions
We present Dyadformer, a novel multi-modal multi-subject Transformer architecture to model individual and interpersonal features in dyadic interactions.
Our proposed cross-subject layer allows the network to explicitly model interactions among subjects through attentional operations.
This proof-of-concept approach shows how multi-modality and joint modeling of both interactants over longer time spans help to predict individual attributes.
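A guesswork sketch of a cross-subject attentional operation in this spirit, not Dyadformer's actual layer; CrossSubjectLayer, the shared attention weights, and the residual connections are assumptions.

```python
import torch
import torch.nn as nn

class CrossSubjectLayer(nn.Module):
    """Sketch: each interactant's features attend to the partner's, so the
    model can capture interpersonal dependencies in a dyad."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, subj_a, subj_b):
        # each input: (batch, time, dim) per-subject multimodal features
        a2b, _ = self.attn(subj_a, subj_b, subj_b)  # A queries B's stream
        b2a, _ = self.attn(subj_b, subj_a, subj_a)  # B queries A's stream
        return subj_a + a2b, subj_b + b2a           # residual updates

layer = CrossSubjectLayer()
a, b = layer(torch.randn(2, 50, 128), torch.randn(2, 50, 128))
print(a.shape, b.shape)  # torch.Size([2, 50, 128]) x2
```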
arXiv Detail & Related papers (2021-09-20T12:45:04Z)
- Sequential Recommender via Time-aware Attentive Memory Network
We propose a temporal gating methodology to improve attention mechanisms and recurrent units.
We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product based Top-K recommendation.
arXiv Detail & Related papers (2020-05-18T11:29:38Z)
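As a hedged sketch of the temporal gating idea from the last entry, attention scores over a user's interaction history can be discounted by item age before the softmax; TimeAwareGate, the linear decay, and all dimensions are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class TimeAwareGate(nn.Module):
    """Sketch: modulate attention over past items by the time since each
    interaction, so stale preferences contribute less."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.decay = nn.Parameter(torch.tensor(0.1))  # learnable time decay
        self.query = nn.Linear(dim, dim)

    def forward(self, target, history, ages):
        # target: (batch, dim); history: (batch, n, dim); ages: (batch, n) in days
        scores = torch.einsum("bd,bnd->bn", self.query(target), history)
        scores = scores - self.decay * ages           # older items score lower
        w = torch.softmax(scores, dim=-1)
        return torch.einsum("bn,bnd->bd", w, history) # time-aware preference vector

g = TimeAwareGate()
pref = g(torch.randn(3, 64), torch.randn(3, 5, 64), torch.rand(3, 5) * 30)
print(pref.shape)  # torch.Size([3, 64])
```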