BlendX: Complex Multi-Intent Detection with Blended Patterns
- URL: http://arxiv.org/abs/2403.18277v1
- Date: Wed, 27 Mar 2024 06:13:04 GMT
- Title: BlendX: Complex Multi-Intent Detection with Blended Patterns
- Authors: Yejin Yoon, Jungyeon Lee, Kangsan Kim, Chanhee Park, Taeuk Kim
- Abstract summary: We present BlendX, a suite of refined datasets featuring more diverse patterns than their predecessors.
For dataset construction, we utilize both rule-based heuristics and a generative tool -- OpenAI's ChatGPT -- augmented with a similarity-driven strategy for utterance selection.
Experiments on BlendX reveal that state-of-the-art MID models struggle with the challenges posed by the new datasets.
- Score: 4.852816974803059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Task-oriented dialogue (TOD) systems are commonly designed with the presumption that each utterance represents a single intent. However, this assumption may not accurately reflect real-world situations, where users frequently express multiple intents within a single utterance. While there is an emerging interest in multi-intent detection (MID), existing in-domain datasets such as MixATIS and MixSNIPS have limitations in their formulation. To address these issues, we present BlendX, a suite of refined datasets featuring more diverse patterns than their predecessors, elevating both its complexity and diversity. For dataset construction, we utilize both rule-based heuristics as well as a generative tool -- OpenAI's ChatGPT -- which is augmented with a similarity-driven strategy for utterance selection. To ensure the quality of the proposed datasets, we also introduce three novel metrics that assess the statistical properties of an utterance related to word count, conjunction use, and pronoun usage. Extensive experiments on BlendX reveal that state-of-the-art MID models struggle with the challenges posed by the new datasets, highlighting the need to reexamine the current state of the MID field. The dataset is available at https://github.com/HYU-NLP/BlendX.
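As a rough illustration of what such utterance-level statistics could look like, the following Python sketch counts words, conjunctions, and pronouns per utterance. The word lists and function name are assumptions made for illustration only, not the metric definitions from the paper.

```python
# Hypothetical sketch of utterance-level statistics in the spirit of the three
# BlendX quality metrics (word count, conjunction use, pronoun usage).
# The word lists below are illustrative assumptions, not the paper's definitions.
CONJUNCTIONS = {"and", "also", "then", "plus", "but", "or"}
PRONOUNS = {"it", "they", "them", "this", "that", "these", "those"}


def utterance_stats(utterance: str) -> dict:
    """Compute simple surface statistics for a single utterance."""
    tokens = utterance.lower().split()
    return {
        "word_count": len(tokens),
        "conjunction_count": sum(tok in CONJUNCTIONS for tok in tokens),
        "pronoun_count": sum(tok in PRONOUNS for tok in tokens),
    }


if __name__ == "__main__":
    example = "Book a flight to Boston and also check if it is raining there"
    print(utterance_stats(example))
    # {'word_count': 13, 'conjunction_count': 2, 'pronoun_count': 1}
```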
Related papers
- Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection [64.08296187555095]
Uni$^2$Det is a framework for unified and universal multi-dataset training on 3D detection.
We introduce multi-stage prompting modules for multi-dataset 3D detection.
Results on zero-shot cross-dataset transfer validate the generalization capability of our proposed method.
arXiv Detail & Related papers (2024-09-30T17:57:50Z) - Knowledge-Aware Reasoning over Multimodal Semi-structured Tables [85.24395216111462]
This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data.
We introduce MMTabQA, a new dataset designed for this purpose.
Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs.
arXiv Detail & Related papers (2024-08-25T15:17:43Z) - Zero-Shot Stance Detection using Contextual Data Generation with LLMs [0.04096453902709291]
We propose Dynamic Model Adaptation with Contextual Data Generation (DyMoAdapt)
In this approach, we aim to fine-tune an existing model at test time.
We achieve this by generating new topic-specific data using GPT-3.
This method could enhance performance by allowing the adaptation of the model to new topics.
arXiv Detail & Related papers (2024-05-19T17:58:26Z) - All in One Framework for Multimodal Re-identification in the Wild [58.380708329455466]
A multimodal learning paradigm for ReID is introduced, referred to as All-in-One (AIO).
AIO harnesses a frozen pre-trained big model as an encoder, enabling effective multimodal retrieval without additional fine-tuning.
Experiments on cross-modal and multimodal ReID reveal that AIO not only adeptly handles various modal data but also excels in challenging contexts.
arXiv Detail & Related papers (2024-05-08T01:04:36Z) - DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets [34.780870585656395]
We propose dataset-Aware Mixture-of-Experts, DAMEX.
We train each expert to become an "expert" of a dataset by learning to route each dataset's tokens to its mapped expert.
Experiments on Universal Object-Detection Benchmark show that we outperform the existing state-of-the-art.
arXiv Detail & Related papers (2023-11-08T18:55:24Z) - Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
Multimodal entity linking task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z) - MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text [58.655375327681774]
We propose the first Multimodal Retrieval-Augmented Transformer (MuRAG)
MuRAG accesses an external non-parametric multimodal memory to augment language generation.
Our results show that MuRAG achieves state-of-the-art accuracy, outperforming existing models by 10-20% absolute on both datasets.
arXiv Detail & Related papers (2022-10-06T13:58:03Z) - MIntRec: A New Dataset for Multimodal Intent Recognition [18.45381778273715]
Multimodal intent recognition is a significant task for understanding human language in real-world multimodal scenes.
This paper introduces a novel dataset for multimodal intent recognition (MIntRec) to address this issue.
It formulates coarse-grained and fine-grained intents based on the data collected from the TV series Superstore.
arXiv Detail & Related papers (2022-09-09T15:37:39Z) - Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit [6.187270874122921]
We propose a toolkit for systematic multimodal VAE training and comparison.
We present a disentangled bimodal dataset designed to comprehensively evaluate the joint generation and cross-generation capabilities.
arXiv Detail & Related papers (2022-09-07T10:26:28Z) - Adaptive Affinity for Associations in Multi-Target Multi-Camera Tracking [53.668757725179056]
We propose a simple yet effective approach to adapt affinity estimations to corresponding matching scopes in MTMCT.
Instead of trying to deal with all appearance changes, we tailor the affinity metric to specialize in ones that might emerge during data associations.
Minimizing the mismatch, the adaptive affinity module brings significant improvements over global re-ID distance.
arXiv Detail & Related papers (2021-12-14T18:59:11Z)