Designing Multimodal Datasets for NLP Challenges
- URL: http://arxiv.org/abs/2105.05999v1
- Date: Wed, 12 May 2021 23:02:46 GMT
- Title: Designing Multimodal Datasets for NLP Challenges
- Authors: James Pustejovsky, Eben Holderness, Jingxuan Tu, Parker Glenn,
Kyeongmin Rim, Kelley Lynch, Richard Brutti
- Abstract summary: We identify challenges and tasks that are reflective of linguistic and cognitive competencies that humans have when speaking and reasoning.
We describe a diagnostic dataset, Recipe-to-Video Questions (R2VQ), designed for testing competence-based comprehension over a multimodal recipe collection.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we argue that the design and development of multimodal
datasets for natural language processing (NLP) challenges should be enhanced in
two significant respects: to more broadly represent commonsense semantic
inferences; and to better reflect the dynamics of actions and events, through a
substantive alignment of textual and visual information. We identify challenges
and tasks that are reflective of linguistic and cognitive competencies that
humans have when speaking and reasoning, rather than merely the performance of
systems on isolated tasks. We introduce the distinction between challenge-based
tasks and competence-based performance, and describe a diagnostic dataset,
Recipe-to-Video Questions (R2VQ), designed for testing competence-based
comprehension over a multimodal recipe collection (http://r2vq.org/). The
corpus contains detailed annotation supporting such inferencing tasks and
facilitating a rich set of question families that we use to evaluate NLP
systems.
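As an illustration of what competence-based evaluation over question families could look like, here is a minimal sketch. The record layout (fields `id`, `family`, `answer`) is an assumption for illustration, not the actual R2VQ distribution format, which is documented at http://r2vq.org/.

```python
import json
from collections import defaultdict

def evaluate_by_family(pred_path, gold_path):
    """Report accuracy per question family instead of a single aggregate
    score -- the point of competence-based, rather than challenge-based,
    evaluation. Field names ("id", "family", "answer") are hypothetical."""
    with open(pred_path) as f:
        preds = {r["id"]: r["answer"] for r in json.load(f)}
    with open(gold_path) as f:
        gold = json.load(f)

    correct, total = defaultdict(int), defaultdict(int)
    for record in gold:
        family = record["family"]  # e.g. an ingredient-tracking family
        total[family] += 1
        if preds.get(record["id"]) == record["answer"]:
            correct[family] += 1
    return {fam: correct[fam] / total[fam] for fam in sorted(total)}
```

A per-family report makes visible which competencies a system actually lacks, even when its aggregate accuracy looks strong.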
Related papers
- VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks [48.67062958311173]
VL-GLUE is a GLUE-style multitask benchmark for visuo-linguistic understanding.
We show that this benchmark is quite challenging for existing large-scale vision-language models.
arXiv Detail & Related papers (2024-10-17T15:27:17Z)
- Systematic Task Exploration with LLMs: A Study in Citation Text Generation [63.50597360948099]
Large language models (LLMs) bring unprecedented flexibility in defining and executing complex, creative natural language generation (NLG) tasks.
We propose a three-component research framework that consists of systematic input manipulation, reference data, and output measurement.
We use this framework to explore citation text generation -- a popular scholarly NLP task that lacks consensus on the task definition and evaluation metric.
arXiv Detail & Related papers (2024-07-04T16:41:08Z)
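The three-component framework in the entry above is easy to picture in code. The sketch below is only a schematic reading of it; the prompt variants, the `generate` backend, and the `score` metric are placeholders, not the paper's actual setup.

```python
# Systematic input manipulation: the same task posed under controlled
# prompt variants. Templates and field names are illustrative only.
PROMPT_VARIANTS = {
    "plain":       "Write a citation sentence for this paper: {abstract}",
    "with_intent": "Write a citation sentence expressing {intent}: {abstract}",
}

def explore(examples, generate, score):
    """Cross every prompt variant with every example (the reference data),
    then aggregate a chosen output measurement per variant."""
    results = {}
    for name, template in PROMPT_VARIANTS.items():
        scores = [
            score(generate(template.format(**ex)), ex["reference"])
            for ex in examples
        ]
        results[name] = sum(scores) / len(scores)
    return results
```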
- A Review of Hybrid and Ensemble in Deep Learning for Natural Language Processing [0.5266869303483376]
The review systematically introduces each task and delineates key architectures, from Recurrent Neural Networks (RNNs) to Transformer-based models such as BERT.
The adaptability of ensemble techniques is emphasized, highlighting their capacity to enhance various NLP applications.
Challenges in implementation, including computational overhead, overfitting, and model interpretation complexities, are addressed.
arXiv Detail & Related papers (2023-12-09T14:49:34Z)
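The ensemble techniques highlighted in the review above can be as simple as hard majority voting over independently trained classifiers; a minimal sketch (the sentiment labels and three-model setup are illustrative, not from the review):

```python
from collections import Counter

def majority_vote(predictions):
    """Hard-voting ensemble: each inner list holds one model's labels for
    the whole dataset; ties break on the first label Counter encounters."""
    ensembled = []
    for labels in zip(*predictions):  # per-example labels across models
        ensembled.append(Counter(labels).most_common(1)[0][0])
    return ensembled

# e.g. three sentiment classifiers disagreeing on the second example:
print(majority_vote([["pos", "neg"], ["pos", "pos"], ["pos", "neg"]]))
# -> ['pos', 'neg']
```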
- Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining? [34.609984453754656]
We aim to elucidate the impact of comprehensive linguistic knowledge, including semantic expression and syntactic structure, on multimodal alignment.
Specifically, we design and release SNARE, the first large-scale multimodal alignment probing benchmark.
arXiv Detail & Related papers (2023-08-24T16:17:40Z)
- Pushing the Limits of ChatGPT on NLP Tasks [79.17291002710517]
Despite the success of ChatGPT, its performance on most NLP tasks is still well below supervised baselines.
In this work, we investigate the causes and trace the subpar performance to several factors.
We propose a collection of general modules to address these issues, in an attempt to push the limits of ChatGPT on NLP tasks.
arXiv Detail & Related papers (2023-06-16T09:40:05Z)
- MIMIC-IT: Multi-Modal In-Context Instruction Tuning [44.879418596312554]
We present a dataset comprising 2.8 million multimodal instruction-response pairs, with 2.2 million unique instructions derived from images and videos.
Trained on the MIMIC-IT dataset, the Otter model demonstrates remarkable proficiency in multimodal perception, reasoning, and in-context learning.
We release the MIMIC-IT dataset, instruction-response collection pipeline, benchmarks, and the Otter model.
arXiv Detail & Related papers (2023-06-08T17:59:56Z)
- Dual Semantic Knowledge Composed Multimodal Dialog Systems [114.52730430047589]
We propose a novel multimodal task-oriented dialog system named MDS-S2.
It acquires context-related attribute and relation knowledge from the knowledge base.
We also devise a set of latent query variables to distill the semantic information from the composed response representation.
arXiv Detail & Related papers (2023-05-17T06:33:26Z)
- ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning [97.10875695679499]
We propose ERICA, a novel contrastive learning framework used in the pre-training phase to obtain a deeper understanding of entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
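The exact objectives are defined in the ERICA paper above; as a rough sketch, a relation-discrimination loss in that spirit is an InfoNCE-style contrastive loss over entity-pair representations. The interface below, including how positive pairs are formed, is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def relation_contrastive_loss(pairs, positives, temperature=0.05):
    """InfoNCE-style loss in the spirit of ERICA's relation discrimination:
    entity-pair representations sharing a relation (row i of `pairs` and
    row i of `positives`, both [batch, dim]) are pulled together, while
    the rest of the batch serves as in-batch negatives."""
    pairs = F.normalize(pairs, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = pairs @ positives.T / temperature        # cosine similarities
    targets = torch.arange(pairs.size(0), device=pairs.device)  # diagonal
    return F.cross_entropy(logits, targets)
```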
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks: next session prediction, utterance restoration, incoherence detection, and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
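Joint multi-task training as described in the entry above reduces to summing weighted losses. The sketch below assumes a model exposing one loss per task; the interface and the auxiliary weight are placeholders, not the paper's values.

```python
# Hypothetical interface: the model exposes a loss per task. The auxiliary
# weight is a placeholder, not the value used in the paper.
AUX_TASKS = ["next_session_prediction", "utterance_restoration",
             "incoherence_detection", "consistency_discrimination"]

def training_step(model, batch, optimizer, aux_weight=0.1):
    """One multi-task step: the main response-selection loss plus the
    four weighted auxiliary self-supervised losses."""
    optimizer.zero_grad()
    loss = model.response_selection_loss(batch)       # main objective
    for task in AUX_TASKS:
        loss = loss + aux_weight * model.auxiliary_loss(task, batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```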