TQ-Net: Mixed Contrastive Representation Learning For Heterogeneous Test
Questions
- URL: http://arxiv.org/abs/2303.08039v1
- Date: Thu, 9 Mar 2023 10:55:48 GMT
- Title: TQ-Net: Mixed Contrastive Representation Learning For Heterogeneous Test
Questions
- Authors: He Zhu, Xihua Li, Xuemin Zhao, Yunbo Cao, Shan Yu
- Abstract summary: Test questions (TQ) are usually heterogeneous and multi-modal, e.g., some may contain only text, while others also contain images carrying information beyond their literal description.
In this paper, we first improve on previous text-only representations with a two-stage unsupervised instance-level contrastive pre-training method.
We then propose TQ-Net to fuse the content of images into the representation of heterogeneous data.
- Score: 18.186909839033017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, more and more people study online for convenient access to
massive learning materials (e.g., test questions and notes), so accurately
understanding learning materials has become a crucial issue, essential for many
educational applications. Previous studies focus on using language models to
represent question data. However, test questions (TQ) are usually heterogeneous
and multi-modal: some may contain only text, while others also contain images
with information beyond their literal description. In this setting, it is
difficult for either supervised or unsupervised methods to learn a fused
representation of questions. Meanwhile, the problem cannot be solved by
conventional methods such as image captioning, as the images may contain
information complementary, rather than duplicate, to the text. In this paper, we
first improve on previous text-only representations with a two-stage
unsupervised instance-level contrastive pre-training method (MCL: Mixture
Unsupervised Contrastive Learning). We then propose TQ-Net to fuse the content
of images into the representation of heterogeneous data. Finally, we apply
supervised contrastive learning to relevance-prediction downstream tasks, which
helps the model learn question representations effectively. Extensive
experiments on question-based tasks over large-scale, real-world datasets
demonstrate the effectiveness of TQ-Net, improving the precision of downstream
applications (e.g., similar questions +2.02% and knowledge point prediction
+7.20%). Our code will be available, and we will open-source a subset of our
data to promote the development of related studies.
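Both the MCL pre-training stage and the supervised stage described above rely on instance-level contrastive objectives. A minimal InfoNCE-style sketch of such an objective (function name, shapes, and temperature are illustrative assumptions, not taken from the paper) is:

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.07):
    """InfoNCE loss between two views of a batch of question embeddings.

    z1, z2: (batch, dim) arrays; row i of z2 is the positive for row i of z1,
    and all other rows in the batch serve as in-batch negatives.
    """
    # L2-normalize so dot products become cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positive pairs lie on the diagonal; minimize their negative log-probability
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
anchor = rng.normal(size=(8, 16))
positive = anchor + 0.05 * rng.normal(size=(8, 16))  # slightly perturbed view
shuffled = rng.permutation(anchor)                   # mismatched pairing
aligned_loss = info_nce_loss(anchor, positive)
random_loss = info_nce_loss(anchor, shuffled)
print(aligned_loss < random_loss)  # aligned views should yield the lower loss
```

Pulling the positive toward the anchor while pushing away in-batch negatives is what lets instance-level pre-training learn question representations without labels; the supervised stage replaces augmented views with relevance-labeled pairs.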
Related papers
- Boosting Short Text Classification with Multi-Source Information Exploration and Dual-Level Contrastive Learning [12.377363857246602]
We propose a novel model named MI-DELIGHT for short text classification.
It first performs multi-source information exploration to alleviate sparsity issues.
A graph learning approach is then adopted to learn the representation of short texts.
arXiv Detail & Related papers (2025-01-16T00:26:15Z)
- SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers [43.18330795060871]
SPIQA is a dataset specifically designed to interpret complex figures and tables within the context of scientific research articles.
We employ automatic and manual curation to create the dataset.
SPIQA comprises 270K questions divided into training, validation, and three different evaluation splits.
arXiv Detail & Related papers (2024-07-12T16:37:59Z)
- Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference.
We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z)
- Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows superior performance in detecting non-matched image-text pairs when training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z)
- Modern Question Answering Datasets and Benchmarks: A Survey [5.026863544662493]
Question Answering (QA) is one of the most important natural language processing (NLP) tasks.
It aims to use NLP technologies to generate a corresponding answer to a given question based on massive unstructured corpora.
In this paper, we investigate influential QA datasets that have been released in the era of deep learning.
arXiv Detail & Related papers (2022-06-30T05:53:56Z)
- Learning Downstream Task by Selectively Capturing Complementary Knowledge from Multiple Self-supervisedly Learning Pretexts [20.764378638979704]
We propose a novel solution by leveraging the attention mechanism to adaptively squeeze suitable representations for the tasks.
Our scheme significantly exceeds current popular pretext-matching based methods in gathering knowledge.
arXiv Detail & Related papers (2022-04-11T16:46:50Z)
- MGA-VQA: Multi-Granularity Alignment for Visual Question Answering [75.55108621064726]
Learning to answer visual questions is a challenging task since the multi-modal inputs are within two feature spaces.
We propose a Multi-Granularity Alignment architecture for the Visual Question Answering task (MGA-VQA).
Our model splits alignment into different levels to achieve learning better correlations without needing additional data and annotations.
arXiv Detail & Related papers (2022-01-25T22:30:54Z) - MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media
Knowledge Extraction and Grounding [131.8797942031366]
We present a new QA evaluation benchmark with 1,384 questions over news articles that require cross-media grounding of objects in images onto text.
Specifically, the task involves multi-hop questions that require reasoning over image-caption pairs to identify the grounded visual object being referred to and then predicting a span from the news body text to answer the question.
We introduce a novel multimedia data augmentation framework, based on cross-media knowledge extraction and synthetic question-answer generation, to automatically augment data that can provide weak supervision for this task.
arXiv Detail & Related papers (2021-12-20T18:23:30Z) - Continual Learning for Blind Image Quality Assessment [80.55119990128419]
Blind image quality assessment (BIQA) models fail to continually adapt to subpopulation shift.
Recent work suggests training BIQA methods on the combination of all available human-rated IQA datasets.
We formulate continual learning for BIQA, where a model learns continually from a stream of IQA datasets.
arXiv Detail & Related papers (2021-02-19T03:07:01Z)
- Video Understanding as Machine Translation [53.59298393079866]
We tackle a wide variety of downstream video understanding tasks by means of a single unified framework.
We report performance gains over the state of the art on several downstream tasks, including video classification (EPIC-Kitchens), question answering (TVQA), and captioning (TVC, YouCook2, and MSR-VTT).
arXiv Detail & Related papers (2020-06-12T14:07:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.