TQ-Net: Mixed Contrastive Representation Learning For Heterogeneous Test
Questions
- URL: http://arxiv.org/abs/2303.08039v1
- Date: Thu, 9 Mar 2023 10:55:48 GMT
- Title: TQ-Net: Mixed Contrastive Representation Learning For Heterogeneous Test
Questions
- Authors: He Zhu, Xihua Li, Xuemin Zhao, Yunbo Cao, Shan Yu
- Abstract summary: Test questions (TQ) are usually heterogeneous and multi-modal, e.g., some of them may contain only text, while others also contain images with information beyond their literal description.
In this paper, we first improve previous text-only representations with a two-stage unsupervised, instance-level, contrastive pre-training method.
TQ-Net is then proposed to fuse image content into the representation of heterogeneous data.
- Score: 18.186909839033017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, more and more people study online for convenient access to
massive learning materials (e.g., test questions/notes), so accurately
understanding learning materials has become a crucial issue, essential for
many educational applications. Previous studies focus on using language models
to represent the question data. However, test questions (TQ) are usually
heterogeneous and multi-modal: some contain only text, while others also
contain images carrying information beyond their literal description.
In this context, both supervised and unsupervised methods find it difficult to
learn a fused representation of questions. Meanwhile, the problem cannot be
solved by conventional methods such as image captioning, as the images may
contain information that is complementary rather than duplicate to the text.
In this paper, we first improve previous text-only representations with a
two-stage unsupervised, instance-level, contrastive pre-training method (MCL:
Mixture Unsupervised Contrastive Learning). We then propose TQ-Net to fuse
image content into the representation of heterogeneous data. Finally,
supervised contrastive learning is conducted on relevance-prediction-related
downstream tasks, which helps the model learn effective question
representations. We conducted extensive experiments on question-based tasks on
large-scale, real-world datasets, which demonstrate the effectiveness of TQ-Net
and its improvements to the precision of downstream applications (e.g., similar
questions +2.02% and knowledge point prediction +7.20%). Our code will be
available, and we will open-source a subset of our data to promote the
development of related studies.
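The pipeline above combines instance-level contrastive pre-training with text-image fusion. Below is a minimal, hypothetical PyTorch sketch of that idea; it is not the authors' released code, and all names, dimensions, and the gated-fusion design are illustrative assumptions.

```python
# Hypothetical sketch, not the authors' released code: projected text and
# image features are fused with a learned gate, and an instance-level
# InfoNCE loss treats two views of the same question as a positive pair.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TQNetSketch(nn.Module):
    """Toy stand-in for a TQ-Net-style fusion of text and image features."""
    def __init__(self, text_dim=768, img_dim=512, hid=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hid)
        self.img_proj = nn.Linear(img_dim, hid)
        # Gate decides, per dimension, how much image content to mix in;
        # text-only questions can pass a zero image feature.
        self.gate = nn.Sequential(nn.Linear(2 * hid, hid), nn.Sigmoid())

    def forward(self, text_feat, img_feat):
        t = self.text_proj(text_feat)
        v = self.img_proj(img_feat)
        g = self.gate(torch.cat([t, v], dim=-1))
        return F.normalize(t + g * v, dim=-1)  # fused question embedding

def info_nce(z1, z2, tau=0.07):
    """Instance-level contrastive loss: matching rows of z1/z2 are two
    views of the same question; other rows in the batch are negatives."""
    logits = z1 @ z2.t() / tau              # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))       # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Usage with random stand-in features (a real pipeline would take them from
# a language model and an image encoder, with augmented views per question):
model = TQNetSketch()
text, img = torch.randn(8, 768), torch.randn(8, 512)
loss = info_nce(model(text, img), model(text, img))
```

For the paper's final supervised contrastive stage, the same loss shape applies, but positives would be defined by relevance labels rather than by augmented views of a single question.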
Related papers
- Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference.
We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z) - Mixture of Self-Supervised Learning [2.191505742658975]
Self-supervised learning works by using a pretext task which will be trained on the model before being applied to a specific task.
Previous studies have only used one type of transformation as a pretext task.
This raises the question of how performance is affected when more than one pretext task is used and a gating network is employed to combine all pretext tasks (a minimal sketch of this gating idea appears after this list).
arXiv Detail & Related papers (2023-07-27T14:38:32Z) - Harnessing the Power of Text-image Contrastive Models for Automatic
Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows superior performance in detecting non-matched image-text pairs when training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z) - Modern Question Answering Datasets and Benchmarks: A Survey [5.026863544662493]
Question Answering (QA) is one of the most important natural language processing (NLP) tasks.
It aims to use NLP technologies to generate a corresponding answer to a given question based on massive unstructured corpora.
In this paper, we investigate influential QA datasets that have been released in the era of deep learning.
arXiv Detail & Related papers (2022-06-30T05:53:56Z) - QASem Parsing: Text-to-text Modeling of QA-based Semantics [19.42681342441062]
We consider three QA-based semantic tasks, namely, QA-SRL, QANom and QADiscourse.
We release the first unified QASem parsing tool, practical for downstream applications.
arXiv Detail & Related papers (2022-05-23T15:56:07Z) - Learning Downstream Task by Selectively Capturing Complementary
Knowledge from Multiple Self-supervisedly Learning Pretexts [20.764378638979704]
We propose a novel solution that leverages an attention mechanism to adaptively squeeze suitable representations for the downstream tasks (see the gating sketch after this list).
Our scheme significantly outperforms current popular pretext-matching based methods at gathering knowledge.
arXiv Detail & Related papers (2022-04-11T16:46:50Z) - MGA-VQA: Multi-Granularity Alignment for Visual Question Answering [75.55108621064726]
Learning to answer visual questions is a challenging task since the multi-modal inputs lie in two different feature spaces.
We propose a Multi-Granularity Alignment architecture for the Visual Question Answering task (MGA-VQA).
Our model splits alignment into different levels to achieve learning better correlations without needing additional data and annotations.
arXiv Detail & Related papers (2022-01-25T22:30:54Z) - MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media
Knowledge Extraction and Grounding [131.8797942031366]
We present a new QA evaluation benchmark with 1,384 questions over news articles that require cross-media grounding of objects in images onto text.
Specifically, the task involves multi-hop questions that require reasoning over image-caption pairs to identify the grounded visual object being referred to and then predicting a span from the news body text to answer the question.
We introduce a novel multimedia data augmentation framework, based on cross-media knowledge extraction and synthetic question-answer generation, to automatically augment data that can provide weak supervision for this task.
arXiv Detail & Related papers (2021-12-20T18:23:30Z) - Continual Learning for Blind Image Quality Assessment [80.55119990128419]
Blind image quality assessment (BIQA) models fail to continually adapt to subpopulation shift.
Recent work suggests training BIQA methods on the combination of all available human-rated IQA datasets.
We formulate continual learning for BIQA, where a model learns continually from a stream of IQA datasets.
arXiv Detail & Related papers (2021-02-19T03:07:01Z) - Video Understanding as Machine Translation [53.59298393079866]
We tackle a wide variety of downstream video understanding tasks by means of a single unified framework.
We report performance gains over the state of the art on several downstream tasks, including video classification (EPIC-Kitchens), question answering (TVQA), and captioning (TVC, YouCook2, and MSR-VTT).
arXiv Detail & Related papers (2020-06-12T14:07:04Z)
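Two entries above, "Mixture of Self-Supervised Learning" and "Learning Downstream Task by Selectively Capturing Complementary Knowledge from Multiple Self-supervisedly Learning Pretexts", both combine multiple pretext tasks through a gate or attention weighting. The sketch referenced in those entries follows; it is a minimal, hypothetical illustration of the shared mechanism, not code from either paper, and all names and dimensions are assumptions.

```python
# Hypothetical sketch: several pretext-task encoders produce candidate
# representations, and a learned gate weights them per input.
import torch
import torch.nn as nn

class GatedPretextMixture(nn.Module):
    def __init__(self, in_dim=128, hid=64, n_pretexts=3):
        super().__init__()
        # One small encoder per pretext task (stand-ins for heads trained
        # on e.g. rotation, jigsaw, or colorization objectives).
        self.encoders = nn.ModuleList(
            [nn.Linear(in_dim, hid) for _ in range(n_pretexts)]
        )
        self.gate = nn.Linear(in_dim, n_pretexts)  # one score per pretext

    def forward(self, x):
        reps = torch.stack([enc(x) for enc in self.encoders], dim=1)  # (B, P, H)
        w = self.gate(x).softmax(dim=-1).unsqueeze(-1)                # (B, P, 1)
        return (w * reps).sum(dim=1)                                  # (B, H)

# Usage: blend three pretext representations for a batch of four inputs.
fused = GatedPretextMixture()(torch.randn(4, 128))  # -> shape (4, 64)
```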
This list is automatically generated from the titles and abstracts of the papers on this site.