Related papers: Contextualized Embeddings based Convolutional Neural Networks for Duplicate Question Identification

URL: http://arxiv.org/abs/2109.01560v2
Date: Mon, 6 Sep 2021 14:38:41 GMT
Title: Contextualized Embeddings based Convolutional Neural Networks for Duplicate Question Identification
Authors: Harsh Sakhrani, Saloni Parekh and Pratik Ratadiya
Abstract summary: Question Paraphrase Identification (QPI) is a critical task for large-scale Question-Answering forums. We propose a novel architecture combining a Bidirectional Transformer with Convolutional Neural Networks for the QPI task. Experimental results demonstrate that our model achieves state-of-the-art performance on the Quora Question Pairs dataset.
Score: 0.25782420501870296
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Question Paraphrase Identification (QPI) is a critical task for large-scale Question-Answering forums. The purpose of QPI is to determine whether a given pair of questions are semantically identical or not. Previous approaches for this task have yielded promising results, but have often relied on complex recurrence mechanisms that are expensive and time-consuming in nature. In this paper, we propose a novel architecture combining a Bidirectional Transformer Encoder with Convolutional Neural Networks for the QPI task. We produce the predictions from the proposed architecture using two different inference setups: Siamese and Matched Aggregation. Experimental results demonstrate that our model achieves state-of-the-art performance on the Quora Question Pairs dataset. We empirically prove that the addition of convolution layers to the model architecture improves the results in both inference setups. We also investigate the impact of partial and complete fine-tuning and analyze the trade-off between computational power and accuracy in the process. Based on the obtained results, we conclude that the Matched-Aggregation setup consistently outperforms the Siamese setup. Our work provides insights into what architecture combinations and setups are likely to produce better results for the QPI task.
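The Siamese setup named in the abstract can be illustrated with a minimal sketch: both questions pass through the same encoder and the same convolution filters, and the two pooled feature vectors are then compared. Everything below is an assumption made for illustration and is not the authors' implementation. `toy_token_embeddings` is a hypothetical stand-in for the Bidirectional Transformer Encoder (its vectors are not contextual), the filters are random rather than trained, and cosine similarity is one possible comparison function. The Matched-Aggregation setup is not sketched, since the abstract does not describe its details.

```python
import math
import random


def toy_token_embeddings(tokens, dim=8):
    """Hypothetical stand-in for a Transformer encoder: one vector per token.

    Vectors are seeded by the token string, so they are deterministic but
    NOT contextual; a real model would use a pretrained encoder here.
    """
    vectors = []
    for token in tokens:
        rng = random.Random(token)
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors


def make_filters(num_filters=4, width=2, dim=8, seed=0):
    """Random (untrained) 1D convolution kernels of shape [width][dim]."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(width)]
        for _ in range(num_filters)
    ]


def conv_max_pool(embeddings, filters):
    """Slide each kernel over the token sequence, apply ReLU, max-pool over time."""
    width = len(filters[0])
    dim = len(filters[0][0])
    # Zero-pad sequences shorter than the kernel width.
    missing = max(0, width - len(embeddings))
    padded = list(embeddings) + [[0.0] * dim for _ in range(missing)]
    features = []
    for kernel in filters:
        best = 0.0  # starting at 0 is equivalent to ReLU followed by max pooling
        for start in range(len(padded) - width + 1):
            window = padded[start:start + width]
            activation = sum(
                w * x
                for kernel_row, token_vec in zip(kernel, window)
                for w, x in zip(kernel_row, token_vec)
            )
            best = max(best, activation)
        features.append(best)
    return features


def encode(question, filters):
    """Shared encoder used for BOTH questions (the Siamese property)."""
    dim = len(filters[0][0])
    tokens = question.lower().split()
    return conv_max_pool(toy_token_embeddings(tokens, dim), filters)


def cosine(u, v):
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def siamese_score(question_a, question_b, filters):
    """Encode each question independently, then compare the two vectors."""
    return cosine(encode(question_a, filters), encode(question_b, filters))


filters = make_filters()
print(siamese_score("how do i learn python",
                    "what is the best way to learn python", filters))
```

Because the filters are untrained, the printed score carries no meaning as a duplicate prediction. The sketch only shows the data flow: a shared encoder, convolution with max-over-time pooling, and a comparison of the two resulting vectors.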

Related papers

seqKAN: Sequence processing with Kolmogorov-Arnold Networks [0.0]
Kolmogorov-Arnold Networks (KANs) have been recently proposed as a machine learning framework that is more interpretable and controllable than the multi-layer perceptron. This paper proposes seqKAN, a new KAN architecture for sequence processing.
arXiv Detail & Related papers (2025-02-20T16:10:18Z)
Enhancing Question Answering Precision with Optimized Vector Retrieval and Instructions [1.2425910171551517]
Question-answering (QA) is an important application of Information Retrieval (IR) and language models. We propose an innovative approach to improve QA task performances by integrating optimized vector retrievals and instruction methodologies.
arXiv Detail & Related papers (2024-11-01T21:14:04Z)
On Characterizing the Evolution of Embedding Space of Neural Networks using Algebraic Topology [9.537910170141467]
We study how the topology of feature embedding space changes as it passes through the layers of a well-trained deep neural network (DNN) through Betti numbers. We demonstrate that as depth increases, a topologically complicated dataset is transformed into a simple one, resulting in Betti numbers attaining their lowest possible value.
arXiv Detail & Related papers (2023-11-08T10:45:12Z)
Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models. In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
Impact of PolSAR pre-processing and balancing methods on complex-valued neural networks segmentation tasks [9.6556424340252]
We investigate the semantic segmentation of Polarimetric Synthetic Aperture Radar (PolSAR) using Complex-Valued Neural Networks (CVNNs). We exhaustively compare both methods for six model architectures, three complex-valued, and their respective real-equivalent models. We propose two methods for reducing this gap and performing the results for all input representations, models, and dataset pre-processing.
arXiv Detail & Related papers (2022-10-28T12:49:43Z)
The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network Architectures [179.66117325866585]
We investigate a design space that is usually overlooked, i.e. adjusting the channel configurations of predefined networks. We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance. Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration.
arXiv Detail & Related papers (2020-06-29T17:59:26Z)
Neural Ensemble Search for Uncertainty Estimation and Dataset Shift [67.57720300323928]
Ensembles of neural networks achieve superior performance compared to stand-alone networks in terms of accuracy, uncertainty calibration and robustness to dataset shift. We propose two methods for automatically constructing ensembles with varying architectures. We show that the resulting ensembles outperform deep ensembles not only in terms of accuracy but also uncertainty calibration and robustness to dataset shift.
arXiv Detail & Related papers (2020-06-15T17:38:15Z)
Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks. With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
Linguistically Driven Graph Capsule Network for Visual Question Reasoning [153.76012414126643]
We propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network". The compositional process is guided by the linguistic parse tree. Specifically, we bind each capsule in the lowest layer to bridge the linguistic embedding of a single word in the original question with visual evidence. Experiments on the CLEVR dataset, CLEVR compositional generation test, and FigureQA dataset demonstrate the effectiveness and composition generalization ability of our end-to-end model.
arXiv Detail & Related papers (2020-03-23T03:34:25Z)
Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation [90.28365183660438]
This paper proposes an augmented parallel-pyramid net with attention partial module and differentiable auto-data augmentation. We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component. Notably, our method achieves the top-1 accuracy on the challenging COCO keypoint benchmark and the state-of-the-art results on the MPII datasets.
arXiv Detail & Related papers (2020-03-17T03:52:17Z)
Anytime Inference with Distilled Hierarchical Neural Ensembles [32.003196185519]
Inference in deep neural networks can be computationally expensive, and networks capable of anytime inference are important in scenarios where the amount of compute or quantity of input data varies over time. We propose Hierarchical Neural Ensembles (HNE), a novel framework to embed an ensemble of multiple networks in a hierarchical tree structure, sharing intermediate layers. Our experiments show that, compared to previous anytime inference models, HNE provides state-of-the-art accuracy-compute trade-offs on the CIFAR-10/100 and ImageNet datasets.
arXiv Detail & Related papers (2020-03-03T12:13:38Z)
