A Multi-Size Neural Network with Attention Mechanism for Answer
Selection
- URL: http://arxiv.org/abs/2105.03278v1
- Date: Sat, 24 Apr 2021 02:13:26 GMT
- Title: A Multi-Size Neural Network with Attention Mechanism for Answer
Selection
- Authors: Jie Huang
- Abstract summary: An effective architecture, the multi-size neural network with attention mechanism (AM-MSNN), is introduced for the answer selection task.
It captures more levels of language granularity in parallel than single-layer and multi-layer CNNs, owing to its filters of various sizes.
It extends the sentence representations with an attention mechanism, so they carry more information for different types of questions.
- Score: 3.310455595316906
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Semantic matching is of central significance to the answer selection task,
which aims to select correct answers for a given question from a candidate
answer pool. A useful method is to employ neural networks with attention to
generate sentence representations in a way that information from paired
sentences can mutually influence the computation of the representations. In
this work, an effective architecture, the multi-size neural network with
attention mechanism (AM-MSNN), is introduced for the answer selection task.
This architecture captures more levels of language granularity in parallel
than single-layer and multi-layer CNNs, owing to its filters of various sizes.
Meanwhile, it extends the sentence representations with an attention
mechanism, so that they carry more information for different types of
questions. An empirical study on three benchmark answer selection tasks
demonstrates the efficacy of the proposed model on all the benchmarks and its
superiority over competitors. The experimental results show that (1) the
multi-size neural network (MSNN) captures abstract features at different
levels of granularity more effectively than single/multi-layer CNNs; (2) the
attention mechanism (AM) is a better strategy for deriving more informative
representations; and (3) AM-MSNN is currently a better architecture for the
answer selection task.
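As a rough illustration (not the authors' code), the core idea of the multi-size network with attention can be sketched in NumPy: filter banks of several widths run in parallel over a sentence's embedding matrix, their max-pooled outputs are concatenated into one sentence vector, and a toy attention step re-weights the answer representation against the question. The filter widths, dimensions, and the elementwise attention scoring are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_maxpool(X, W):
    """Valid 1D convolution over the token axis, then max-over-time pooling.
    X: (seq_len, emb_dim); W: (k, emb_dim, n_filters)."""
    k, _, n_filters = W.shape
    seq_len = X.shape[0]
    feats = np.empty((seq_len - k + 1, n_filters))
    for i in range(seq_len - k + 1):
        # each window of k tokens yields n_filters feature activations
        feats[i] = np.tanh(np.tensordot(X[i:i + k], W, axes=([0, 1], [0, 1])))
    return feats.max(axis=0)  # max-over-time pooling

def msnn_encode(X, filter_banks):
    """Multi-size encoding: parallel filter banks of different widths,
    pooled outputs concatenated into a single sentence vector."""
    return np.concatenate([conv1d_maxpool(X, W) for W in filter_banks])

def attend(q_vec, a_vec):
    """Toy attention: softmax over elementwise question-answer interactions,
    used to re-weight the answer features (an illustrative simplification)."""
    scores = q_vec * a_vec
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return a_vec * weights

emb_dim, n_filters = 8, 4
sizes = [2, 3, 5]  # hypothetical filter widths capturing different granularities
banks = [rng.standard_normal((k, emb_dim, n_filters)) * 0.1 for k in sizes]

question = rng.standard_normal((10, emb_dim))  # 10 tokens, pre-embedded
answer = rng.standard_normal((14, emb_dim))    # 14 tokens, pre-embedded
q = msnn_encode(question, banks)
a = attend(q, msnn_encode(answer, banks))
score = q @ a / (np.linalg.norm(q) * np.linalg.norm(a) + 1e-9)
print(q.shape, a.shape)  # (12,) (12,)
```

Each of the three filter banks contributes 4 pooled features, so both sentences end up as 12-dimensional vectors that can be compared with cosine similarity to rank candidate answers.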
Related papers
- Unveiling the Power of Sparse Neural Networks for Feature Selection [60.50319755984697]
Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection.
We show that SNNs trained with dynamic sparse training (DST) algorithms can achieve, on average, more than 50% memory and 55% FLOPs reduction.
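To make the dynamic-sparse-training idea concrete, here is a minimal NumPy sketch (an assumption-laden simplification, not the paper's algorithm) of one prune-and-regrow step in the style of SET/RigL: drop the smallest-magnitude active weights, then regrow the same number of connections at random inactive positions, keeping the sparsity level fixed.

```python
import numpy as np

rng = np.random.default_rng(2)

def prune_and_regrow(W, mask, drop_frac=0.3):
    """One dynamic-sparse-training update: drop the smallest-magnitude
    active weights, then regrow the same number at random inactive positions."""
    active = np.flatnonzero(mask.ravel())
    n_drop = int(drop_frac * active.size)
    # drop: the n_drop smallest |w| among active connections
    drop = active[np.argsort(np.abs(W.ravel()[active]))[:n_drop]]
    new_mask = mask.ravel().copy()
    new_mask[drop] = False
    # regrow: random positions among the currently inactive ones
    inactive = np.flatnonzero(~new_mask)
    grow = rng.choice(inactive, size=n_drop, replace=False)
    new_mask[grow] = True
    return new_mask.reshape(mask.shape)

W = rng.standard_normal((10, 10))
mask = rng.random((10, 10)) < 0.2  # ~80% sparse layer
new_mask = prune_and_regrow(W, mask)
print(new_mask.sum() == mask.sum())  # True: sparsity level is preserved
```

Because the drop and regrow counts match, the overall connection budget stays constant while the topology adapts during training, which is where the memory and FLOPs savings come from.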
arXiv Detail & Related papers (2024-08-08T16:48:33Z) - Effective Subset Selection Through The Lens of Neural Network Pruning [31.43307762723943]
It is important to select the data to be annotated wisely, which is known as the subset selection problem.
We investigate the relationship between subset selection and neural network pruning, which is more widely studied.
We propose utilizing the norm criterion of neural network features to improve subset selection methods.
arXiv Detail & Related papers (2024-06-03T08:12:32Z) - CNN2GNN: How to Bridge CNN with GNN [59.42117676779735]
We propose a novel CNN2GNN framework to unify CNN and GNN together via distillation.
The distilled "boosted" two-layer GNN achieves much higher performance on Mini-ImageNet than CNNs containing dozens of layers, such as ResNet152.
arXiv Detail & Related papers (2024-04-23T08:19:08Z) - SECNN: Squeeze-and-Excitation Convolutional Neural Network for Sentence
Classification [0.0]
A convolutional neural network (CNN) can extract n-gram features through its convolutional filters.
We propose a Squeeze-and-Excitation Convolutional neural Network (SECNN) for sentence classification.
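For context, the squeeze-and-excitation mechanism that SECNN applies to n-gram feature maps can be sketched in a few lines of NumPy (a generic illustration of the SE block, not the paper's implementation; the dimensions and reduction ratio are assumptions): global-average-pool each channel, pass the channel statistics through a small bottleneck with a sigmoid gate, and rescale the channels by the resulting weights.

```python
import numpy as np

rng = np.random.default_rng(1)

def squeeze_excite(feature_maps, W1, W2):
    """Squeeze-and-Excitation gate over convolutional channels.
    feature_maps: (n_channels, length), e.g. n-gram feature maps from a conv layer."""
    z = feature_maps.mean(axis=1)  # squeeze: one summary statistic per channel
    # excitation: FC -> ReLU -> FC -> sigmoid, producing per-channel gates in (0, 1)
    s = 1.0 / (1.0 + np.exp(-(W2 @ np.maximum(W1 @ z, 0.0))))
    return feature_maps * s[:, None]  # recalibrate: scale each channel by its gate

n_channels, length, reduction = 8, 20, 4
W1 = rng.standard_normal((n_channels // reduction, n_channels)) * 0.1
W2 = rng.standard_normal((n_channels, n_channels // reduction)) * 0.1
maps = rng.standard_normal((n_channels, length))
out = squeeze_excite(maps, W1, W2)
print(out.shape)  # (8, 20)
```

The bottleneck (reduction ratio 4 here) keeps the gating cheap, and because each gate lies strictly between 0 and 1, the block can only attenuate channels, emphasizing the most informative filters.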
arXiv Detail & Related papers (2023-12-11T03:26:36Z) - Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks [19.639533220155965]
This paper explores the application of convolutional neural networks for the purpose of multimodal deception detection.
We use a dataset built by interviewing 104 subjects about two topics, with one truthful and one falsified response from each subject about each topic.
arXiv Detail & Related papers (2023-11-18T02:44:33Z) - Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification.
The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z) - Classification of diffraction patterns using a convolutional neural
network in single particle imaging experiments performed at X-ray
free-electron lasers [53.65540150901678]
Single particle imaging (SPI) at X-ray free electron lasers (XFELs) is particularly well suited to determine the 3D structure of particles in their native environment.
For a successful reconstruction, diffraction patterns originating from a single hit must be isolated from a large number of acquired patterns.
We propose to formulate this task as an image classification problem and solve it using convolutional neural network (CNN) architectures.
arXiv Detail & Related papers (2021-12-16T17:03:14Z) - The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network nor modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z) - Multichannel CNN with Attention for Text Classification [5.1545224296246275]
This paper proposes Attention-based Multichannel Convolutional Neural Network (AMCNN) for text classification.
AMCNN uses a bi-directional long short-term memory network to encode each word's past and future context into high-dimensional representations.
The experimental results on the benchmark datasets demonstrate that AMCNN achieves better performance than state-of-the-art methods.
arXiv Detail & Related papers (2020-06-29T16:37:51Z) - Analyzing Neural Networks Based on Random Graphs [77.34726150561087]
We perform a massive evaluation of neural networks with architectures corresponding to random graphs of various types.
We find that no classical numerical graph invariant by itself allows us to single out the best networks.
We also find that networks with primarily short-range connections perform better than networks which allow for many long-range connections.
arXiv Detail & Related papers (2020-02-19T11:04:49Z) - Inferring Convolutional Neural Networks' accuracies from their
architectural characterizations [0.0]
We study the relationships between a CNN's architecture and its performance.
We show that the attributes can be predictive of the networks' performance in two specific computer vision-based physics problems.
We use machine learning models to predict whether a network can perform better than a certain threshold accuracy before training.
arXiv Detail & Related papers (2020-01-07T16:41:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.