MaP: A Matrix-based Prediction Approach to Improve Span Extraction in
Machine Reading Comprehension
- URL: http://arxiv.org/abs/2009.14348v1
- Date: Tue, 29 Sep 2020 23:53:50 GMT
- Title: MaP: A Matrix-based Prediction Approach to Improve Span Extraction in
Machine Reading Comprehension
- Authors: Huaishao Luo, Yu Shi, Ming Gong, Linjun Shou, Tianrui Li
- Abstract summary: We propose a novel approach that extends the probability vector to a probability matrix.
For each possible start index, the method generates a corresponding end probability vector.
We evaluate our method on SQuAD 1.1 and three other question answering benchmarks.
- Score: 40.22845723686718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Span extraction is an essential problem in machine reading comprehension.
Most of the existing algorithms predict the start and end positions of an
answer span in the given corresponding context by generating two probability
vectors. In this paper, we propose a novel approach that extends the
probability vector to a probability matrix. Such a matrix can cover more
start-end position pairs. Specifically, for each possible start index, the method
generates a corresponding end probability vector. In addition, we propose a
sampling-based training strategy to address the computational cost and memory
issues in the matrix training phase. We evaluate our method on SQuAD 1.1 and
three other question answering benchmarks. Using the competitive models BERT and
BiDAF as backbones, our proposed approach achieves consistent improvements on
all datasets, demonstrating the effectiveness of the proposed method.
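The core idea above — replacing the two independent start/end probability vectors with an L x L matrix in which row i is an end distribution conditioned on start index i — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bilinear scoring matrix `W` and the toy dimensions are assumptions, and the paper's exact parameterization of the per-start end distributions may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

L, d = 8, 16                        # toy sequence length and hidden size
H = rng.normal(size=(L, d))         # contextual token representations

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Standard approach: one start probability vector over all positions.
w_start = rng.normal(size=d)
p_start = softmax(H @ w_start)      # shape (L,)

# Matrix-based approach (sketch): condition the end distribution on each
# candidate start via a bilinear score. W is a hypothetical parameter.
W = rng.normal(size=(d, d))
end_logits = H @ W @ H.T            # (L, L): row i scores end positions for start i
P = softmax(end_logits, axis=1)     # each row is an end probability vector

# Decode the best span: maximize p_start[i] * P[i, j] subject to j >= i.
scores = np.triu(p_start[:, None] * P)   # zero out spans with end < start
i, j = np.unravel_index(scores.argmax(), scores.shape)
print("predicted span:", i, j)
```

At training time, materializing all L rows of `P` is what makes the matrix formulation expensive, which is what the paper's sampling-based strategy addresses: only a subset of start rows would contribute to the loss in each step.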
Related papers
- The Stochastic Conjugate Subgradient Algorithm For Kernel Support Vector Machines [1.738375118265695]
This paper proposes an innovative method specifically designed for kernel support vector machines (SVMs).
It not only converges faster per iteration but also exhibits enhanced convergence compared to conventional SFO techniques.
Our experimental results demonstrate that the proposed algorithm not only maintains but potentially exceeds the scalability of SFO methods.
arXiv Detail & Related papers (2024-07-30T17:03:19Z) - Horseshoe-type Priors for Independent Component Estimation [0.4987670632802289]
Independent Component Estimation (ICE) has many applications in modern day machine learning.
Horseshoe-type priors are used to provide scalable algorithms.
We show how to implement conditional posteriors and envelope-based methods for optimization.
arXiv Detail & Related papers (2024-06-24T18:18:58Z) - Deep Unrolling for Nonconvex Robust Principal Component Analysis [75.32013242448151]
We design algorithms for Robust Principal Component Analysis (RPCA),
which consists in decomposing a matrix into the sum of a low-rank matrix and a sparse matrix.
arXiv Detail & Related papers (2023-07-12T03:48:26Z) - Partial Matrix Completion [29.68420094716923]
This work establishes a new framework of partial matrix completion.
The goal is to identify a large subset of the entries that can be completed with high confidence.
We propose an efficient algorithm with the following provable guarantees.
arXiv Detail & Related papers (2022-08-25T12:47:20Z) - Fast Differentiable Matrix Square Root and Inverse Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method uses a Matrix Taylor Polynomial (MTP), and the other uses Matrix Padé Approximants (MPA).
A series of numerical tests show that both methods yield considerable speed-up compared with the SVD or the NS iteration.
arXiv Detail & Related papers (2022-01-29T10:00:35Z) - Probabilistic Regression with Huber Distributions [6.681943980068051]
We describe a probabilistic method for estimating the position of an object along with its covariance matrix using neural networks.
Our method is designed to be robust to outliers, have bounded gradients with respect to the network outputs, among other desirable properties.
We evaluate our method on popular body pose and facial landmark datasets and achieve performance on par with or exceeding that of non-heatmap methods.
arXiv Detail & Related papers (2021-11-19T16:12:15Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Probabilistic Classification Vector Machine for Multi-Class
Classification [29.411892651468797]
The probabilistic classification vector machine (PCVM) synthesizes the advantages of both the support vector machine and the relevant vector machine.
We extend the PCVM to multi-class cases via voting strategies such as one-vs-rest or one-vs-one.
Two learning algorithms, a top-down one and a bottom-up one, are implemented in the resulting multi-class model (mPCVM).
The superior performance of the mPCVMs is extensively evaluated on synthetic and benchmark data sets.
arXiv Detail & Related papers (2020-06-29T03:21:38Z) - Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the population barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data.
We exploit the structure of the problem to obtain a convex-concave saddle-point reformulation.
In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
arXiv Detail & Related papers (2020-06-11T19:40:38Z) - Optimal Iterative Sketching with the Subsampled Randomized Hadamard
Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rates for Haar and randomized Hadamard matrices are identical, and asymptotically improve upon random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction.
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.