Modeling Label Correlations for Second-Order Semantic Dependency Parsing
with Mean-Field Inference
- URL: http://arxiv.org/abs/2204.03619v1
- Date: Thu, 7 Apr 2022 17:40:08 GMT
- Title: Modeling Label Correlations for Second-Order Semantic Dependency Parsing
with Mean-Field Inference
- Authors: Songlin Yang, Kewei Tu
- Abstract summary: Second-order semantic parsing with end-to-end mean-field inference has shown good performance.
In this work we aim to improve this method by modeling label correlations between adjacent arcs.
To tackle this computational challenge, we leverage tensor decomposition techniques.
- Score: 34.75002236767817
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Second-order semantic parsing with end-to-end mean-field inference has
shown good performance. In this work we aim to improve this method by modeling
label correlations between adjacent arcs. However, direct modeling leads to
memory explosion because second-order score tensors have sizes of $O(n^3L^2)$
($n$ is the sentence length and $L$ is the number of labels), which is not
affordable. To tackle this computational challenge, we leverage tensor
decomposition techniques, and interestingly, we show that the large
second-order score tensors have no need to be materialized during mean-field
inference, thereby reducing the computational complexity from cubic to
quadratic. We conduct experiments on SemEval 2015 Task 18 English datasets,
showing the effectiveness of modeling label correlations. Our code is publicly
available at https://github.com/sustcsonglin/mean-field-dep-parsing.
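The idea that the large second-order score tensor need not be materialized can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a generic rank-R CP-style factorization chosen purely for illustration, in which the score of a head i taking dependent j with label a together with dependent k with label b is `sum_r H[i,r] * A[j,a,r] * B[k,b,r]`. The names `H`, `A`, `B`, `Q`, `message_naive`, and `message_factored` are hypothetical, and the masks a real parser would need (for example excluding k == j) are omitted.

```python
import numpy as np


def message_naive(H, A, B, Q):
    """Reference version: builds the full (n, n, L, n, L) score tensor.

    Memory grows as O(n^3 L^2), which is the blow-up the abstract describes.
    """
    S = np.einsum("ir,jar,kbr->ijakb", H, A, B)
    # Mean-field message to arc (i, j, a): expected second-order score
    # under the current posteriors Q[i, k, b] of the neighbouring arcs.
    return np.einsum("ijakb,ikb->ija", S, Q)


def message_factored(H, A, B, Q):
    """Same message computed from the factors only, never forming S.

    Both contractions cost O(n^2 L R) time and O(n^2 L) memory.
    """
    # Contract the posteriors with the factor of the neighbouring arc first.
    T = np.einsum("ikb,kbr->ir", Q, B)          # shape (n, R)
    # Then combine with the head factor and the factor of the target arc.
    return np.einsum("ir,jar->ija", H * T, A)   # shape (n, n, L)


# Toy sizes (assumed): sentence length n, label count L, rank R.
rng = np.random.default_rng(0)
n, L, R = 6, 4, 3
H = rng.normal(size=(n, R))
A = rng.normal(size=(n, L, R))
B = rng.normal(size=(n, L, R))
Q = rng.random(size=(n, n, L))  # stand-in for arc-label posteriors

agree = np.allclose(message_naive(H, A, B, Q), message_factored(H, A, B, Q))
print("factored message matches naive message:", agree)
```

Reordering the contractions so that the sum over (k, b) is taken before the factors are multiplied together is what removes the cubic dependence on sentence length in this sketch.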
Related papers
- A Statistical Analysis of Deep Federated Learning for Intrinsically Low-dimensional Data [32.98264375121064]
Federated Learning (FL) has emerged as a groundbreaking paradigm in collaborative machine learning.
This paper investigates the generalization properties of deep federated regression within a two-stage sampling model.
arXiv Detail & Related papers (2024-10-28T01:36:25Z)
- Joint Entity and Relation Extraction with Span Pruning and Hypergraph Neural Networks [58.43972540643903]
We propose a hypergraph neural network for ERE ($hgnn$), which is built upon PL-marker (a state-of-the-art marker-based pipeline model).
To alleviate error propagation, we use a high-recall pruner mechanism to transfer the burden of entity identification and labeling from the NER module to the joint module of our model.
Experiments on three widely used benchmarks for the ERE task show significant improvements over the previous state-of-the-art PL-marker.
arXiv Detail & Related papers (2023-10-26T08:36:39Z)
- Multi-Dictionary Tensor Decomposition [5.733331864416094]
We propose a framework for Multi-Dictionary Tensor Decomposition (MDTD).
We derive a general optimization algorithm for MDTD that handles both complete input and input with missing values.
It can impute missing values in billion-entry tensors more accurately and scalably than state-of-the-art competitors.
arXiv Detail & Related papers (2023-09-18T12:31:56Z)
- Nearly $d$-Linear Convergence Bounds for Diffusion Models via Stochastic Localization [40.808942894229325]
We provide the first convergence bounds which are linear in the data dimension.
We show that diffusion models require at most $\tilde O\left(\frac{d \log^2(1/\delta)}{\varepsilon^2}\right)$ steps to approximate an arbitrary distribution.
arXiv Detail & Related papers (2023-08-07T16:01:14Z)
- Uncertainty Quantification of MLE for Entity Ranking with Covariates [3.2839905453386162]
This paper is concerned with statistical estimation and inference for ranking problems based on pairwise comparisons.
We propose a novel model, the Covariate-Assisted Ranking Estimation (CARE) model, that extends the well-known Bradley-Terry-Luce (BTL) model.
We derive the maximum likelihood estimator of $\{\alpha_i^*\}_{i=1}^n$ and $\beta^*$ under a sparse comparison graph.
We validate our theoretical results through large-scale numerical studies and an application to the mutual fund stock holding dataset.
arXiv Detail & Related papers (2022-12-20T02:28:27Z)
- Average-Case Complexity of Tensor Decomposition for Low-Degree Polynomials [93.59919600451487]
"Statistical-computational gaps" occur in many statistical inference tasks.
We consider a model for random order-3 decomposition where one component is slightly larger in norm than the rest.
We show that tensor entries can accurately estimate the largest component when $r \ll n^{3/2}$ but fail to do so when $r \gg n^{3/2}$.
arXiv Detail & Related papers (2022-11-10T00:40:37Z)
- FRAPPE: $\underline{\text{F}}$ast $\underline{\text{Ra}}$nk $\underline{\text{App}}$roximation with $\underline{\text{E}}$xplainable Features for Tensors [5.39764619690516]
FRAPPE is the first method to estimate the canonical rank of a tensor without having to compute the CPD.
It is over 24 times faster than the best-performing baseline and exhibits a 10% improvement in MAPE on a synthetic dataset.
arXiv Detail & Related papers (2022-06-19T03:19:59Z)
- Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification.
The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- Cherry-Picking Gradients: Learning Low-Rank Embeddings of Visual Data via Differentiable Cross-Approximation [53.95297550117153]
We propose an end-to-end trainable framework that processes large-scale visual data tensors by looking at a fraction of their entries only.
The proposed approach is particularly useful for large-scale multidimensional grid data, and for tasks that require context over a large receptive field.
arXiv Detail & Related papers (2021-05-29T08:39:57Z)