Mimic and Conquer: Heterogeneous Tree Structure Distillation for
Syntactic NLP
- URL: http://arxiv.org/abs/2009.07411v1
- Date: Wed, 16 Sep 2020 01:30:21 GMT
- Title: Mimic and Conquer: Heterogeneous Tree Structure Distillation for
Syntactic NLP
- Authors: Hao Fei and Yafeng Ren and Donghong Ji
- Abstract summary: In this paper, we investigate a simple and effective method, Knowledge Distillation, to integrate heterogeneous structure knowledge into a unified sequential LSTM encoder.
Experimental results on four typical syntax-dependent tasks show that our method outperforms tree encoders by effectively integrating rich heterogeneous structural syntax while reducing error propagation, and also outperforms ensemble methods in terms of both efficiency and accuracy.
- Score: 34.74181162627023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Syntax has been shown to be useful for various NLP tasks, while
existing work mostly encodes a single syntactic tree using one hierarchical
neural network. In this paper, we investigate a simple and effective method,
Knowledge Distillation, to integrate heterogeneous structure knowledge into a
unified sequential LSTM encoder. Experimental results on four typical
syntax-dependent tasks show that our method outperforms tree encoders by
effectively integrating rich heterogeneous structural syntax while reducing
error propagation, and also outperforms ensemble methods in terms of both
efficiency and accuracy.
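As a concrete illustration, here is a minimal sketch of such a distillation objective, assuming PyTorch and precomputed logits from each heterogeneous tree-encoder teacher; the hyperparameter names (`temperature`, `alpha`) and the averaging over teachers are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch: a sequential LSTM student mimics the soft predictions
# of several tree-encoder teachers (e.g., constituency and dependency).
# All names and hyperparameters here are illustrative assumptions.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, gold_labels,
                      temperature=2.0, alpha=0.5):
    # Supervised term on the gold task labels.
    ce = F.cross_entropy(student_logits, gold_labels)
    # Soft-target term: KL divergence to each teacher, averaged.
    kd = 0.0
    for teacher_logits in teacher_logits_list:
        kd = kd + F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
    kd = kd / len(teacher_logits_list)
    return alpha * ce + (1 - alpha) * kd
```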
Related papers
- Differentiable Tree Operations Promote Compositional Generalization [106.59434079287661]
The Differentiable Tree Machine (DTM) architecture integrates an interpreter with external memory and an agent that learns to sequentially select tree operations.
DTM achieves 100% accuracy, while existing baselines such as Transformer, Tree Transformer, LSTM, and Tree2Tree LSTM achieve less than 30%.
arXiv Detail & Related papers (2023-06-01T14:46:34Z)
- Parallel Tree Kernel Computation [0.0]
We devise a parallel implementation of the sequential algorithm for computing certain tree kernels between two finite sets of trees.
Results show that the proposed parallel algorithm outperforms the sequential one in terms of latency.
arXiv Detail & Related papers (2023-05-12T18:16:45Z)
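For context, here is a minimal sketch of the kind of sequential tree-kernel computation being parallelized, in the style of the classic subset-tree kernel (Collins & Duffy); the specific kernels and parallel scheme of that paper are not reproduced here.

```python
# Sketch of a sequential subset-tree kernel over (label, children) tuples;
# the quadratic node-pair loop is what a parallel version would distribute.
def subtree_kernel(t1, t2, decay=0.5):
    def nodes(t):
        yield t
        for c in t[1]:
            yield from nodes(c)

    def delta(a, b):
        (la, ca), (lb, cb) = a, b
        # Productions must match exactly (same label, same child labels).
        if la != lb or [c[0] for c in ca] != [c[0] for c in cb]:
            return 0.0
        score = decay
        for x, y in zip(ca, cb):
            score *= 1.0 + delta(x, y)
        return score

    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))

t = ("S", [("NP", []), ("VP", [])])
print(subtree_kernel(t, t))  # 0.5*1.5*1.5 + 0.5 + 0.5 = 2.125
```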
- Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures.
We also explore utilising higher-order neighbourhood information to encode rich structures in constituent trees.
Experiments on the English and Chinese portions of OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
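A minimal sketch of one way to encode such higher-order neighbourhood information, assuming PyTorch: treat the constituent tree as a graph and propagate node features over k hops with a normalized adjacency matrix. This is an illustrative assumption, not that paper's model.

```python
import torch

def khop_tree_features(node_feats, edges, k=2):
    """node_feats: (N, d) constituent-node embeddings;
    edges: (parent, child) index pairs from the tree."""
    n = node_feats.size(0)
    adj = torch.zeros(n, n)
    for p, c in edges:
        adj[p, c] = adj[c, p] = 1.0
    adj = adj + torch.eye(n)               # self-loops
    adj = adj / adj.sum(-1, keepdim=True)  # row-normalize
    hops, h = [], node_feats
    for _ in range(k):                     # k rounds of propagation
        h = adj @ h
        hops.append(h)
    return torch.cat(hops, dim=-1)         # concatenate 1..k-hop views
```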
- Syntactic representation learning for neural network based TTS with syntactic parse tree traversal [49.05471750563229]
We propose a syntactic representation learning method based on syntactic parse trees to automatically utilize syntactic structure information.
Experimental results demonstrate the effectiveness of our proposed approach.
For sentences with multiple syntactic parse trees, prosodic differences can be clearly perceived in the synthesized speech.
arXiv Detail & Related papers (2020-12-13T05:52:07Z)
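A minimal sketch of one plausible parse-tree traversal, emitting bracketed node labels depth-first so a sequence model can consume the syntactic structure; that paper's actual traversal and features are not reproduced here.

```python
# Depth-first traversal of a (label, children) constituency tree,
# linearized with open/close markers for a downstream sequence encoder.
def traverse(tree):
    label, children = tree
    if not children:
        return [label]                 # leaf: word or POS tag
    tokens = ["(" + label]
    for child in children:
        tokens.extend(traverse(child))
    tokens.append(label + ")")
    return tokens

tree = ("S", [("NP", [("it", [])]), ("VP", [("rains", [])])])
print(traverse(tree))
# ['(S', '(NP', 'it', 'NP)', '(VP', 'rains', 'VP)', 'S)']
```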
- EfficientFCN: Holistically-guided Decoding for Semantic Segmentation [49.27021844132522]
State-of-the-art semantic segmentation algorithms are mostly based on dilated Fully Convolutional Networks (dilatedFCN).
We propose the EfficientFCN, whose backbone is a common ImageNet pre-trained network without any dilated convolution.
Such a framework achieves comparable or even better performance than state-of-the-art methods with only 1/3 of the computational cost.
arXiv Detail & Related papers (2020-08-24T14:48:23Z)
- Representations of Syntax [MASK] Useful: Effects of Constituency and Dependency Structure in Recursive LSTMs [26.983602540576275]
Sequence-based neural networks show significant sensitivity to syntactic structure, but they still fall short of tree-based networks on syntactic tasks.
We evaluate which of these two representational schemes more effectively introduces biases for syntactic structure.
We show that a constituency-based network generalizes more robustly than a dependency-based one, and that combining the two types of structure does not yield further improvement.
arXiv Detail & Related papers (2020-04-30T18:00:06Z)
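For reference, here is a minimal Child-Sum Tree-LSTM cell in PyTorch (after Tai et al., 2015), the kind of recursive composition such tree-based networks apply bottom-up; the exact constituency- and dependency-based variants compared in that paper are not reproduced.

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """One bottom-up composition step over a node and its children."""
    def __init__(self, in_dim, mem_dim):
        super().__init__()
        self.iou = nn.Linear(in_dim + mem_dim, 3 * mem_dim)
        self.f_x = nn.Linear(in_dim, mem_dim)
        self.f_h = nn.Linear(mem_dim, mem_dim)

    def forward(self, x, child_h, child_c):
        """x: (in_dim,); child_h, child_c: (num_children, mem_dim).
        For leaves, pass zeros of shape (1, mem_dim)."""
        h_sum = child_h.sum(dim=0)
        i, o, u = self.iou(torch.cat([x, h_sum])).chunk(3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f_x(x) + self.f_h(child_h))  # gate per child
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c
```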
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)