Comparison by Conversion: Reverse-Engineering UCCA from Syntax and
Lexical Semantics
- URL: http://arxiv.org/abs/2011.00834v1
- Date: Mon, 2 Nov 2020 09:03:46 GMT
- Title: Comparison by Conversion: Reverse-Engineering UCCA from Syntax and
Lexical Semantics
- Authors: Daniel Hershcovich, Nathan Schneider, Dotan Dvir, Jakob Prange, Miryam
de Lhoneux and Omri Abend
- Abstract summary: Building robust natural language understanding systems will require a clear characterization of whether and how various linguistic meaning representations complement each other.
We evaluate the mapping between meaning representations from different frameworks using two complementary methods: (i) a rule-based converter, and (ii) a supervised delexicalized parser that parses to one framework using only information from the other as features.
- Score: 29.971739294416714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building robust natural language understanding systems will require a clear
characterization of whether and how various linguistic meaning representations
complement each other. To perform a systematic comparative analysis, we
evaluate the mapping between meaning representations from different frameworks
using two complementary methods: (i) a rule-based converter, and (ii) a
supervised delexicalized parser that parses to one framework using only
information from the other as features. We apply these methods to convert the
STREUSLE corpus (with syntactic and lexical semantic annotations) to UCCA (a
graph-structured full-sentence meaning representation). Both methods yield
surprisingly accurate target representations, close to fully supervised UCCA
parser quality, indicating that UCCA annotations are partially redundant with
STREUSLE annotations. Despite this substantial convergence between frameworks,
we find several important areas of divergence.
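To make the first method concrete, here is a minimal sketch of what a rule-based syntax-to-UCCA conversion could look like (an illustration under simplified assumptions, not the paper's actual converter): each token's Universal Dependencies relation is mapped to a coarse UCCA category; the relation-to-category table below is hypothetical.

```python
# Minimal sketch of a rule-based syntax-to-UCCA conversion (illustrative only;
# not the converter from the paper). Each token is assigned a coarse UCCA
# category based on its Universal Dependencies relation to its head.

# Hypothetical, simplified mapping from UD relations to UCCA edge categories:
# P = Process, A = Participant, D = Adverbial, E = Elaborator, F = Function, R = Relator.
UD_TO_UCCA = {
    "root": "P",     # main predicate -> Process
    "nsubj": "A",    # subject -> Participant
    "obj": "A",      # object -> Participant
    "advmod": "D",   # adverbial modifier -> Adverbial
    "amod": "E",     # adjectival modifier -> Elaborator
    "det": "F",      # determiner -> Function word
    "case": "R",     # adposition -> Relator
}

def convert(tokens):
    """tokens: list of (form, ud_relation) pairs.
    Returns (form, ucca_category) pairs; unknown relations fall back to 'E'."""
    return [(form, UD_TO_UCCA.get(rel, "E")) for form, rel in tokens]

# Toy example: "The dog barked loudly"
sentence = [("The", "det"), ("dog", "nsubj"), ("barked", "root"), ("loudly", "advmod")]
print(convert(sentence))
# [('The', 'F'), ('dog', 'A'), ('barked', 'P'), ('loudly', 'D')]
```

A real converter must also build the graph structure (units, remote edges) and consult lexical semantic annotations; this table-lookup view only illustrates why syntactic relations carry much of the signal.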
Related papers
- Cross-domain Chinese Sentence Pattern Parsing [67.1381983012038]
Sentence Pattern Structure (SPS) parsing is a syntactic analysis method primarily employed in language teaching.
Existing SPS parsers rely heavily on textbook corpora for training and lack cross-domain capability.
This paper proposes an innovative approach leveraging large language models (LLMs) within a self-training framework.
arXiv Detail & Related papers (2024-02-26T05:30:48Z)
- BERM: Training the Balanced and Extractable Representation for Matching to Improve Generalization Ability of Dense Retrieval [54.66399120084227]
Dense retrieval has shown promise in the first-stage retrieval process when trained on in-domain labeled datasets.
We propose BERM, a novel method that improves the generalization of dense retrieval by capturing the matching signal.
arXiv Detail & Related papers (2023-05-18T15:43:09Z)
- Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers [86.64972552583941]
We put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context.
Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.
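As a rough sketch of what "explicitly encoding syntactic dependencies" can mean architecturally (our assumption, not the paper's exact model), self-attention scores can be biased toward tokens connected by a dependency arc:

```python
import torch

def graph_aware_attention(x, adjacency, bias=2.0):
    """Single-head self-attention with an additive bias toward syntactic
    neighbors (a simplified illustration, not the paper's architecture).
    x: (n, d) token representations; adjacency: (n, n) 0/1 dependency arcs."""
    d = x.size(-1)
    scores = x @ x.T / d ** 0.5         # standard scaled dot-product scores
    scores = scores + bias * adjacency  # boost attention along dependency arcs
    return torch.softmax(scores, dim=-1) @ x

n, d = 4, 8
x = torch.randn(n, d)
adj = torch.zeros(n, n)
adj[1, 0] = adj[0, 1] = 1.0  # e.g., token 1 is the syntactic head of token 0
print(graph_aware_attention(x, adj).shape)  # torch.Size([4, 8])
```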
arXiv Detail & Related papers (2022-05-23T16:47:37Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
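A minimal sketch of the dual-paradigm idea (our illustration; UnifieR's actual architecture is more involved): score each query-document pair under both paradigms and interpolate, where `alpha` is a hypothetical mixing weight.

```python
import numpy as np

def dense_score(q_vec, d_vec):
    # Dense-vector paradigm: similarity of pooled query/document embeddings.
    return float(np.dot(q_vec, d_vec))

def lexical_score(q_weights, d_weights):
    # Lexicon-based paradigm: dot product over per-term weights
    # (BM25-style or learned term impacts), nonzero only on shared terms.
    return sum(w * d_weights.get(t, 0.0) for t, w in q_weights.items())

def unified_score(q_vec, d_vec, q_weights, d_weights, alpha=0.5):
    # Hypothetical interpolation of the two paradigms in one scorer.
    return alpha * dense_score(q_vec, d_vec) + (1 - alpha) * lexical_score(q_weights, d_weights)

# Toy example
q_vec, d_vec = np.array([0.1, 0.9]), np.array([0.2, 0.8])
q_w = {"ucca": 1.2, "parsing": 0.7}
d_w = {"ucca": 0.9, "semantics": 0.4}
print(unified_score(q_vec, d_vec, q_w, d_w))
```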
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Self-Attentive Constituency Parsing for UCCA-based Semantic Parsing [0.0]
Graph-based representation is one approach to expressing the semantic structure of a text.
In this paper, we focus primarily on UCCA graph-based semantic representation.
We present the results for both single-lingual and cross-lingual tasks using zero-shot and few-shot learning for low-resource languages.
arXiv Detail & Related papers (2021-10-01T19:10:18Z)
- Cross-linguistically Consistent Semantic and Syntactic Annotation of Child-directed Speech [27.657676278734534]
This paper proposes a methodology for constructing corpora of child-directed speech paired with sentential logical forms.
The approach enforces a cross-linguistically consistent representation, building on recent advances in dependency representation and semantic parsing.
arXiv Detail & Related papers (2021-09-22T18:17:06Z)
- Character-level Representations Improve DRS-based Semantic Parsing Even in the Age of BERT [6.705577865528099]
We combine character-level and contextual language model representations to improve performance on DRS parsing.
For English, these improvements are larger than adding individual sources of linguistic information.
A new method of analysis based on semantic tags demonstrates that the character-level representations improve performance across a subset of selected semantic phenomena.
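A minimal sketch of the general recipe (our illustration, not the paper's exact model): derive a character-level vector per token and concatenate it with a contextual embedding before parsing.

```python
import numpy as np

rng = np.random.default_rng(0)
CHAR_DIM, CTX_DIM = 16, 32

# Hypothetical character embedding table (real systems learn these end-to-end).
char_emb = {c: rng.normal(size=CHAR_DIM) for c in "abcdefghijklmnopqrstuvwxyz"}

def char_repr(token):
    # Character-level representation: mean of character embeddings.
    vecs = [char_emb[c] for c in token.lower() if c in char_emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(CHAR_DIM)

def combined_repr(token, contextual_vec):
    # Concatenate character-level and contextual LM features.
    return np.concatenate([char_repr(token), contextual_vec])

ctx = rng.normal(size=CTX_DIM)  # stand-in for a BERT-style contextual vector
print(combined_repr("parser", ctx).shape)  # (48,)
```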
arXiv Detail & Related papers (2020-11-09T10:24:12Z)
- Unsupervised Distillation of Syntactic Information from Contextualized Word Representations [62.230491683411536]
We tackle the task of unsupervised disentanglement between semantics and structure in neural language representations.
To this end, we automatically generate groups of sentences which are structurally similar but semantically different.
We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics.
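As a toy illustration of the data-generation step (our simplification, not the paper's procedure): treat the POS-tag sequence as a structural signature, so sentences sharing a signature but little vocabulary form a structurally similar, semantically different group.

```python
from collections import defaultdict

# Toy corpus with hand-assigned POS tags (a real pipeline would use a tagger).
corpus = [
    (("the", "dog", "chased", "a", "cat"), ("DET", "NOUN", "VERB", "DET", "NOUN")),
    (("a", "chef", "burned", "the", "soup"), ("DET", "NOUN", "VERB", "DET", "NOUN")),
    (("dogs", "bark"), ("NOUN", "VERB")),
]

# Group sentences by structural signature (their POS-tag sequence).
groups = defaultdict(list)
for words, tags in corpus:
    groups[tags].append(words)

for signature, sents in groups.items():
    if len(sents) > 1:
        # Same structure, (mostly) different lexical content.
        print(signature, "->", sents)
```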
arXiv Detail & Related papers (2020-10-11T15:13:18Z)
- Refining Implicit Argument Annotation for UCCA [6.873471412788333]
This paper proposes a typology for fine-grained implicit argument annotation on top of Universal Conceptual Cognitive Annotation's (UCCA) foundational layer.
The proposed implicit argument categorisation is driven by theories of implicit role interpretation and consists of six types: Deictic, Generic, Genre-based, Type-identifiable, Non-specific, and Iterated-set.
arXiv Detail & Related papers (2020-05-26T17:24:15Z)
- Multilingual Alignment of Contextual Word Representations [49.42244463346612]
After alignment, BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model.
We introduce a contextual version of word retrieval and show that it correlates well with downstream zero-shot transfer.
These results support contextual alignment as a useful concept for understanding large multilingual pre-trained models.
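A minimal sketch of contextual word retrieval as we understand it (an assumption about the evaluation, not the paper's exact protocol): given the contextual vector of a word occurrence in one language, retrieve the most similar occurrence among candidates in the other.

```python
import numpy as np

def retrieve(query_vec, candidate_vecs):
    """Contextual word retrieval sketched as cosine nearest-neighbor search
    over contextual embeddings of word occurrences in the other language."""
    sims = candidate_vecs @ query_vec / (
        np.linalg.norm(candidate_vecs, axis=1) * np.linalg.norm(query_vec))
    return int(np.argmax(sims))

rng = np.random.default_rng(1)
en_occurrence = rng.normal(size=64)          # contextual vector of an English word token
de_occurrences = rng.normal(size=(10, 64))   # candidate word tokens in another language
print(retrieve(en_occurrence, de_occurrences))  # index of best-matching occurrence
```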
arXiv Detail & Related papers (2020-02-10T03:27:21Z)