SpaDeLeF: A Dataset for Hierarchical Classification of Lexical Functions
for Collocations in Spanish
- URL: http://arxiv.org/abs/2311.04189v1
- Date: Tue, 7 Nov 2023 18:32:34 GMT
- Authors: Yevhen Kostiuk, Grigori Sidorov, Olga Kolesnikova
- Abstract summary: We present a dataset of the most frequent Spanish verb-noun collocations and the sentences in which they occur.
Each collocation is assigned to one of 37 lexical functions defined as classes for a hierarchical classification task.
We combine the classes in a tree-based structure, and introduce classification objectives for each level of the structure.
- Score: 6.9454683800956705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In natural language processing (NLP), a lexical function is a concept,
first introduced in Meaning-Text Theory, for unambiguously representing semantic
and syntactic features of words and phrases in text. Hierarchical classification of
lexical functions involves organizing these features into a tree-like hierarchy
of categories or labels. This is a challenging task as it requires a good
understanding of the context and the relationships among words and phrases in
text. It also needs large amounts of labeled data to train language models
effectively. In this paper, we present a dataset of the most frequent Spanish
verb-noun collocations and the sentences in which they occur. Each collocation is
assigned to one of 37 lexical functions, defined as classes for a hierarchical
classification task. Each class represents a relation between the noun and the
verb in a collocation involving their semantic and syntactic features. We
combine the classes in a tree-based structure, and introduce classification
objectives for each level of the structure. The dataset was created by
dependency tree parsing and phrase matching on Spanish news text. We provide
baselines and data splits for each objective.
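The idea of one classification objective per level of a label tree can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the two-level grouping, the parent names, and the helper functions `targets_per_level` and `level_accuracy` are hypothetical and are not taken from the paper. The leaf names are ordinary lexical function labels from Meaning-Text Theory, used only as placeholders; the actual dataset defines 37 classes and its own tree.

```python
# Hypothetical two-level hierarchy: parent group -> leaf lexical functions.
# The grouping is illustrative only; it is NOT the tree defined in the paper.
HIERARCHY = {
    "support": ["Oper1", "Func0"],
    "realization": ["Real1", "Fact0"],
    "causation": ["CausFunc0", "CausFunc1"],
}

# Invert the tree so each leaf maps to its path (parent, leaf).
LEAF_TO_PATH = {
    leaf: (parent, leaf)
    for parent, leaves in HIERARCHY.items()
    for leaf in leaves
}


def targets_per_level(leaf_label):
    """Return one classification target per tree level for a leaf label."""
    return list(LEAF_TO_PATH[leaf_label])


def level_accuracy(gold_leaves, predicted_leaves, level):
    """Accuracy after mapping gold and predicted leaves to the given level."""
    if len(gold_leaves) != len(predicted_leaves):
        raise ValueError("gold and predicted lists must have the same length")
    if not gold_leaves:
        raise ValueError("cannot compute accuracy on an empty list")
    hits = sum(
        targets_per_level(g)[level] == targets_per_level(p)[level]
        for g, p in zip(gold_leaves, predicted_leaves)
    )
    return hits / len(gold_leaves)


if __name__ == "__main__":
    gold = ["Oper1", "Real1", "CausFunc0"]
    pred = ["Func0", "Real1", "Fact0"]
    print("level 0 (coarse):", level_accuracy(gold, pred, 0))
    print("level 1 (leaf):  ", level_accuracy(gold, pred, 1))
```

In this sketch a prediction that picks the wrong leaf but the right parent still counts as correct at the coarse level, which is why per-level objectives can be scored separately.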
Related papers
- Domain Embeddings for Generating Complex Descriptions of Concepts in Italian Language [65.268245109828]
We propose a Distributional Semantic resource enriched with linguistic and lexical information extracted from electronic dictionaries.
The resource comprises 21 domain-specific matrices, one comprehensive matrix, and a Graphical User Interface.
Our model facilitates the generation of reasoned semantic descriptions of concepts by selecting matrices directly associated with concrete conceptual knowledge.
Date: 2024-02-26T15:04:35Z
- AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation [33.25304533086283]
Open-vocabulary semantic segmentation is a challenging task that requires segmenting novel object categories at inference time.
Recent studies have explored vision-language pre-training to handle this task, but suffer from unrealistic assumptions in practical scenarios.
This work proposes a novel attribute decomposition-aggregation framework, AttrSeg, inspired by human cognition in understanding new concepts.
Date: 2023-08-31T19:34:09Z
- Bridging Natural Language Processing and Psycholinguistics: computationally grounded semantic similarity datasets for Basque and Spanish [0.0]
We present a word similarity dataset based on two well-known Natural Language Processing resources: text corpora and knowledge bases.
The present dataset includes noun pairs' information in Basque and European Spanish, but further work intends to extend it to more languages.
Date: 2023-04-19T12:47:51Z
- A Comprehensive Empirical Evaluation of Existing Word Embedding Approaches [5.065947993017158]
We present the characteristics of existing word embedding approaches and analyze them with regard to many classification tasks.
Traditional approaches mostly use matrix factorization to produce word representations, and they are not able to capture the semantic and syntactic regularities of the language very well.
On the other hand, neural-network-based approaches can capture sophisticated regularities of the language and preserve the word relationships in the generated word representations.
Date: 2023-03-13T15:34:19Z
- Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
Date: 2023-01-22T08:02:23Z
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the true value of the hypothesis follows the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
Date: 2022-10-18T10:03:51Z
- Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization [80.94424037751243]
In zero-shot multilingual extractive text summarization, a model is typically trained on an English dataset and then applied to summarization datasets of other languages.
We propose NLS (Neural Label Search for Summarization), which jointly learns hierarchical weights for different sets of labels together with our summarization model.
We conduct multilingual zero-shot summarization experiments on MLSUM and WikiLingua datasets, and we achieve state-of-the-art results using both human and automatic evaluations.
Date: 2022-04-28T14:02:16Z
- Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels.
We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
Date: 2021-04-20T17:56:24Z
- Seeing Both the Forest and the Trees: Multi-head Attention for Joint Classification on Different Compositional Levels [15.453888735879525]
In natural languages, words are used in association to construct sentences.
We design a deep neural network architecture that explicitly wires lower and higher linguistic components.
We show that our model, MHAL, learns to simultaneously solve them at different levels of granularity.
Date: 2020-11-01T10:44:46Z
- Joint Semantic Analysis with Document-Level Cross-Task Coherence Rewards [13.753240692520098]
We present a neural network architecture for joint coreference resolution and semantic role labeling for English.
We use reinforcement learning to encourage global coherence over the document and between semantic annotations.
This leads to improvements on both tasks in multiple datasets from different domains.
Date: 2020-10-12T09:36:24Z
- Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50.
Date: 2020-09-12T17:36:53Z
This list is automatically generated from the titles and abstracts of the papers on this site.