Semantic Role Labeling of NomBank Partitives
- URL: http://arxiv.org/abs/2412.14328v2
- Date: Fri, 20 Dec 2024 16:17:20 GMT
- Title: Semantic Role Labeling of NomBank Partitives
- Authors: Adam Meyers, Advait Pravin Savant, John E. Ortega
- Abstract summary: Several systems are described using traditional and transformer-based machine learning, as well as ensembling. Our highest scoring system achieves an F1 of 91.74% using "gold" parses from the Penn Treebank and 91.12% when using the Berkeley Neural parser.
- Score: 2.867517731896504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article is about Semantic Role Labeling for English partitive nouns (5%/REL of the price/ARG1; The price/ARG1 rose 5 percent/REL) in the NomBank annotated corpus. Several systems are described using traditional and transformer-based machine learning, as well as ensembling. Our highest scoring system achieves an F1 of 91.74% using "gold" parses from the Penn Treebank and 91.12% when using the Berkeley Neural parser. This research includes both classroom and experimental settings for system development.
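In the abstract's notation, REL marks the partitive noun acting as the predicate and ARG1 marks its argument. The following is a minimal sketch, assuming each argument is scored as an exact match on (sentence, label, token span). It encodes the two examples as labeled spans and computes precision, recall, and F1. The `Arg` type, the `prf1` helper, the tokenization, and the span boundaries are hypothetical choices for illustration. They are not the paper's data format or evaluation script.

```python
from typing import NamedTuple


class Arg(NamedTuple):
    """One labeled argument: sentence index, role label, token span [start, end)."""
    sent: int
    label: str
    start: int
    end: int


def prf1(gold: set[Arg], pred: set[Arg]) -> tuple[float, float, float]:
    """Precision, recall and F1 under exact matching of (sent, label, start, end)."""
    tp = len(gold & pred)  # true positives: predictions identical to a gold argument
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical tokenization of the abstract's two examples (spans are illustrative).
sents = [
    ["5", "%", "of", "the", "price"],          # 5%/REL of the price/ARG1
    ["The", "price", "rose", "5", "percent"],  # The price/ARG1 rose 5 percent/REL
]
gold = {Arg(0, "ARG1", 3, 5), Arg(1, "ARG1", 0, 2)}
# Hypothetical system output: the first ARG1 is exact, the second drops "The".
pred = {Arg(0, "ARG1", 3, 5), Arg(1, "ARG1", 1, 2)}

for a in sorted(pred):
    print(a.label, " ".join(sents[a.sent][a.start:a.end]))
p, r, f = prf1(gold, pred)
print(f"P={p:.2%} R={r:.2%} F1={f:.2%}")  # P=50.00% R=50.00% F1=50.00%
```

Exact span matching is the strictest option. A scorer that credited head-word matches would count the second prediction ("price" for "The price") as correct.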
Related papers
- Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia [55.23627698804683]
We study the scaling behavior of different numeral systems in the context of transformer-based large language models.
A base $10$ system is consistently more data-efficient than a base $10^2$ or $10^3$ system across training data scales.
We identify that base $100$ and base $1000$ systems struggle with token-level discernment and token-level operations.
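One reading of a base $10^k$ system is that each numeral token encodes one digit in base $10^k$. Under that reading, larger bases yield fewer tokens per number but $10^k$ distinct token values to learn. Below is a minimal sketch of such a tokenizer. It assumes digits are grouped from the right and that the leading group is left unpadded. The function name `tokenize_number` and the grouping direction are assumptions, not details taken from the paper.

```python
def tokenize_number(n: int, k: int) -> list[str]:
    """Split a non-negative integer into base-10**k "digits", i.e. groups of k
    decimal digits taken from the right; the leading group may be shorter."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    digits = str(n)
    groups = []
    end = len(digits)
    while end > 0:
        start = max(0, end - k)
        groups.append(digits[start:end])  # one token = one base-10**k digit
        end = start
    return groups[::-1]  # restore most-significant-first order


for k in (1, 2, 3):
    print(f"base 10^{k}:", tokenize_number(1234567, k))
# base 10^1: ['1', '2', '3', '4', '5', '6', '7']
# base 10^2: ['1', '23', '45', '67']
# base 10^3: ['1', '234', '567']
```

Grouping from the right keeps every token aligned with a fixed place value (ones, hundreds, and so on). Left-to-right grouping would not guarantee this for numbers whose length is not a multiple of $k$.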
(2024-09-25T22:08:31Z)
- Multi-Modal Multi-Granularity Tokenizer for Chu Bamboo Slip Scripts [65.10991154918737]
This study focuses on the Chu bamboo slip (CBS) script used during the Spring and Autumn and Warring States periods (771-256 BCE) in Ancient China.
Our tokenizer first adopts character detection to locate character boundaries, and then conducts character recognition at both the character and sub-character levels.
To support the academic community, we have also assembled the first large-scale dataset of CBSs with over 100K annotated character image scans.
(2024-09-02T07:42:55Z)
- MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank [56.810282574817414]
We present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in Universal Dependencies (UD).
We highlight the morphosyntactic differences between the closely related Bavarian and German and showcase the rich variability of speakers' orthographies.
Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries.
(2024-03-15T13:33:10Z)
- BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm [29.701197643765674]
The performance of speech-processing models is heavily influenced by the speech corpus that is used for training and evaluation.
We propose BAlanced Script PROducer (BASPRO) system, which can automatically construct a phonetically balanced and rich set of Chinese sentences.
(2022-12-11T02:05:30Z)
- DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition [94.1865071914727]
MultiCoNER aims at detecting semantically ambiguous named entities in short and low-context settings for multiple languages.
Our team DAMO-NLP proposes a knowledge-based system, where we build a multilingual knowledge base based on Wikipedia.
Given an input sentence, our system effectively retrieves related contexts from the knowledge base.
Our system wins 10 out of 13 tracks in the MultiCoNER shared task.
(2022-03-01T15:29:35Z)
- Penn-Helsinki Parsed Corpus of Early Modern English: First Parsing Results and Analysis [2.8749014299466444]
We present the first parsing results on the Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME), a 1.9 million word treebank.
We describe key features of PPCEME that make it challenging for parsing, including a larger and more varied set of function tags than in the Penn Treebank.
(2021-12-15T23:56:21Z)
- Neural Text Classification and Stacked Heterogeneous Embeddings for Named Entity Recognition in SMM4H 2021 [1.195496689595016]
We addressed Named Entity Recognition (NER) and Text Classification.
To address NER we explored BiLSTM-CRF with Stacked Heterogeneous Embeddings and linguistic features.
Our proposed approaches can be generalized to different languages, and we have shown their effectiveness for English and Spanish.
(2021-06-10T15:43:21Z)
- LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification [70.1903083747775]
This paper describes our submission for the task of Propaganda Span Identification in news articles.
We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda.
(2020-08-11T16:14:47Z)
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve [49.878051587667244]
We examine the performance of several variants of LSTM-CRF architectures for named entity recognition.
We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves.
We enlist human annotators to evaluate the feasibility of inferring entity types from the context alone and find that, although people are also unable to infer the entity type for the majority of the errors made by the context-only system, there is some room for improvement.
(2020-04-09T14:37:12Z)
- Improving Neural Named Entity Recognition with Gazetteers [6.292153194561472]
This article describes how to generate gazetteers from the Wikidata knowledge graph as well as how to integrate the information into a neural NER system.
Experiments reveal that the approach yields performance gains in two distinct languages.
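One common way to integrate a gazetteer into a neural NER system is to tag each token with a BIO feature derived from dictionary matches. That feature is then fed to the model alongside the word embeddings. The sketch below shows only the matching step, using greedy longest match. The technique, the `gazetteer_features` helper, and the toy entries are illustrative assumptions, not the method or data from the paper.

```python
def gazetteer_features(tokens: list[str], gazetteer: dict[tuple[str, ...], str],
                       max_len: int = 4) -> list[str]:
    """Tag tokens with BIO gazetteer features using greedy longest match.
    `gazetteer` maps a lowercased token tuple to an entity type (hypothetical format)."""
    feats = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first so longer entries win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in gazetteer:
                match = (n, gazetteer[key])
                break
        if match is None:
            i += 1
            continue
        n, etype = match
        feats[i] = f"B-{etype}"
        for j in range(i + 1, i + n):
            feats[j] = f"I-{etype}"
        i += n  # skip past the matched span
    return feats


# Hypothetical entries; a real gazetteer would be extracted from a knowledge base.
gaz = {("new", "york"): "LOC", ("new", "york", "times"): "ORG", ("paris",): "LOC"}
print(gazetteer_features("She read the New York Times in Paris".split(), gaz))
```

Preferring the longest match lets "New York Times" override the shorter "New York" entry. The resulting tags would typically be mapped to embeddings and concatenated with the word representations.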
(2020-03-06T08:29:37Z)
- Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System [24.42822218256954]
We present the first approach to automatically building resources for academic writing.
The aim is to build a writing aid system that automatically edits a text so that it better adheres to the academic style of writing.
(2020-03-05T22:55:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.