Related papers: HuAMR: A Hungarian AMR Parser and Dataset

URL: http://arxiv.org/abs/2502.20552v1
Date: Thu, 27 Feb 2025 21:48:11 GMT
Title: HuAMR: A Hungarian AMR Parser and Dataset
Authors: Botond Barta, Endre Hamerlik, Milán Konor Nyist, Judit Ács,
Abstract summary: We present HuAMR, the first Abstract Meaning Representation (AMR) dataset and a suite of large language model-based AMR parsers for Hungarian. To create HuAMR, we employed Llama-3.1-70B to automatically generate silver-standard AMR annotations, which we then refined manually to ensure quality.
Score: 0.20499240875881997
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present HuAMR, the first Abstract Meaning Representation (AMR) dataset and a suite of large language model-based AMR parsers for Hungarian, targeting the scarcity of semantic resources for non-English languages. To create HuAMR, we employed Llama-3.1-70B to automatically generate silver-standard AMR annotations, which we then refined manually to ensure quality. Building on this dataset, we investigate how different model architectures - mT5 Large and Llama-3.2-1B - and fine-tuning strategies affect AMR parsing performance. While incorporating silver-standard AMRs from Llama-3.1-70B into the training data of smaller models does not consistently boost overall scores, our results show that these techniques effectively enhance parsing accuracy on Hungarian news data (the domain of HuAMR). We evaluate our parsers using Smatch scores and confirm the potential of HuAMR and our parsers for advancing semantic parsing research.

Related papers

Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval [49.1574468325115]
We introduce Amharic-specific dense retrieval models based on pre-trained Amharic BERT and RoBERTa backbones. Our proposed RoBERTa-Base-Amharic-Embed model (110M parameters) achieves a 17.6% relative improvement in MRR@10. More compact variants, such as RoBERTa-Medium-Amharic-Embed (42M), remain competitive while being over 13x smaller.
arXiv Detail & Related papers (2025-05-25T23:06:20Z)
Leveraging Denoised Abstract Meaning Representation for Grammatical Error Correction [53.55440811942249]
Grammatical Error Correction (GEC) is the task of correcting errorful sentences into grammatically correct, semantically consistent, and coherent sentences. We propose AMR-GEC, a seq-to-seq model that incorporates denoised AMR as additional knowledge.
arXiv Detail & Related papers (2023-07-05T09:06:56Z)
AMRs Assemble! Learning to Ensemble with Autoregressive Models for AMR Parsing [38.731641198934646]
We show how ensemble models can exploit SMATCH metric weaknesses to obtain higher scores, but sometimes result in corrupted graphs. We propose two novel ensemble strategies based on Transformer models, improving robustness to structural constraints, while also reducing computational time.
arXiv Detail & Related papers (2023-06-19T08:58:47Z)
An AMR-based Link Prediction Approach for Document-level Event Argument Extraction [51.77733454436013]
Recent works have introduced Abstract Meaning Representation (AMR) for Document-level Event Argument Extraction (Doc-level EAE). This work reformulates EAE as a link prediction problem on AMR graphs. We propose a novel graph structure, Tailored AMR Graph (TAG), which compresses less informative subgraphs and edge types, integrates span information, and highlights surrounding events in the same document.
arXiv Detail & Related papers (2023-05-30T16:07:48Z)
AMR Parsing with Instruction Fine-tuned Pre-trained Language Models [21.767812442354387]
In this paper, we take one such instruction fine-tuned language model, FLAN-T5, and fine-tune it for AMR parsing. Our experiments on various AMR parsing tasks, including AMR2.0, AMR3.0, and BioAMR, indicate that FLAN-T5 fine-tuned models outperform previous state-of-the-art models.
arXiv Detail & Related papers (2023-04-24T17:12:17Z)
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training [58.07391711548269]
This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training.
arXiv Detail & Related papers (2023-03-23T17:59:02Z)
Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation [55.117031558677674]
We study different techniques for automatically generating AMR annotations. Our models trained on gold AMR with silver (machine translated) sentences outperform approaches which leverage generated silver AMR. Our models surpass the previous state of the art for German, Italian, Spanish, and Chinese by a large margin.
arXiv Detail & Related papers (2021-09-08T17:55:46Z)
Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU). Recent works have shown that using extra data and labels can improve the OOD detection performance. This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection.
arXiv Detail & Related papers (2021-06-28T08:27:38Z)
Probabilistic, Structure-Aware Algorithms for Improved Variety, Accuracy, and Coverage of AMR Alignments [9.74672460306765]
We present algorithms for aligning components of Abstract Meaning Representation (AMR) spans in English sentences. We leverage unsupervised learning in combination with graphs, taking the best of both worlds from previous AMR. Our approach covers a wider variety of AMR substructures than previously considered, achieves higher coverage of nodes and edges, and does so with higher accuracy.
arXiv Detail & Related papers (2021-06-10T18:46:32Z)
Improving AMR Parsing with Sequence-to-Sequence Pre-training [39.33133978535497]
In this paper, we focus on sequence-to-sequence (seq2seq) AMR parsing. We propose a seq2seq pre-training approach to build pre-trained models in both single and joint ways. Experiments show that both the single and joint pre-trained models significantly improve the performance.
arXiv Detail & Related papers (2020-10-05T04:32:47Z)
Dynamic Data Selection and Weighting for Iterative Back-Translation [116.14378571769045]
We propose a curriculum learning strategy for iterative back-translation models. We evaluate our models on domain adaptation, low-resource, and high-resource MT settings. Experimental results demonstrate that our methods achieve improvements of up to 1.8 BLEU points over competitive baselines.
arXiv Detail & Related papers (2020-04-07T19:49:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.