LyS_ACoruña at SemEval-2022 Task 10: Repurposing Off-the-Shelf Tools
for Sentiment Analysis as Semantic Dependency Parsing
- URL: http://arxiv.org/abs/2204.12820v1
- Date: Wed, 27 Apr 2022 10:21:28 GMT
- Title: LyS_ACoruña at SemEval-2022 Task 10: Repurposing Off-the-Shelf Tools
for Sentiment Analysis as Semantic Dependency Parsing
- Authors: Iago Alonso-Alonso, David Vilares and Carlos Gómez-Rodríguez
- Abstract summary: This paper addresses the problem of structured sentiment analysis using a bi-affine semantic dependency parser.
For the monolingual setup, we considered: (i) training on a single treebank, and (ii) relaxing the setup by training on treebanks coming from different languages.
For the zero-shot setup and a given target treebank, we relied on: (i) a word-level translation of available treebanks in other languages to get noisy, unlikely-grammatical, but annotated data.
In the post-evaluation phase, we also trained cross-lingual models that simply merged all the English treebanks and did not use word-level translations.
- Score: 10.355938901584567
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the problem of structured sentiment analysis using a
bi-affine semantic dependency parser, large pre-trained language models, and
publicly available translation models. For the monolingual setup, we
considered: (i) training on a single treebank, and (ii) relaxing the setup by
training on treebanks coming from different languages that can be adequately
processed by cross-lingual language models. For the zero-shot setup and a given
target treebank, we relied on: (i) a word-level translation of available
treebanks in other languages to get noisy, unlikely-grammatical, but annotated
data (we release as much of it as licenses allow), and (ii) merging those
translated treebanks to obtain training data. In the post-evaluation phase, we
also trained cross-lingual models that simply merged all the English treebanks
and did not use word-level translations, and yet obtained better results.
According to the official results, we ranked 8th and 9th in the monolingual and
cross-lingual setups, respectively.
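The core of the approach is a bi-affine scorer over token pairs: each candidate arc from head i to dependent j gets a score combining a bilinear head-dependent term and linear terms for each side. A minimal NumPy sketch of this scoring step (dimensions, variable names, and the random inputs are illustrative, not the authors' implementation):

```python
import numpy as np

def biaffine_scores(H, D, U, w_h, w_d, b=0.0):
    """Score every (head i, dependent j) arc in a sentence of n tokens.

    H: (n, d) head-role token representations
    D: (n, d) dependent-role token representations
    s[i, j] = H[i] @ U @ D[j] + w_h @ H[i] + w_d @ D[j] + b
    """
    bilinear = H @ U @ D.T                 # (n, n) head-dependent interaction
    head_bias = (H @ w_h)[:, None]         # (n, 1) per-head linear term
    dep_bias = (D @ w_d)[None, :]          # (1, n) per-dependent linear term
    return bilinear + head_bias + dep_bias + b

rng = np.random.default_rng(0)
n, d = 4, 8                                # 4 tokens, 8-dim representations
H = rng.normal(size=(n, d))
D = rng.normal(size=(n, d))
U = rng.normal(size=(d, d))
w_h = rng.normal(size=d)
w_d = rng.normal(size=d)

S = biaffine_scores(H, D, U, w_h, w_d)
pred_heads = S.argmax(axis=0)              # greedy head choice per dependent
```

In a full parser the two representation matrices come from a pre-trained encoder via separate feed-forward projections, and arc labels get a second biaffine classifier; the greedy argmax here stands in for proper decoding.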
Related papers
- Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment [50.27950279695363]
The transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language.
Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method.
arXiv Detail & Related papers (2024-06-28T08:59:24Z)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training? [86.48323488619629]
We analyze how translation performance changes as the data ratios among languages vary in the tokenizer training corpus.
We find that while relatively better performance is often observed when languages are more equally sampled, downstream performance is more robust to language imbalance than usually expected.
arXiv Detail & Related papers (2022-04-29T17:50:36Z)
- Multilingual Syntax-aware Language Modeling through Dependency Tree Conversion [12.758523394180695]
We study the effect on neural language models (LMs) performance across nine conversion methods and five languages.
On average, the performance of our best model represents a 19% increase in accuracy over the worst choice across all languages.
Our experiments highlight the importance of choosing the right tree formalism, and provide insights into making an informed decision.
arXiv Detail & Related papers (2022-04-19T03:56:28Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- Examining Cross-lingual Contextual Embeddings with Orthogonal Structural Probes [0.2538209532048867]
A novel Orthogonal Structural Probe (Limisiewicz and Mareček, 2021) allows us to test for specific linguistic features.
We evaluate syntactic (UD) and lexical (WordNet) structural information encoded in mBERT's contextual representations for nine diverse languages.
We successfully apply our findings to zero-shot and few-shot cross-lingual parsing.
arXiv Detail & Related papers (2021-09-10T15:03:11Z)
- Translate & Fill: Improving Zero-Shot Multilingual Semantic Parsing with Synthetic Data [2.225882303328135]
We propose a novel Translate-and-Fill (TaF) method to produce silver training data for a multilingual semantic parsing task.
Experimental results on three multilingual semantic parsing datasets show that data augmentation with TaF reaches accuracies competitive with similar systems.
arXiv Detail & Related papers (2021-09-09T14:51:11Z)
- Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation [53.22775597051498]
We present a continual pre-training framework on mBART to effectively adapt it to unseen languages.
Results show that our method can consistently improve the fine-tuning performance upon the mBART baseline.
Our approach also boosts the performance on translation pairs where both languages are seen in the original mBART's pre-training.
arXiv Detail & Related papers (2021-05-09T14:49:07Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively to the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Cross-Lingual Adaptation Using Universal Dependencies [1.027974860479791]
We show that models trained using UD parse trees for complex NLP tasks can characterize very different languages.
Based on UD parse trees, we develop several models using tree kernels and show that these models trained on the English dataset can correctly classify data of other languages.
arXiv Detail & Related papers (2020-03-24T13:04:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.